compare-ticks could look at the amount of allocation and order in the ticky file to find better pairings
I strangely find that the .ticky files already are quite good for diffing, although it's a bit tedious to find the hard hitters. Why is that?
- The order in which the bindings appear is deterministic across different compiler versions, at least for the examples I checked
- That means the order in which the bindings appear line up pretty well across .ticky files. They get out of sync as soon as one variant introduces a binding that wasn't there in the other variant, but if you add an empty line in the other variant accordingly, it lines up well again.
I think those observations suggest some kind of edit distance approach for lining them up to find pairings for the presented 3 tables. Characters in the edit distance jargon correspond to lines/metrics in the ticky file. Equal characters/line have cost zero, a substitution has cost based on the similarity between lines, e.g., same (significant, e.g. > 10000 bytes) no of allocations but different name has a minor substitution cost 0.05. Then an insert/delete could have cost 1.
Just to be clear: whenever the algorithm chooses "substitute" or "equal", you'd put them in the first comparison table, "insert" would put them in the "alloc A" table and "delete" would put them in the "alloc B" table.
Since it's a dynamic program, we can be sure to always find "the optimum" effecienctly, without any inspection of STG code like I suggested in #13, which is pretty nice. "The optimum" surely depends on the similarity function we choose, but I strongly believe that we can fine-tune it to great effect.