In a situation that feels similar to #22112 (closed), the output of the Prep stage loops when processing a recursive definition. The big difference is that I haven't been able to reproduce this using GHC-as-a-program, because it requires saving to and then loading back from an interface file-ish representation.
The attached program compiles with GHC as of 161a6f1f and when run, typechecks and compiles up to Tidy a small program that contains a recursive definition. The Tidy binding is saved into a file alongside the ModIface. Afterwards, in a fresh GHC session, the ModIface and the binding is loaded, and then the binding is processed by Prep. When we try to print the length of the binds returned by Prep, we loop.
Similar to #22112 (closed), this only happens when unfoldings are preserved, and when the definition in the input file is recursive. Opt_NoTypeBinds is only turned on to make it easier to process the resulting binds; unlike all other settings in setup, that one is not essential to reproduce the problem.
Edited
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
Related merge requests
1
When this merge request is accepted, this issue will be closed automatically.
Actually turns out Prep is a red herring -- the realIdUnfolding of the round-tripped Id already loops. See this new, simplified version that still demonstrates the problem: main.hs
Gergő Érdichanged title from Prep generates looping result when restoring from interface to {+Tidy's output contains realIdUnfolding that loops when round-tripping via+} interfaces
changed title from Prep generates looping result when restoring from interface to {+Tidy's output contains realIdUnfolding that loops when round-tripping via+} interfaces
Experimentation shows this is a regression from somewhere between 9.2 and 9.4 GHC 9.2.4 has no trouble compiling B.hs with -O, but GHC 9.4.2 loops just like master.
d7758da490db3cc662dbebdac4397b4b2c38d0f0 is the first bad commitcommit d7758da490db3cc662dbebdac4397b4b2c38d0f0Author: Sebastian Graf <sebastian.graf@kit.edu>Date: Tue Jun 22 14:02:49 2021 +0200 Simplifier: Do Cast W/W for INLINE strong loop-breakers Strong loop-breakers never inline, INLINE pragma or not. Hence they should be treated as if there was no INLINE pragma on them. Also not doing Cast W/W for INLINE strong loop-breakers will trip up Strictness W/W, because it treats them as if there was no INLINE pragma. Subsequently, that will lead to a panic once Strictness W/W will no longer do eta-expansion, as we discovered while implementing !5814. I also renamed to `unfoldingInfo` to `realUnfoldingInfo` and redefined `unfoldingInfo` to zap the unfolding it returns in case of a strong loop-breaker. Now the naming and semantics is symmetrical to `idUnfolding`/`realIdUnfolding`. Now there was no more reason for `hasInlineUnfolding` to operate on `Id`, because the zapping of strong loop-breaker unfoldings moved from `idUnfolding` to `unfoldingInfo`, so I refactored it to take `IdInfo` and call it both from the Simplifier and WorkWrap, making it utterly clear that both checks are equivalent.
OK I see it. Thank you Gergo for a great repro case.
Diagnosis
Here is the problem. We have, in an interface file
foo :: (); Unfolding = foo
When rehydrating the interface file we are going to make an Id for foo (in GHC.IfaceToCore).
That Id has an Unfolding. Background: here is the defn of CoreUnfolding:
| CoreUnfolding { uf_tmpl :: CoreExpr, -- Template; occurrence info is correct uf_src :: UnfoldingSource, -- Where the unfolding came from uf_is_top :: Bool, -- True <=> top level binding uf_is_value :: Bool, -- exprIsHNF template (cached); it is ok to discard -- a `seq` on this variable uf_is_conlike :: Bool, -- True <=> applicn of constructor or CONLIKE function -- Cached version of exprIsConLike uf_is_work_free :: Bool, -- True <=> doesn't waste (much) work to expand -- inside an inlining -- Cached version of exprIsCheap uf_expandable :: Bool, -- True <=> can expand in RULE matching -- Cached version of exprIsExpandable uf_guidance :: UnfoldingGuidance -- Tells about the *size* of the template. }
The uf_tmpl (the unfolding template) of that unfolding mentions foo.
We deal with this by knot-tying: we populate the environment with a binding of "foo" to the final Id for foo; when we encounter the occurrence of foo in the unfolding, we just look it up in the envt.
But an Unfolding has a uf_is_value field, which is built by exprIsValue uf_tmpl.
exprIsValue e looks at the unfoldings of variables in e to see if they are evaluated; so it consults the uf_is_value field of variables in e.
Now we can see the problem: to set the uf_is_value field of foo's unfolding, we look at its unfolding (in this case just foo itself!). Loop.
Why doesn't this happen in the Simplifier? Because in a recursive binding
letrec { r = ...r... }
we simplify r's RHS with "r" bound to an Id that doesn't have an unfolding yet.
Only "after" the letrec do we add r's unfolding. No knot-tying here.
The lack of a truly circular structure doesn't matter. We don't want to inline r in its
own RHS anyway. If any more occurrences of r (not it its own RHS) pop up, we'll end up
replacing them with r-with-unfolding as we repeatedly traverse the code.
But imported things are immutable. So if the recursive uses lack an unfolding, they'll lack
it forever. It's appealing to build a truly circular structure.
Cure
What to do? One simple possiblity would be to serialise the various auxilary fields of CoreUnfolding so that we don't need to recreate them when rehydrating. Specifically:
uf_is_conlike
uf_is_value
uf_is_work_free
uf_expandable
That would make interface files a tiny bit bigger, but not much: they are all booleans, so it's only four bits. (I don't know how to actually make it that compact.. needs someone who understands Binary.) And this would save calls to exprIsValue, exprIsExpandable etc for every imported Id.
We could choose to do this only for loop breakers. But thats a bit more complicated and it seems good all round.
Loooking at it, there is a similar opportuinty with IfGuidance. Currently we serialise UnfWhen guidance, but not others, and we could serialise all. (Orthogonal suggestion.)
Returning to the knot, I could imagine other solutions... e.g. knot-tying with less-info-rich Ids, as we do in the simplifier. But that is quite significantly more complicated and doesn't build a truly circular structure.
My vote goes to serialising the flags. With a very careful Note. (I have taken the trouble to write all this out here, partly as a basis for such a Note.)
I'm happy to work on this tomorrow (implementing the refactoring you described to store more and compute less in interfaces) if no one else steps up by then.
So this and #22112 (closed) are indeed similar in flavour. @simonpj do you have any teachable techniques for finding the root cause of this kind of knot-tying-gone-wrong situations so quickly? Or do you just have enough of GHC swapped in to make it possible?
If you're interested Gergo, I looked at the loop using ghc-debug-brick and could see the issue was knot-tying on one of the predicate fields but Simon beat me to the full analysis.