Improve trimming of DmdAnal results using findTypeShape.
Summary
Currently findTypeShape will give a infinite result for a type like newtype T a = T (a,T a)
after the changes in
!3466 (closed)
This is problematic, not because the returned shape is infinite, but because it means dmdAnal won't find a fixpoint as it uses findTypeShape to widen demands.
This has two potential outcomes:
- The user aborts compilation as it takes "forever". See #18304 (closed)
- The compiler aborts demand analysis and as result the analyzed function will only gets a default demand.
The potential loop is because in the current implementation we:
- Keep track of seen type constructors.
- Bail out if we see a data type TyCon multiple times while traversing the types structure.
- Look through newtypes.
- But skip the occurrence check for tuples, causing the loop. here are slightly more details on that.
The current approach was implemented to fix #18304 (closed). But it regresses the analysis for nested non-recursive product types of the same tycon.
For example in this code where ST is a strict tuple: (ST (ST (ST 1 2) 3) 4)
the innermost occurrence is considered recursive which causes us to throw away results for the innermost tuple because we stop after two occurances of ST.
!3516 (closed) tries to solve this by keeping track of seen types instead of tyCons. SPJ points out this doesn't work because data T a = MkT Int (T [a]))
will give infinitely many different types and so will loop.
What options does this give us
We definitely want to widen in some fashion, or the recursive case might blow up to a degree where GHC won't finish compilation in a reasonable amount of time.
Given this we can:
- Widen using tyCons only (the current behaviour on master), possibly changing the cutoff.
- Widen using types and tyCons, bail out if the same tyCon is seen "often". Bail out immediately if the type repeats.
- Widen using types only.
- Any of the above but using fuel to bail out if a demand becomes nested too deeply.
There are a few other considerations (should we expand the tuple treatment to type variable fields for example) but they don't change the core issue.
Tracking types is nice as it detects more recursive cases properly on their first occurence. But some cases like data T a = MkT Int (T [a]))
it will never detect so it has to be combined with one of the other approaches to work.
Tracking tyCons is nice as it will always terminate (assuming we take care of the newtype issue) at some point. But it will claim things are recursive when they are not - for nested but non-recursive tycons.
The end result is that tracking types will give no meaningful results for very few recursive cases. TyCons will give good results for actually recursive things. But will regress the analysis for deeply nested product types like tuples.
Tracking the types seems tempting. But in the end we would still want some kind of tyCon and/or nesting threshold to deal with polymorphic recursion. So we might as well solely use a threshold based approach. Which is far better for compiler performance as well.
Currently I think the best way is to:
- Stick with the tyCon approach, because it has the best compiler performance.
- Increase the threshold for how often a tyCon is allowed to appear. I suggest using 5 occurrences as cutoff.
- Add logic to properly deal with recursion through newtypes.
- Generalize the tuple treatment to all type variable defined fields in products. This is to avoid making tuple performance magical.