I apologize for not debugging this sooner. I finally have a hypothesis for the regression:
Prior to 5341edf3, shifts that were nonsensically large would be optimized into errors at the core level.
After 5341edf3, these absurd shifts can make it all the way down to MO_Shl, MO_U_Shr, and MO_S_Shr.
The CMM constant folding in CmmOpt.hs kicks in and GHC tries to perform the shift, leading to a heap overflow.
Guarding the constant folding in step 3 solves the heap overflow, but GHC ends up generating potentially invalid assembly: shlq $9223372036854775807,%rbx. Since this code should anyways be unreachable, perhaps we could just replace invalid CMM shifts with a nop instruction?
@harpocrates precisely which shift are we talking about?
If the code is truly unreachable then the C-- pipeline should realize this and drop the block. However, the constant folding logic shouldn't blow up on large values regardless. Perhaps the C-- constant folding logic should be guarded on the size of the shift?
@bgamari Here is a relevant snippet from _quick/stage1/bin/ghc -fforce-recomp -O Bug.hs -ddump-simpl. Note the (GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#).
That gets us past the compiler heap overflow, but it doesn't fix the invalid assembly that then gets generated.
Yes, that is quite true. Strictly speaking the behavior of uncheckIShiftL# undefined in the case of invalid shift. We are within our rights to lower this as a no-op. However, I think it would be better to lower this as a proper abort. Otherwise an incorrect compiler optimisation may turn into a quite a puzzle to solve.
The CMM constant folding in CmmOpt.hs kicks in and GHC tries to perform the
shift, leading to a heap overflow.
I don't understand how can CmmOpt overflow the heap to optimise a shift.
Optimisation of a shift is simply doing the shift in compile time, no? How does
CmmOpt allocate so much just to do a single shift? In other words, I don't understand how
can optimisation of this single line:
_s3bl::I64 = _s3bl::I64 | (1 << _s3bp::I64);
require so much allocation that overflows the heap.
Can you show us which line in CmmOpt does this optimisation which overflows the
heap?
Prior to 5341edf3, shifts that were nonsensically large would be optimized into errors at the core level.
How do you check this? ...
I didn't, it is a hypothesis.
Looking at the Core generated by GHC 8.6.4:
...
I don't see any errors here.
I'm not sure why 8.8 is even producing GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#. That does seem suspicious too. Nonetheless, I shouldn't be able to make GHC crash by manually calling uncheckedIShiftL# with invalid arguments (although the generated code might be bogus).
I don't understand how can CmmOpt overflow the heap to optimise a shift. Optimisation of a shift is simply doing the shift in compile time, no? How does CmmOpt allocate so much just to do a single shift? In other words, I don't understand how can optimisation of this single line:
_s3bl::I64 = _s3bl::I64 | (1 << _s3bp::I64);
require so much allocation that overflows the heap.
The constant folding is performed over Integer. Given GMP's representation of Integer, it's possible that asking to shift by some ludicrous amount would exhaust available memory.
I think the Note [Guarding against silly shifts] in compiler/PrelRules is relevant.
I guess we should bring back the substitution in Core that was removed by 5341edf3
and probably do the same thing at the Cmm level for the same reasons.
I think the Note [Guarding against silly shifts] in compiler/PrelRules is relevant. I guess we should bring back the substitution in Core that was removed by 5341edf3 and probably do the same thing at the Cmm level for the same reasons.
I don't agree with this; #16111 (closed) is a bug, not a feature. Unfortunately, I don't have much time today to argue about this one way or another. If there is a consensus on a way forward, I'll implement.
The constant folding is performed over Integer. Given GMP's representation of
Integer, it's possible that asking to shift by some ludicrous amount would
exhaust available memory.
Right, this makes sense, thanks.
Here's the story so far:
#16111 (closed) reported a bug where different optimisations and backend parameters
caused different results in a tiny program with undefined behavior (shift by
a negative amount).
Because the behavior is undefined, strictly speaking this is not a bug.
5341edf3 improved things by replacing the
undefined behavior with a runtime error. Undocumented in the commit message,
it also removed a rewrite rule that replaced negative shifts with errors, so
the simplifier no longer introduces errors when shifting negative amounts, all
errors are caught by the primops. (the rewrite rule part isn't relevant
because the rule does not apply to this program)
This program:
module Bug whereimport Data.Bits (setBit)f :: Intf = foldl setter 0 $ zip [0..] [()] where setter v (ix, _) = setBit v ix
As a result, with 5341edf3 we end up with this
expression: GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807# which
CmmOpt tries to evaluate, using Integer as the value, causing the
overflow.
It'd be useful to know why they're simplified differently. Either way the
modified rule in 5341edf3 is not used so I'd
expect them to get simplified the same way.
But regardless, I think the bug is CmmOpt computes huge numbers and we need to
avoid that. 5341edf3 does not introduce this
bug, it just reveals it. So
I guess we should bring back the substitution in Core
no need for this as the rule has nothing to do with this bug.
If anyone has the time, it'd be good to know why with
5341edf3 the >=# rule shown above does not
fire.
Ah, I see why with 5341edf3 this program is simplified differently: it makes shiftL etc. methods larger and introduces case expressions. That causes different simplifications.
Thinking about this more; I think it's a problem that CmmOpt doesn't do any
bounds checking, but there's also another bug. Normally we shouldn't introduce a
shift for larger than the "word size in bits" for the architecture. Here's the
relevant bits in primops.txt.pp:
primop ISllOp "uncheckedIShiftL#" GenPrimOp Int# -> Int# -> Int# {Shift left. Result undefined if shift amount is not in the range 0 to word size - 1 inclusive.}
and indeed the iShiftL# function makes sure the argument is in range:
-- | Shift the argument left by the specified number of bits-- (which must be non-negative).iShiftL# :: Int# -> Int# -> Int#a `iShiftL#` b | isTrue# (b >=# WORD_SIZE_IN_BITS#) = 0# | otherwise = a `uncheckedIShiftL#` b
In the reproducer, at some point we see this expression:
case GHC.Prim.>=# x_a3hX 64# of { __DEFAULT -> GHC.Types.I# (GHC.Prim.orI# x#_a3h9 (GHC.Prim.uncheckedIShiftL# 1# x_a3hX)); 1# -> GHC.Types.I# x#_a3h9}
Here the argument is smaller than 64 so this use is fine, but simplifier takes
more steps:
case GHC.Prim.>=# w_s3kc 64# of { __DEFAULT -> let { ww_s3ka :: GHC.Prim.Int# ww_s3ka = GHC.Prim.orI# ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# w_s3kc) } in jump $j_s3kr ww_s3ka; 1# -> jump $j_s3kr ww_s3kh}
then
case GHC.Prim.>=# w_s3kc 64# of { __DEFAULT -> case w_s3kc of wild_Xn { __DEFAULT -> $wgo_s3kj (GHC.Prim.+# wild_Xn 1#) ys_a3jc (GHC.Prim.orI# ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# w_s3kc)); 9223372036854775807# -> GHC.Types.I# (GHC.Prim.orI# ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#)) }; 1# -> jump $j_s3kr ww_s3kh}
and introduces an incorrect use of the primop.
At this point the branch with incorrect uncheckedIShiftL# can't be taken, but
apparently the simplifier is not smart enough to drop the branch. I guess this
is what @harpocrates meant in
#16449 (comment 191376):
Since this code should anyways be unreachable, perhaps we could just replace
invalid CMM shifts with a nop instruction?
I think the bug here is simplifier introducing incorrect use of the primop. But
perhaps we're OK with introducing this, as long as we later eliminate the code?
Not sure how to best approach this problem.