GHC 8.8 heap overflow regression

changed milestone to %8.10.1

changed weight to 10

This regression was introduced in commit 5341edf3 (Error out of invalid Int/Word bit shifts).

cc @harpocrates

changed title from GHC HEAD heap overflow regression to GHC 8.8 heap overflow regression

changed the description

changed milestone to %8.8.1

added Tbug label

assigned to @harpocrates

Comparing the -ddump-simpl, -ddump-stg and -ddump-cmm output suggests that the bug is somewhere in the code generator.

I apologize for not debugging this sooner. I finally have a hypothesis for the regression:

1. Prior to 5341edf3, shifts that were nonsensically large would be optimized into errors at the core level.
2. After 5341edf3, these absurd shifts can make it all the way down to MO_Shl, MO_U_Shr, and MO_S_Shr.
3. The CMM constant folding in CmmOpt.hs kicks in and GHC tries to perform the shift, leading to a heap overflow.

Guarding the constant folding in step 3 solves the heap overflow, but GHC ends up generating potentially invalid assembly: shlq $9223372036854775807,%rbx. Since this code should anyways be unreachable, perhaps we could just replace invalid CMM shifts with a nop instruction?

Thoughts?

@harpocrates, precisely which shift are we talking about?

If the code is truly unreachable then the C-- pipeline should realize this and drop the block. However, the constant folding logic shouldn't blow up on large values regardless. Perhaps the C-- constant folding logic should be guarded on the size of the shift?

@bgamari Here is a relevant snippet from _quick/stage1/bin/ghc -fforce-recomp -O Bug.hs -ddump-simpl. Note the (GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#).

Rec {
-- RHS size: {terms: 55, types: 16, coercions: 0, joins: 0/0}
Bug.$wgo [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
  :: GHC.Prim.Int# -> [()] -> GHC.Prim.Int# -> Int
[GblId, Arity=3, Str=<L,U><S,1*U><L,U>m, Unf=OtherCon []]
Bug.$wgo
  = \ (w_s3kc :: GHC.Prim.Int#)
      (w1_s3kd :: [()])
      (ww_s3kh :: GHC.Prim.Int#) ->
      case w1_s3kd of {
        [] -> GHC.Types.I# ww_s3kh;
        : y_a3jb ys_a3jc ->
          case GHC.Prim.>=# w_s3kc 0# of {
            __DEFAULT -> case GHC.Real.overflowError of wild1_00 { };
            1# ->
              case GHC.Prim.>=# w_s3kc 64# of {
                __DEFAULT ->
                  case w_s3kc of wild1_Xn {
                    __DEFAULT ->
                      Bug.$wgo
                        (GHC.Prim.+# wild1_Xn 1#)
                        ys_a3jc
                        (GHC.Prim.orI# ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# wild1_Xn));
                    9223372036854775807# ->
                      GHC.Types.I#
                        (GHC.Prim.orI#
                           ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#))
                  };
                1# ->
                  case w_s3kc of wild1_Xn {
                    __DEFAULT -> Bug.$wgo (GHC.Prim.+# wild1_Xn 1#) ys_a3jc ww_s3kh;
                    9223372036854775807# -> GHC.Types.I# ww_s3kh
                  }
              }
          }
      }
end Rec }

Perhaps the C-- constant folding logic should be guarded on the size of the shift?

That gets us past the compiler heap overflow, but it doesn't fix the invalid assembly that then gets generated.

/var/folders/n5/0p2l4ydj6b34mxcd3bfz7djc0000gp/T/ghc82774_0/ghc_2.s:133:7: error:
     error: invalid operand for instruction
            shlq $9223372036854775807,%rbx
                 ^~~~~~~~~~~~~~~~~~~~
    |
133 |         shlq $9223372036854775807,%rbx
    |       ^
`gcc' failed in phase `Assembler'. (Exit code: 1)

That gets us past the compiler heap overflow, but it doesn't fix the invalid assembly that then gets generated.

Yes, that is quite true. Strictly speaking, the behavior of uncheckedIShiftL# is undefined in the case of an invalid shift, so we are within our rights to lower it as a no-op. However, I think it would be better to lower it as a proper abort; otherwise an incorrect compiler optimisation may turn into quite a puzzle to solve.
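To make the two options concrete, here is a rough Haskell-level sketch of what each lowering would mean for a left shift whose amount falls outside the documented range of 0 to word size - 1. It is only an illustration: `shlNoOp`, `shlAbort` and `inRange` are hypothetical names, and the real choice would be made when lowering the Cmm shift MachOps, not in library code.

```haskell
import Data.Bits (finiteBitSize, unsafeShiftL)

-- Hypothetical illustration only: two ways a backend could treat
-- x `shl` n when n is outside the documented range 0 .. wordSize - 1.

wordBits :: Int
wordBits = finiteBitSize (0 :: Int)

inRange :: Int -> Bool
inRange n = n >= 0 && n < wordBits

-- "No-op" lowering: an invalid shift leaves its argument unchanged.
shlNoOp :: Int -> Int -> Int
shlNoOp x n
  | inRange n = x `unsafeShiftL` n
  | otherwise = x

-- "Abort" lowering: an invalid shift stops the program with a clear
-- message, so a miscompilation shows up loudly instead of as a wrong value.
shlAbort :: Int -> Int -> Int
shlAbort x n
  | inRange n = x `unsafeShiftL` n
  | otherwise = error ("invalid shift amount: " ++ show n)
```

Both variants agree on every valid shift amount; they differ only in how visibly an invalid one fails.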

I'm a bit confused about this. @harpocrates,

Prior to 5341edf3, shifts that were nonsensically large would be optimized into errors at the core level.

How do you check this? Looking at the Core generated by GHC 8.6.4:

Rec {
-- RHS size: {terms: 35, types: 11, coercions: 0, joins: 0/0}
Bug.$wgo [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
  :: GHC.Prim.Int# -> [()] -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId,
 Arity=3,
 Caf=NoCafRefs,
 Str=<L,1*U><S,1*U><S,U>,
 Unf=OtherCon []]
Bug.$wgo
  = \ (w_s3az :: GHC.Prim.Int#)
      (w1_s3aA :: [()])
      (ww_s3aE :: GHC.Prim.Int#) ->
      case w1_s3aA of {
        [] -> ww_s3aE;
        : y_a39v ys_a39w ->
          case w_s3az of wild1_Xn {
            __DEFAULT ->
              case GHC.Prim.>=# wild1_Xn 64# of {
                __DEFAULT ->
                  Bug.$wgo
                    (GHC.Prim.+# wild1_Xn 1#)
                    ys_a39w
                    (GHC.Prim.orI# ww_s3aE (GHC.Prim.uncheckedIShiftL# 1# wild1_Xn));
                1# -> Bug.$wgo (GHC.Prim.+# wild1_Xn 1#) ys_a39w ww_s3aE
              };
            9223372036854775807# -> ww_s3aE
          }
      }
end Rec }

I don't see any errors here.

Secondly,

The CMM constant folding in CmmOpt.hs kicks in and GHC tries to perform the shift, leading to a heap overflow.

I don't understand how CmmOpt can overflow the heap while optimising a shift. Optimising a shift simply means doing the shift at compile time, no? How does CmmOpt allocate so much just to do a single shift? In other words, I don't understand how optimising this single line:

_s3bl::I64 = _s3bl::I64 | (1 << _s3bp::I64);

can require so much allocation that it overflows the heap.

Can you show us which line in CmmOpt does this optimisation which overflows the heap?

@osa1

Prior to 5341edf3, shifts that were nonsensically large would be optimized into errors at the core level.

How do you check this? ...

I didn't; it is only a hypothesis.

Looking at the Core generated by GHC 8.6.4: ... I don't see any errors here.

I'm not sure why 8.8 is even producing GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#. That does seem suspicious too. Nonetheless, I shouldn't be able to make GHC crash by manually calling uncheckedIShiftL# with invalid arguments (although the generated code might be bogus).

I don't understand how CmmOpt can overflow the heap while optimising a shift. Optimising a shift simply means doing the shift at compile time, no? How does CmmOpt allocate so much just to do a single shift? In other words, I don't understand how optimising this single line:
_s3bl::I64 = _s3bl::I64 | (1 << _s3bp::I64);
can require so much allocation that it overflows the heap.

The offending lines are here: https://gitlab.haskell.org/ghc/ghc/blob/51fd357119b357c52e990ccce9059c423cc49406/compiler/cmm/CmmOpt.hs#L148-150. As mentioned before, guarding those cases with 0 < y, y < wordsizeInBits prevents the heap overflow.
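A minimal sketch of that guard, written as standalone Haskell over Integer rather than against CmmOpt's real types. Everything here is hypothetical (`ShiftOp`, `foldShift`, `wordWidthBits`, and the assumption of a 64-bit target). It folds a shift of two literals only when 0 < y < word size, as suggested above, and otherwise returns `Nothing`, so the shift is left for the code generator and no enormous Integer is ever built.

```haskell
import Data.Bits (bit, shiftL, shiftR, testBit, (.&.))

-- Hypothetical model of constant-folding a shift on two literal operands.
data ShiftOp = Shl | UShr | SShr deriving (Show, Eq)

wordWidthBits :: Int
wordWidthBits = 64 -- assumption: a 64-bit target

-- Keep only the low wordWidthBits bits (unsigned view of the result).
narrowU :: Integer -> Integer
narrowU v = v .&. (bit wordWidthBits - 1)

-- Reinterpret the low bits as a two's-complement signed value.
narrowS :: Integer -> Integer
narrowS v
  | testBit u (wordWidthBits - 1) = u - bit wordWidthBits
  | otherwise = u
  where
    u = narrowU v

-- Fold only when 0 < y < wordWidthBits; otherwise decline (Nothing)
-- instead of computing an Integer with up to 2^63 bits.
foldShift :: ShiftOp -> Integer -> Integer -> Maybe Integer
foldShift op x y
  | 0 < y, y < fromIntegral wordWidthBits = Just (apply op)
  | otherwise = Nothing
  where
    n = fromIntegral y :: Int
    apply Shl = narrowU (x `shiftL` n)
    apply UShr = narrowU x `shiftR` n
    apply SShr = narrowU (narrowS x `shiftR` n)
```

With the guard exactly as stated, a shift by 0 is also left unfolded, which is harmless.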

The constant folding is performed over Integer. Given GMP's representation of Integer, it's possible that asking to shift by some ludicrous amount would exhaust available memory.
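To put a number on "ludicrous": folding `1 << 9223372036854775807` over Integer would have to build a value with 2^63 bits. The back-of-the-envelope sketch below is plain arithmetic, not GHC code (`bytesForShiftedOne` is a hypothetical helper), and it ignores GMP's per-object overhead.

```haskell
-- Hypothetical helper: bytes needed to store (1 `shiftL` k) as a
-- non-negative Integer, i.e. k + 1 bits rounded up to whole bytes.
-- GMP stores whole limbs, so the real figure is slightly larger.
bytesForShiftedOne :: Integer -> Integer
bytesForShiftedOne k = (k + 1 + 7) `div` 8

-- The shift amount from the Core dump above (maxBound :: Int, 2^63 - 1).
sillyShift :: Integer
sillyShift = 9223372036854775807

-- bytesForShiftedOne sillyShift is 2^60 bytes, i.e. 1 EiB.
```

No machine can satisfy a 1 EiB allocation, so the compiler runs out of heap long before the fold finishes.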

I think the Note [Guarding against silly shifts] in compiler/PrelRules is relevant. I guess we should bring back the substitution in Core that was removed by 5341edf3 and probably do the same thing at the Cmm level for the same reasons.

I think the Note [Guarding against silly shifts] in compiler/PrelRules is relevant. I guess we should bring back the substitution in Core that was removed by 5341edf3 and probably do the same thing at the Cmm level for the same reasons.

I don't agree with this; #16111 is a bug, not a feature. Unfortunately, I don't have much time today to argue about this one way or another. If there is a consensus on a way forward, I'll implement.

The offending lines are here: https://gitlab.haskell.org/ghc/ghc/blob/51fd357119b357c52e990ccce9059c423cc49406/compiler/cmm/CmmOpt.hs#L148-150. As mentioned before, guarding those cases with 0 < y, y < wordsizeInBits prevents the heap overflow.

The constant folding is performed over Integer. Given GMP's representation of Integer, it's possible that asking to shift by some ludicrous amount would exhaust available memory.

Right, this makes sense, thanks.

Here's the story so far:

- #16111 reported a bug where different optimisations and backend parameters caused different results in a tiny program with undefined behavior (a shift by a negative amount). Because the behavior is undefined, strictly speaking this is not a bug.
- 5341edf3 improved things by replacing the undefined behavior with a runtime error. Although the commit message doesn't mention it, it also removed a rewrite rule that replaced negative shifts with errors, so the simplifier no longer introduces errors for negative shift amounts; all such errors are now raised at runtime by the shift operations themselves. (The rewrite-rule change isn't relevant here, because the rule does not apply to this program.)
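The shape of the runtime check is visible in the second Core dump below: a negative shift amount goes to GHC.Real.overflowError, an amount of 64 or more contributes no bit, and only the remaining case reaches uncheckedIShiftL#. A hedged, boxed-Int sketch of that shape follows. It is modelled on the Core, not copied from the commit; `checkedShiftL` is a hypothetical name, and it throws `Overflow` from Control.Exception directly where the Core calls GHC.Real.overflowError.

```haskell
import Control.Exception (ArithException (Overflow), throw)
import Data.Bits (finiteBitSize, unsafeShiftL)

-- Hypothetical checked left shift on Int, modelled on the Core output:
--   * a negative shift amount is a runtime error,
--   * an amount >= the word size yields 0,
--   * so the unchecked shift only ever sees 0 <= i < word size.
checkedShiftL :: Int -> Int -> Int
checkedShiftL x i
  | i < 0 = throw Overflow
  | i >= finiteBitSize x = 0
  | otherwise = x `unsafeShiftL` i
```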

This program:

module Bug where

import Data.Bits (setBit)

f :: Int
f = foldl setter 0 $ zip [0..] [()]
  where
    setter v (ix, _) = setBit v ix

is simplified to this without 5341edf3:

Rec {
-- RHS size: {terms: 41, types: 19, coercions: 0, joins: 0/2}
go_a3cF [Occ=LoopBreaker] :: GHC.Prim.Int# -> [()] -> Int -> Int
[LclId,
 Arity=3,
 Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 50 20] 194 0}]
go_a3cF
  = \ (x_a3cG :: GHC.Prim.Int#) (eta_B2 :: [()]) (eta_B1 :: Int) ->
      let {
        _x_a3dN :: Int
        [LclId]
        _x_a3dN = GHC.Types.I# x_a3cG } in
      let {
        _r_a3dO [OS=OneShot] :: [()] -> Int -> Int
        [LclId]
        _r_a3dO
          = case x_a3cG of {
              __DEFAULT -> go_a3cF (GHC.Prim.+# x_a3cG 1#);
              9223372036854775807# -> n_a3cB
            } } in
      case eta_B2 of {
        [] -> id @ Int eta_B1;
        : y_a3dU ys_a3dV ->
          _r_a3dO
            ys_a3dV
            (case eta_B1 of { GHC.Types.I# x#_a3bS ->
             case GHC.Prim.>=# x_a3cG 64# of {
               __DEFAULT ->
                 GHC.Types.I#
                   (GHC.Prim.orI# x#_a3bS (GHC.Prim.uncheckedIShiftL# 1# x_a3cG));
               1# -> GHC.Types.I# x#_a3bS
             }
             })
      }
end Rec }

and to this with 5341edf3:

Rec {
-- RHS size: {terms: 49, types: 23, coercions: 0, joins: 0/2}
go_a3np [Occ=LoopBreaker] :: GHC.Prim.Int# -> [()] -> Int -> Int
[LclId,
 Arity=3,
 Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 50 20] 215 0}]
go_a3np
  = \ (x_a3nq :: GHC.Prim.Int#) (eta_B2 :: [()]) (eta_B1 :: Int) ->
      let {
        _x_a3ox :: Int
        [LclId]
        _x_a3ox = GHC.Types.I# x_a3nq } in
      let {
        _r_a3oy [OS=OneShot] :: [()] -> Int -> Int
        [LclId]
        _r_a3oy
          = case x_a3nq of {
              __DEFAULT -> go_a3np (GHC.Prim.+# x_a3nq 1#);
              9223372036854775807# -> n_a3nl
            } } in
      case eta_B2 of {
        [] -> id @ Int eta_B1;
        : y_a3oE ys_a3oF ->
          _r_a3oy
            ys_a3oF
            (case eta_B1 of { GHC.Types.I# x#_a3mB ->
             case GHC.Prim.>=# x_a3nq 0# of {
               __DEFAULT -> case GHC.Real.overflowError of wild_00 { };
               1# ->
                 case GHC.Prim.>=# x_a3nq 64# of {
                   __DEFAULT ->
                     GHC.Types.I#
                       (GHC.Prim.orI# x#_a3mB (GHC.Prim.uncheckedIShiftL# 1# x_a3nq));
                   1# -> GHC.Types.I# x#_a3mB
                 }
             }
             })
      }
end Rec }
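As a hedged aid to reading the two dumps, here is a hand-written approximation of the loop GHC derives from f. The name `loop` is hypothetical and this is not GHC output. The `maxBound` test models the way `[0..] :: [Int]` stops at `maxBound`, which appears to be where the literal 9223372036854775807# (maxBound on a 64-bit target) comes from.

```haskell
import Data.Bits (setBit)

-- Hypothetical hand-written version of the worker loop behind `f`.
-- ix walks through [0..]; acc accumulates the bits set so far.
loop :: Int -> [()] -> Int -> Int
loop _ [] acc = acc
loop ix (_ : ys) acc
  -- [0..] :: [Int] ends at maxBound, so this is its final element.
  | ix == maxBound = setBit acc ix
  | otherwise = loop (ix + 1) ys (setBit acc ix)

-- Equivalent to `f` from the report: one element at index 0, so bit 0 is set.
f' :: Int
f' = loop 0 [()] 0
```

Note that the `maxBound` alternative can only be reached with an index far above 63, which is exactly the case the `>=# 64#` test already rules out for the shift.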

I don't understand why they're simplified differently. Looking at the rule rewrites, the only difference is that without 5341edf3 one more rule fires:

Rule fired
    Rule: >=#
    Module: (BUILTIN)
    Before: GHC.Prim.>=# ValArg 9223372036854775807# ValArg 64#
    After:  1#
    Cont:   Select nodup lwild_a3bY
            Stop[BoringCtxt] GHC.Types.Int

As a result, with 5341edf3 we end up with the expression GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#, which CmmOpt tries to evaluate using Integer arithmetic, causing the heap overflow.

It'd be useful to know why they're simplified differently. Either way, the rule modified in 5341edf3 is not used, so I'd expect them to be simplified the same way.

But regardless, I think the bug is that CmmOpt computes huge numbers, and we need to avoid that. 5341edf3 does not introduce this bug; it just reveals it. So

I guess we should bring back the substitution in Core

there's no need for this, as the rule has nothing to do with this bug.

If anyone has the time, it'd be good to know why with 5341edf3 the >=# rule shown above does not fire.

Ah, I see why this program is simplified differently with 5341edf3: it makes the shiftL (and related) methods larger and introduces extra case expressions, which leads to different simplifications.

Thinking about this more, I think it's a problem that CmmOpt doesn't do any bounds checking, but there's also another bug: normally we should never introduce a shift by the "word size in bits" of the architecture or more. Here are the relevant bits of primops.txt.pp:

primop   ISllOp   "uncheckedIShiftL#" GenPrimOp  Int# -> Int# -> Int#
         {Shift left.  Result undefined if shift amount is not
          in the range 0 to word size - 1 inclusive.}

And indeed, the iShiftL# function makes sure the shift amount is not too large (keeping it non-negative is left to the caller):

-- | Shift the argument left by the specified number of bits
-- (which must be non-negative).
iShiftL# :: Int# -> Int# -> Int#
a `iShiftL#` b  | isTrue# (b >=# WORD_SIZE_IN_BITS#) = 0#
                | otherwise                          = a `uncheckedIShiftL#` b
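For experimenting outside GHC.Base, the same guard can be written in a standalone module. This is a sketch, not GHC's code: `myIShiftL#` and `myIShiftL` are hypothetical names, and `finiteBitSize` stands in for the WORD_SIZE_IN_BITS# CPP macro.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Bits (finiteBitSize)
import GHC.Exts (Int (I#), Int#, isTrue#, uncheckedIShiftL#, (>=#))

-- Hypothetical copy of iShiftL#: amounts of word size or more give 0, so
-- uncheckedIShiftL# is only reached with an amount below the word size.
-- As with iShiftL#, the caller must still keep the amount non-negative.
myIShiftL# :: Int# -> Int# -> Int#
myIShiftL# a b = case finiteBitSize (0 :: Int) of
  I# ws
    | isTrue# (b >=# ws) -> 0#
    | otherwise -> a `uncheckedIShiftL#` b

-- Boxed wrapper, convenient for testing from ordinary code.
myIShiftL :: Int -> Int -> Int
myIShiftL (I# a) (I# b) = I# (myIShiftL# a b)
```

Used this way, the primop's precondition holds by construction; the problem in this issue is that the simplifier later duplicates the call into a branch where the guard no longer protects it.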

In the reproducer, at some point we see this expression:

case GHC.Prim.>=# x_a3hX 64# of {
  __DEFAULT -> GHC.Types.I# (GHC.Prim.orI# x#_a3h9 (GHC.Prim.uncheckedIShiftL# 1# x_a3hX));
  1# -> GHC.Types.I# x#_a3h9
}

Here the shift amount is known to be less than 64 in the __DEFAULT branch, so this use is fine, but the simplifier takes more steps:

case GHC.Prim.>=# w_s3kc 64# of {
  __DEFAULT ->
    let {
      ww_s3ka :: GHC.Prim.Int#
      ww_s3ka = GHC.Prim.orI# ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# w_s3kc) } in
    jump $j_s3kr ww_s3ka;
  1# -> jump $j_s3kr ww_s3kh
}

then

case GHC.Prim.>=# w_s3kc 64# of {
  __DEFAULT ->
    case w_s3kc of wild_Xn {
      __DEFAULT ->
        $wgo_s3kj
          (GHC.Prim.+# wild_Xn 1#)
          ys_a3jc
          (GHC.Prim.orI# ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# w_s3kc));
      9223372036854775807# ->
        GHC.Types.I#
          (GHC.Prim.orI#
             ww_s3kh (GHC.Prim.uncheckedIShiftL# 1# 9223372036854775807#))
    };
  1# -> jump $j_s3kr ww_s3kh
}

and introduces an incorrect use of the primop.

At this point the branch with the incorrect uncheckedIShiftL# can't be taken, since it requires w_s3kc to be both below 64 and equal to 9223372036854775807, but apparently the simplifier is not smart enough to drop it. I guess this is what @harpocrates meant in #16449 (comment 191376):

Since this code should anyways be unreachable, perhaps we could just replace invalid CMM shifts with a nop instruction?

I think the bug here is the simplifier introducing an incorrect use of the primop. But perhaps we're OK with introducing this, as long as we later eliminate the code? I'm not sure how best to approach this problem.
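To illustrate the kind of reasoning that would let an optimiser drop such a branch: once the __DEFAULT alternative of `case w_s3kc >=# 64#` is taken, we know w_s3kc < 64, so a literal alternative for 9223372036854775807# can never match. The toy check below is purely hypothetical (`Range`, `refineBelow` and `altReachable` are made-up names) and does not reflect how GHC's simplifier is actually structured.

```haskell
-- Hypothetical toy model of "what do we know about this Int# here?":
-- a closed interval [lo, hi] of possible values.
data Range = Range {lo :: Integer, hi :: Integer} deriving (Show, Eq)

-- Range of an unconstrained 64-bit Int#.
fullRange :: Range
fullRange = Range (-(2 ^ (63 :: Int))) (2 ^ (63 :: Int) - 1)

-- Refine after taking the __DEFAULT branch of `case x >=# n of`,
-- i.e. we now know x < n.
refineBelow :: Integer -> Range -> Range
refineBelow n (Range l h) = Range l (min h (n - 1))

-- A literal alternative `lit -> ...` can only be taken if lit is in range.
altReachable :: Range -> Integer -> Bool
altReachable (Range l h) lit = l <= lit && lit <= h
```

With this information, `altReachable (refineBelow 64 fullRange) 9223372036854775807` is False, so the alternative containing the bogus shift could be discarded before it ever reaches Cmm.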
