Nail down semantics of unspecified results in Core

Exactly to what extent is the behavior of an expression like uncheckedShiftLWord32# x 99# undefined or unspecified?

Note that this expression is currently considered OK-for-speculation, meaning (among other things) that we are willing to evaluate it eagerly to avoid allocating a thunk, even if that thunk might itself never be evaluated. See also this discussion: !10097 (comment 492906)
- If we keep this OK-for-speculation, we must lower these to in the C and LLVM backends is really guaranteed to fulfill the NoEffect promise of "this primop will successfully return a valid (albeit potentially arbitrary) Word32#" in the event of over-large shift amounts.
This also has bearing on #23228: If uncheckedShiftRL# x#_auD 64# is not fully undefined behavior, it is nonsense for the corresponding Cmm to cause a lint error.
- Likewise we should lower these to something that is not undefined behavior in the C backend, and perhaps also freeze any poison it may generate in the LLVM backend. (Do we?)
For some prior art, LLVM IR handles this by making over-large shifts return what they call a poison value, which
- is not undefined behavior in itself (allowing speculative evaluation), but
- is propagated through most simple operations and
- it is undefined behavior to branch on a poison value.
- See also 1, 2
@sgraf812 suggests constant-folding such an expression to LitRubbish (See Note [Rubbish literals].). I think this is plausible, but the semantics here may also deserve to be nailed down a bit better.

Here's a list of several possible meanings that can be considered, roughly ordered by how many programs they allow to have observably non-deterministic behavior:

For y greater than 31, uncheckedShiftLWord32# x y produces some well-specified result.
- The most popular choice here is probably the result of left-shifting by the remainder of y mod 32; if memory serves, this is the choice made by both the ECMAScript specification and the x86 instruction set.
  - This is perfectly well-behaved, but it may require an extra bitwise-and instruction to implement on some hardware. (But that's much cheaper than a branch!)
For y greater than 31, uncheckedShiftLWord32# x y produces an arbitrary but essentially deterministic ordinary Word32# value, that may depend only on the particular values of x and y, the build compiler version, and the host platform.
- This is an extremely weak form of unspecified behavior. But it's very well-behaved, and doesn't cause nearly as much subtle trouble as the later options do!
- This behavior requires Core-to-Core to either not rewrite uncheckedShiftLWord32# x 99# or to explicitly model the behavior of the target platform. As such expressions are not common except in dead code, this is a small sacrifice.
- I think this behavior is efficiently implementable on most platforms. (TODO: Check this.) I don't think this behavior is efficiently implementable in LLVM IR or C without inline assembly.
For y greater than 31, uncheckedShiftLWord32# x y produces one arbitrary non-deterministic ordinary Word32# value. Different calls with the same values of x and y may produce different results even within the same program execution.
- In particular, case uncheckedShiftLWord32# x y of r -> addWord32# r r is guaranteed to produce an even number, while addWord32# (uncheckedShiftLWord32# x y) (uncheckedShiftLWord32# x y) can produce any single Word32#.
- With this semantics, we can speculate a call to uncheckedShiftLWord32#, but we cannot duplicate such a call unless we know its shift amount is in-bounds. So we would need to mark the primop as not cheap.
- This is essentially equivalent to the result of freeze on a poison or undef operand in LLVM IR.
As proposed by @sgraf812 in #23753 (comment 516690) :

A program which computes a "poison"/"unspecified" value at some point has defined behavior when substituting the poison value for an arbitrary inhabitant of its type (i.e., Word32#) leads to the same program behavior (bisimilarity of traces).
- This is almost the same as option 3 above, except that when-ever the nondeterminism introduced in option 3 would be externally observable, the result is undefined behavior in the sense of C or option 7 below. (@sgraf812 Feel free to edit this if I have misunderstood.)
- As with option 3, we cannot duplicate a call to uncheckedShiftLWord32# unless we know its shift amount is in-bounds. So we would need to mark the primop as not cheap. In particular, the expression case uncheckedShiftLWord32# x y of r -> even $ W32# (addWord32# r r) is equivalent to True while even $ W32# (addWord32# (uncheckedShiftLWord32# x y) (uncheckedShiftLWord32# x y)) is equivalent to a poison value.
Like undef in LLVM-IR:
- See also 1, 2
  - Roughly: Consider an abstract machine state in which each variable holds not just one normal value, but a nonempty set of normal values.
    - At runtime on real hardware the corresponding register or memory location will hold one arbitrary representative from this set.
  - Normal operations are lifted to operate on sets of normal value in the "obvious" way. For example, the result of adding x and y is just { vx + vy | vx <- x, vy <- y}, where + is the addition operation on single normal values.
    - In particular, the result of adding x with itself is { vx1 + vx2 | vx1 <- x, vx2 <- x} and not { vx + vx | vx <- x}.
    - This recovers referential transparency even in the presence of multiple occurrences of a variable. In particular, unlike options 3 and 4 above, using something like LLVM-IR's undef allows us to safely duplicate calls to uncheckedShiftLWord32#.
  - undef at a type is just the set of all normal values of that type.
- This has a lot of tricky effects on program transformations! See section 3 of the paper I linked for a sense of the matter.
- The question of whether branch-on-undef should be immediate undefined behavior or just a non-deterministic choice (and of what sort) seems especially worth pondering.
Like poison in LLVM-IR:
- See also 1, 2
  - In short, almost every operation in which one operand is poison will return a poison result. This includes things like 0## `leWord#` poison even though every normal Word# is at least zero so that 0## `leWord#` undef is equivalent to 1#.
- Consider the expression int16ToInt# (intToInt16_no_overflow# x). We might like to unconditionally rewrite this to x.
  - If intToInt16_no_overflow# returns undef on overflow, this transformation is incorrect, since int16ToInt# undef is the set of all normal Int# values that fit into an Int16#, and does not contain every normal Int# value. (In particular, the rewrite will change behavior when x is 50000#.)
  - But if intToInt16_no_overflow# returns poison on overflow, we can perform this rewrite, since int16ToInt# poison is also poison (albeit of a different type), which is the least-specified possible Int# value.
For y greater than 31, uncheckedShiftLWord32# x y results in immediate undefined behavior if executed, i.e. if a program evaluates such an expression, any program behavior is acceptable.
- This semantics would require marking uncheckedShiftLWord32# as CanFail, preventing speculative evaluation of the primop. Every other option considered will limit allowed program behaviors in some ways so that speculative evaluation is safe!

Edited Mar 02, 2024 by Matthew Craven

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information