Skip to content

How untagged pointers sneak into banged fields

(N.B. I am writing this up from memory, and cannot verify it just now, maybe someone can lend a hand, otherwise I'll do it ASAP!)

Here is a way how untagged pointers to strict data can be created in banged (strict) constructor fields. This reproduction recipe **depends on the patch from #14677 (closed) applied**.

We have 3 modules A, B and C (SG: Note that the newtype wrapper has become redundant, probably in the aftermath of #15696 (closed). I could reproduce the bug without it):

module A where
data A = X | Y | Z
a = Z
module B where
import A
newtype B = B A
b = B a
{-# language MagicHash #-}

module C where
import A
import B
import GHC.Exts

data C = C !B
c = C b

main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0, b is softlink
          print (I# (dataToTag# c)) -- prints 0: not entered yet
          print (case c of C b' -> I# (dataToTag# b')) -- used to print 0? After #15696 it prints 2
          print (case c of C (B a') -> I# (dataToTag# a')) -- used to print 3. After #15696 it prints 2

== Why this happens

B.b is a newtype to A.a so one would expect that both alias the same memory location (a hardlink in filesystem parlance). But currently reexports are implemented with a special type of closure IND_STATIC (a softlink) which needs to be entered to obtain the actual (tagged pointer). The IND_STATIC closure's pointer is never tagged (otherwise it would never be entered, instead interpreted as a honest-to-goodness A.A, which causes the symptoms seen in #14677 (closed)).

With #14677 (closed) applied to GHC, the unfolding of B.b is correctly read when compiling C (with -O1 and better) and thus the compiler knows that it should be a tagged pointer value. Thus the construction of C.c shortcuts the entering of B.b when filling the strict field, and (because B.b being a softlink, thus untagged) the field ends up carrying a 0 tag.


== How can this be fixed?

I see two possibilities one conservative and one invasive.

Conservative

When seeing a coercion unfolding of a tagged value being used to initialise a strict field, do not skip the evaluatedness check, but cater for the possibility of an IND_STATIC closure. Check the closure type, and if confirmed, steal the pointee and use that.

Invasive

Get rid of the IND_STATIC closures altogether. For intra-module softlinks we can have proper hardlinks (assembler .equiv directives, or LLVM aliases). Inter-module softlinks can also be eliminated by linker scripts. This would however cause more build artifacts, so I don't know how hairy it would turn out.

OTOH, it would reduce binary size by eliminating indirection closures and potential dereferencing code.

Trac metadata
Trac field Value
Version 8.4.2
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC alexbiehl
Operating system
Architecture
Edited by Sebastian Graf
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information