How untagged pointers sneak into banged fields
(N.B. I am writing this up from memory, and cannot verify it just now, maybe someone can lend a hand, otherwise I'll do it ASAP!)
Here is a way how untagged pointers to strict data can be created in banged (strict) constructor fields. This reproduction recipe **depends on the patch from #14677 (closed) applied**.
We have 3 modules A
, B
and C
(SG: Note that the newtype wrapper has become redundant, probably in the aftermath of #15696 (closed). I could reproduce the bug without it):
module A where
data A = X | Y | Z
a = Z
module B where
import A
newtype B = B A
b = B a
{-# language MagicHash #-}
module C where
import A
import B
import GHC.Exts
data C = C !B
c = C b
main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0, b is softlink
print (I# (dataToTag# c)) -- prints 0: not entered yet
print (case c of C b' -> I# (dataToTag# b')) -- used to print 0? After #15696 it prints 2
print (case c of C (B a') -> I# (dataToTag# a')) -- used to print 3. After #15696 it prints 2
== Why this happens
B.b
is a newtype to A.a
so one would expect that both alias the same memory location (a hardlink in filesystem parlance). But currently reexports are implemented with a special type of closure IND_STATIC
(a softlink) which needs to be entered to obtain the actual (tagged pointer). The IND_STATIC
closure's pointer is never tagged (otherwise it would never be entered, instead interpreted as a honest-to-goodness A.A
, which causes the symptoms seen in #14677 (closed)).
With #14677 (closed) applied to GHC, the unfolding of B.b
is correctly read when compiling C
(with -O1
and better) and thus the compiler knows that it should be a tagged pointer value. Thus the construction of C.c
shortcuts the entering of B.b
when filling the strict field, and (because B.b
being a softlink, thus untagged) the field ends up carrying a 0 tag.
== How can this be fixed?
I see two possibilities one conservative and one invasive.
Conservative
When seeing a coercion unfolding of a tagged value being used to initialise a strict field, do not skip the evaluatedness check, but cater for the possibility of an IND_STATIC
closure. Check the closure type, and if confirmed, steal the pointee and use that.
Invasive
Get rid of the IND_STATIC
closures altogether. For intra-module softlinks we can have proper hardlinks (assembler .equiv
directives, or LLVM alias
es). Inter-module softlinks can also be eliminated by linker scripts. This would however cause more build artifacts, so I don't know how hairy it would turn out.
OTOH, it would reduce binary size by eliminating indirection closures and potential dereferencing code.
Trac metadata
Trac field | Value |
---|---|
Version | 8.4.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | alexbiehl |
Operating system | |
Architecture |