# GHC issues

## [#2731](https://gitlab.haskell.org/ghc/ghc/-/issues/2731): Avoid unnecessary evaluation when unpacking constructors

*Reported by Simon Peyton Jones; last updated 2022-03-07.*

Consider
```
data T a = MkT !a

foo :: T (a,b) -> a
foo (MkT (x,y)) = x
```
GHC will extract the first component of the `MkT`, *evaluate it*, and then extract the first component of the pair. The evaluation step isn't needed, since the component is known to be already-evaluated. `UNPACK` directives won't work here, because the component is polymorphic.
In the email thread, Tyson posted an example where this extra eval made a significant difference to his inner loop:
[http://www.haskell.org/pipermail/glasgow-haskell-users/2008-October/015796.html](http://www.haskell.org/pipermail/glasgow-haskell-users/2008-October/015796.html)
Simon
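For reference, here is the ticket's example as a self-contained module; the redundant evaluation is only visible in the generated Core/STG/Cmm (e.g. via `-ddump-simpl` or `-ddump-cmm`), not in source-level behaviour:

```haskell
-- The example from the ticket, made standalone. Semantically `foo` is a
-- plain projection; the issue is that GHC *also* emits an eval of the
-- strict field, even though a value in a banged field is already in WHNF.
data T a = MkT !a

foo :: T (a, b) -> a
foo (MkT (x, _)) = x
```

Compiling this with `-O2 -ddump-cmm` shows the extra enter of the field before the pair is taken apart.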
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 6.8.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1

---

## [#9660](https://gitlab.haskell.org/ghc/ghc/-/issues/9660): unnecessary indirect jump when returning a case scrutinee

*Reported by rwbarton; last updated 2021-03-16.*

I happened to be looking at the Cmm for this code (ghc 7.8.3, -O2)
```hs
f :: Int -> Int
f x = if x < 0 then x else x+1
```
and I noticed something a bit funny about it:
```
c12e:
if ((Sp + -8) < SpLim) goto c12z; else goto c12A;
c12z:
R2 = R2;
R1 = Test.f_closure;
call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
c12A:
I64[Sp - 8] = c12b;
R1 = R2;
Sp = Sp - 8;
if (R1 & 7 != 0) goto c12b; else goto c12c;
c12c:
call (I64[R1])(R1) returns to c12b, args: 8, res: 8, upd: 8;
c12b:
Hp = Hp + 16;
if (Hp > HpLim) goto c12y; else goto c12x;
c12y:
HpAlloc = 16;
R1 = R1;
call stg_gc_unpt_r1(R1) returns to c12b, args: 8, res: 8, upd: 8;
c12x:
_s11Q::I64 = I64[R1 + 7];
if (%MO_S_Lt_W64(_s11Q::I64, 0)) goto c12u; else goto c12v;
c12u:
Hp = Hp - 16;
R1 = R1 & (-8); /* <--- */
Sp = Sp + 8;
call (I64[R1])(R1) args: 8, res: 0, upd: 8; /* <--- */
c12v:
I64[Hp - 8] = GHC.Types.I#_con_info;
I64[Hp] = _s11Q::I64 + 1;
R1 = Hp - 7;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
```
On the two marked lines, we untag R1 (which is `x`) and enter it. However, we know at this point that `x` is already in WHNF, so we could simply return it by replacing the two lines with `call (P64[Sp])(R1)`, if I'm not mistaken. That will save a load and an indirect jump (which we actually know is to `I#_con_info`, which would just retag R1 and return to the address on the stack anyway).
I think the same optimization should be available any time we do an algebraic `case` and in a branch simply return the scrutinee.
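A minimal sketch of that general shape (illustrative names, not from the ticket): in the fall-through branch the scrutinee is known to be in WHNF with a correctly tagged pointer, so it could be returned directly rather than untagged and entered:

```haskell
data Colour = Red | Green | Blue
  deriving (Eq, Show)

-- In the wildcard branch we simply return the scrutinee `c`; after the
-- `case`, `c` is evaluated and its pointer is tagged, so entering it
-- again before returning is redundant work.
normalise :: Colour -> Colour
normalise c = case c of
  Red -> Green
  _   -> c
```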
I looked at what it would take to fix this. It looks almost easy: if we add a new `LambdaFormInfo` constructor `LFUnknownCon` meaning that we know the identifier is bound to a saturated application of an unknown constructor, then we could set the `cg_lf` of the case binder variable of an algebraic case statement to `LFUnknownCon`, and return `ReturnIt` for `LFUnknownCon` variables in `getCallMethod`. I think that would do it. Does that sound right? Is there a better way?
(In my original example we actually know the constructor has to be `I#`. But if the case was on a type with more than one constructor we wouldn't know statically which one we got, just that it has to be one of them.)
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------------ |
| Version | 7.8.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (CodeGen) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | simonmar |
| Operating system | |
| Architecture | |
</details>
---

## [#10606](https://gitlab.haskell.org/ghc/ghc/-/issues/10606): avoid redundant stores to the stack when examining already-tagged data

*Reported by rwbarton; last updated 2021-11-07.*

GHC compiles a function that performs case analysis on a value of an ADT like
```
bool :: a -> a -> Bool -> a
bool f t b = case b of
  False -> f
  True  -> t
```
to Cmm of the form
```
{offset
cwV:
if ((Sp + -24) < SpLim) goto cwW; else goto cwX;
cwW:
R4 = R4;
R3 = R3;
R2 = R2;
R1 = Bool.bool_closure;
call (stg_gc_fun)(R4, R3, R2, R1) args: 8, res: 0, upd: 8;
cwX:
I64[Sp - 24] = cwL; -- (*)
R1 = R4;
P64[Sp - 16] = R2; -- (†1)
P64[Sp - 8] = R3; -- (†2)
Sp = Sp - 24; -- (‡)
if (R1 & 7 != 0) goto cwL; else goto cwM;
cwM:
call (I64[R1])(R1) returns to cwL, args: 8, res: 8, upd: 8;
cwL:
if (R1 & 7 >= 2) goto cwT; else goto cwU;
cwT:
R1 = P64[Sp + 16];
Sp = Sp + 24;
call stg_ap_0_fast(R1) args: 8, res: 0, upd: 8;
cwU:
R1 = P64[Sp + 8];
Sp = Sp + 24;
call stg_ap_0_fast(R1) args: 8, res: 0, upd: 8;
}
```
Statement (\*) stores a return address for the evaluation of `b` to return to, and statements (†1), (†2) save local variables that are live in case alternatives, since they cannot be held in registers across the evaluation of `b`. But in the event that `b` is already evaluated and represented by a tagged pointer, all these stores are unnecessary: the return address written by (\*) is simply dead, and the values saved in (†1), (†2) are still available in whatever locations they were copied to the stack from.
In many cases the data we examine is mostly tagged, and while the active part of the stack is likely to be in L1 cache, the cost of these stores and reads is probably still positive (barring secondary effects from changes to pipelining, branch prediction, and so on).
In this case we could certainly move the return-address store (\*) into block `cwM`, and possibly move the local-variable stores (†1), (†2) into `cwM` as well, though it's then not clear to me how to recover the values in the alternatives (does Cmm have something like phi nodes?). I don't propose to move statement (‡), as arithmetic on registers is essentially free anyway.
I tried implementing the part of this pertaining to the return address (\*) and ran into two complications.
- For some reason, when I moved the return address store (\*) into the "data is not tagged" branch in the Stg-\>Cmm translation, this also resulted in both the local variable stores (†1), (†2) and the update to Sp (‡) being sunk into both branches of the "is the data tagged" conditional at some point in the Cmm optimization pipeline. This was useless since they couldn't be pushed further past the branch on the returned tag value, so the result was enlarged code size that outweighed the savings of avoiding a single store. I didn't investigate exactly why this sinking was dependent on the location of the store (\*), but this should be fixable.
- There may be heap checks in the alternatives. In that case, the code generator currently cleverly reuses the stack frame and info table set up for the evaluation of `b` in the heap failure branches. If we move some of the stores (\*), (†1), (†2) into the evaluation branch `cwM`, then we either have to duplicate them in heap failure branches, or set up a new stack frame and info table, or do some other clever thing. Or in the worst case, only do this optimization when performing the heap check before the case (which may then become slightly more attractive).
I'm attaching the current version of my patch mainly for my own future reference; it seems to produce correct, but larger and marginally slower code, I believe for the reasons described above.
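As a hypothetical illustration (not from the ticket) of a hot loop where this matters: every `Bool` scrutinised below is already an evaluated, tagged constructor, so the return-address and local-variable stores set up for each `case` are pure overhead on the fast path:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Count the True values in a list. Each call to `bool` performs a `case`
-- on an already-evaluated, tagged `Bool`, so the stack stores prepared
-- for a possible evaluation of the scrutinee are never needed.
bool :: a -> a -> Bool -> a
bool f t b = case b of
  False -> f
  True  -> t

countTrue :: [Bool] -> Int
countTrue = go 0
  where
    go !acc []     = acc
    go !acc (b:bs) = go (acc + bool 0 1 b) bs
```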
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------------ |
| Version | 7.11 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (CodeGen) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | simonmar |
| Operating system | |
| Architecture | |
</details>
---

## [#14373](https://gitlab.haskell.org/ghc/ghc/-/issues/14373): Introduce PTR-tagging for big constructor families

*Reported by Gabor Greif (self-assigned); last updated 2021-01-22.*

Currently only *small* constructor families get the benefit of pointer tagging.
In big families, 1 is the tag that says "I am evaluated". I suggest doing best-effort pointer tagging on big families too, by this scheme:

Ptr-tags 1..6 go to the first 6 constructors; 7 would signify "look into the info table and branch on its tag". In the info table the tags will then be 6..(familySize - 1).

I have an implementation which I'll submit to Phabricator soon.
TODOs: update wiki pages.
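The proposed tag assignment can be sketched as a small helper (hypothetical `ptrTag`, not GHC code; 1-based constructor indices, assuming 3 pointer-tag bits on a 64-bit platform, so tags range over 1..7):

```haskell
-- Pointer tag for constructor `conIndex` (1-based) in a family of
-- `familySize` constructors, under the proposed scheme.
ptrTag :: Int -> Int -> Int
ptrTag familySize conIndex
  | familySize <= 7 = conIndex  -- small family: exact tag, as today
  | conIndex <= 6   = conIndex  -- big family: first six get exact tags
  | otherwise       = 7         -- 7 = "look into the info table and branch on its tag"
```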
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
---

## [#14626](https://gitlab.haskell.org/ghc/ghc/-/issues/14626): No need to enter a scrutinised value

*Reported by Gabor Greif (self-assigned); last updated 2022-05-04.*

While analysing the output of #13861 I stumbled over an unnecessary pessimisation in the handling of scrutinised values. In Simon's words (from https://phabricator.haskell.org/D4267, with minor edits):
Interesting. Yes, please make a ticket! (And transfer the info below into it.)
I think the issue is this. Given (the STG-ish code)
```hs
data Colour = Red | Green | Blue
f x = case x of y
        Red     -> Green
        DEFAULT -> y
```
(here `y` is the case binder) we can just return `x` rather than entering it in DEFAULT branch, because `y` will be fully evaluated and its pointer will be correctly tagged.
You absolutely can't check for an `OccName` of `"wild"`!! That is neither necessary nor sufficient :-).
Instead, check `isEvaldUnfolding (idUnfolding y)`. See `Note [Preserve evaluatedness]` in `CoreTidy.hs`. And be sure to augment that note if you make the change.
I would expect perf benefits to be small on average, but it's simple to implement.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | #13861 |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
---

## [#14677](https://gitlab.haskell.org/ghc/ghc/-/issues/14677): Code generator does not correctly tag a pointer

*Reported by Simon Peyton Jones; last updated 2022-05-04.*

See also #15155, #16559.
Consider
```
data T a = MkT ![a]
```
The pointer stored in a `MkT` constructor should always be correctly tagged, never tagged with un-evaluated 00. C.f. [Commentary/Rts/HaskellExecution/PointerTagging](https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/haskell-execution/pointer-tagging)
But this invariant is broken. Example taken from #14626, comments 37-39.
Trac14626_1.hs
```
module Trac14626_1 where
data Style = UserStyle Int
           | PprDebug
data SDC = SDC !Style !Int
defaultUserStyle :: Bool -> Style
defaultUserStyle True = UserStyle 123
defaultUserStyle False = PprDebug
```
Trac14626_2.hs
```
module Trac14626_2 where
import Trac14626_1
f :: Int -> SDC
f x = SDC (defaultUserStyle (x > 1)) x
```
Compiling with `ghc Trac14626_1 Trac14626_2 -ddump-simpl -O` results in a scenario similar to the one described by Heisenbug:
```
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
defaultUserStyle2
defaultUserStyle2 = I# 123#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
defaultUserStyle1
defaultUserStyle1 = UserStyle defaultUserStyle2
-- RHS size: {terms: 7, types: 2, coercions: 0, joins: 0/0}
defaultUserStyle
defaultUserStyle
  = \ ds_dZ7 ->
      case ds_dZ7 of {
        False -> PprDebug;
        True -> defaultUserStyle1
      }
```
Our `UserStyle 123` constant has been lifted to top-level, just like in Heisenbugs example.
Now looking at the Core of `f`
```
f
f = \ x_a1dk ->
      case x_a1dk of { I# x1_a2gV ->
      case ># x1_a2gV 1# of {
        __DEFAULT -> SDC PprDebug x1_a2gV;
        1# -> SDC defaultUserStyle1 x1_a2gV
      }
      }
```
(Note how `f` doesn't scrutinise defaultUserStyle1)
Looking at the CMM for `f` we can see
```
...
if (%MO_S_Le_W64(_s2hT::I64, 1)) goto c2ip; else goto c2is;
c2ip:
I64[Hp - 16] = SDC_con_info;
P64[Hp - 8] = PprDebug_closure+2;
I64[Hp] = _s2hT::I64;
R1 = Hp - 15;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c2is:
I64[Hp - 16] = SDC_con_info;
P64[Hp - 8] = defaultUserStyle1_closure; -- defaultUserStyle1 isn't tagged!
I64[Hp] = _s2hT::I64;
R1 = Hp - 15;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
```
When generating code for `f`, the code generator wants to know the `LambdaFormInfo` (the closure type) of `defaultUserStyle1`.
Since `defaultUserStyle1` is defined in another module, we end up calling `mkLFImported` in `StgCmmClosure`, which ultimately gives an `LFUnknown`, which always gets a `DynTag` of 0 from `lfDynTag`.
I think we lack a bit of information here to give `defaultUserStyle1` the correct `LFCon` lambda form. Maybe top-level binders should know their `LambdaForm` and include it in their interfaces.
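A hypothetical sketch of that last idea (this is not GHC's actual interface-file format; the names are made up): interface entries for top-level binders could record enough of the lambda form for importing modules to tag references correctly:

```haskell
-- Made-up types illustrating what an interface file could record about a
-- top-level binder, and how a code generator could derive a pointer tag.
data IfaceLF
  = IfLFCon Int   -- saturated constructor application; field is the con's tag
  | IfLFThunk     -- possibly unevaluated: pointer must stay untagged
  | IfLFUnknown   -- no information (today's `mkLFImported` behaviour)
  deriving (Eq, Show)

-- Tag that may safely be attached to a pointer to the binder's closure;
-- 0 means "not known to be evaluated", matching `lfDynTag` on `LFUnknown`.
dynTag :: IfaceLF -> Int
dynTag (IfLFCon t) = t
dynTag _           = 0
```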
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
---

## [#15155](https://gitlab.haskell.org/ghc/ghc/-/issues/15155): How untagged pointers sneak into banged fields

*Reported by Gabor Greif; last updated 2023-07-14.*

(*N.B.* I am writing this up from memory and cannot verify it just now; maybe someone can lend a hand, otherwise I'll do it ASAP!)
Here is how untagged pointers to strict data can end up in banged (strict) constructor fields.
This reproduction recipe **depends on the patch from #14677 being applied**.
We have 3 modules, `A`, `B` and `C` (SG: note that the newtype wrapper has become redundant, probably in the aftermath of #15696; I could reproduce the bug without it):
```hs
module A where
data A = X | Y | Z
a = Z
```
```hs
module B where
import A
newtype B = B A
b = B a
```
```hs
{-# language MagicHash #-}
module C where
import A
import B
import GHC.Exts
data C = C !B
c = C b
main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0, b is softlink
          print (I# (dataToTag# c))                          -- prints 0: not entered yet
          print (case c of C b' -> I# (dataToTag# b'))       -- used to print 0? After #15696 it prints 2
          print (case c of C (B a') -> I# (dataToTag# a'))   -- used to print 3. After #15696 it prints 2
```
-------------------
== Why this happens
`B.b` is a newtype around `A.a`, so one would expect that both alias the same memory location (a *hardlink* in filesystem parlance). But currently re-exports are implemented with a special type of closure, `IND_STATIC` (a *softlink*), which needs to be entered to obtain the actual (tagged) pointer. The `IND_STATIC` closure's pointer is never tagged (otherwise it would never be entered but instead be interpreted as an honest-to-goodness `A.A`, which causes the symptoms seen in #14677).
With #14677 applied to GHC, the unfolding of `B.b` is correctly read when compiling `C` (with `-O1` and better), and thus the compiler knows that it should be a tagged pointer value. The construction of `C.c` therefore skips entering `B.b` when filling the strict field, and (`B.b` being a softlink, and thus untagged) the field ends up carrying a 0 tag.
-------------------------
== How can this be fixed?
I see two possibilities one conservative and one invasive.
### Conservative
When seeing a coercion unfolding of a tagged value being used to initialise a strict field, do not skip the evaluatedness check, but cater for the possibility of an `IND_STATIC` closure. Check the closure type, and if confirmed, steal the pointee and use that.
### Invasive
Get rid of the `IND_STATIC` closures altogether. For *intra-module* softlinks we can have proper hardlinks (assembler `.equiv` directives, or LLVM `alias`es). *Inter-module* softlinks can also be eliminated by linker scripts. This would however cause more build artifacts, so I don't know how hairy it would turn out.
OTOH, it would reduce binary size by eliminating indirection closures and potential dereferencing code.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.4.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | alexbiehl |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"How untagged pointers sneak into banged fields","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"8.6.1","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"8.4.2","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":["alexbiehl"],"type":"Bug","description":"(''N.B.'' I am writing this up from memory, and cannot verify it just now, maybe someone can lend a hand, otherwise I'll do it ASAP!)\r\n\r\nHere is a way how untagged pointers to strict data can be created in banged (strict) constructor fields.\r\nThis reproduction recipe **depends on the patch from #14677 applied**.\r\n\r\nWe have 3 modules `A`, `B` and `C`:\r\n\r\n{{{#!hs\r\nmodule A where\r\ndata A = X | Y | Z\r\na = Z\r\n}}}\r\n\r\n{{{#!hs\r\nmodule B where\r\nimport A\r\nnewtype B = B A\r\nb = B a\r\n}}}\r\n\r\n{{{#!hs\r\n{-# language MagicHash #-}\r\n\r\nmodule C where\r\nimport A\r\nimport B\r\nimport GHC.Exts\r\n\r\ndata C = C !B\r\nc = C b\r\n\r\nmain = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0, b is softlink\r\n print (I# (dataToTag# c)) -- prints 0: not entered yet\r\n print (case c of C b' -> I# (dataToTag# b')) -- prints 0?\r\n print (case c of C (B a') -> I# (dataToTag# a')) -- prints 3\r\n}}}\r\n\r\n-------------------\r\n== Why this happens\r\n\r\n`B.b` is a newtype to `A.a` so one would expect that both alias the same memory location (a ''hardlink'' in filesystem parlance). But currently reexports are implemented with a special type of closure `IND_STATIC` (a ''softlink'') which needs to be entered to obtain the actual (tagged pointer). 
The `IND_STATIC` closure's pointer is never tagged (otherwise it would never be entered, instead interpreted as a honest-to-goodness `A.A`, which causes the symptoms seen in #14677).\r\n\r\nWith #14677 applied to GHC, the unfolding of `B.b` is correctly read when compiling `C` (with `-O1` and better) and thus the compiler knows that it should be a tagged pointer value. Thus the construction of `C.c` shortcuts the entering of `B.b` when filling the strict field, and (because `B.b` being a softlink, thus untagged) the field ends up carrying a 0 tag.\r\n\r\n-------------------------\r\n== How can this be fixed?\r\n\r\nI see two possibilities one conservative and one invasive.\r\n\r\n=== Conservative\r\n\r\nWhen seeing a coercion unfolding of a tagged value being used to initialise a strict field, do not skip the evaluatedness check, but cater for the possibility of an `IND_STATIC` closure. Check the closure type, and if confirmed, steal the pointee and use that.\r\n\r\n=== Invasive\r\n\r\nGet rid of the `IND_STATIC` closures altogether. For ''intra-module'' softlinks we can have proper hardlinks (assembler `.equiv` directives, or LLVM `alias`es). ''Inter-module'' softlinks can also be eliminated by linker scripts. This would however cause more build artifacts, so I don't know how hairy it would turn out.\r\n\r\nOTOH, it would reduce binary size by eliminating indirection closures and potential dereferencing code.\r\n\r\n","type_of_failure":"OtherFailure","blocking":[]} -->Gabor GreifGabor Greifhttps://gitlab.haskell.org/ghc/ghc/-/issues/16820Omit the pointer tagging zero check for fields for values known to be strict2019-07-30T17:49:02ZAndrew MartinOmit the pointer tagging zero check for fields for values known to be strictConsider the following module:
```
{-# language MagicHash #-}
{-# OPTIONS_GHC -O2 -fforce-recomp #-}
module Strict
( total
) where
import GHC.Exts
data Numbers = NumbersCons Int# !Numbers | NumbersNil
total :: Numbers -> Int#
t...Consider the following module:
```
{-# language MagicHash #-}
{-# OPTIONS_GHC -O2 -fforce-recomp #-}
module Strict
( total
) where
import GHC.Exts
data Numbers = NumbersCons Int# !Numbers | NumbersNil
total :: Numbers -> Int#
total nums = case nums of
  NumbersNil -> 0#
  nums' -> totalInner 0# nums'
totalInner :: Int# -> Numbers -> Int#
totalInner acc (NumbersCons n ns) = totalInner (acc +# n) ns
totalInner acc NumbersNil = acc
```
The second argument to `totalInner` is always evaluated to WHNF before it is scrutinized by the function. However, GHC does not take advantage of this. In the generated Cmm, we see (I've added extra comments for my own benefit):
```
totalInner_rtW_entry() // [R3, R2]
{ info_tbls: [(c1h6,
label: block_c1h6_info
rep: StackRep [True]
srt: Nothing),
(c1hd,
label: totalInner_rtW_info
rep: HeapRep static { Fun {arity: 2 fun_type: ArgSpec 13} }
srt: Nothing)]
stack_info: arg_space: 8 updfr_space: Just 8
}
{offset
c1hd: // global
_s1gp::P64 = R3;
_s1go::I64 = R2;
if ((Sp + -16) >= SpLim) (likely: True) goto c1h3; else goto c1he;
c1h3: // global
I64[Sp - 16] = c1h6;
R1 = _s1gp::P64;
I64[Sp - 8] = _s1go::I64;
Sp = Sp - 16;
if (R1 & 7 != 0) goto c1h6; else goto c1h7;
c1h7: // global
// The Numbers value had not yet been evaluated. Evaluate it.
// Note: this cannot ever actually happen.
call (I64[R1])(R1) returns to c1h6, args: 8, res: 8, upd: 8;
c1h6: // global
_s1go::I64 = I64[Sp + 8];
if (R1 & 7 != 1) goto c1hb; else goto c1ha;
c1hb: // global
// Argument was NumbersNil
R1 = _s1go::I64;
Sp = Sp + 16;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c1ha: // global
// Argument was NumbersCons
Sp = Sp + 16;
_s1gp::P64 = P64[R1 + 7];
_s1go::I64 = _s1go::I64 + I64[R1 + 15];
goto c1h3;
c1he: // global
R3 = _s1gp::P64;
R2 = _s1go::I64;
R1 = totalInner_rtW_closure;
call (stg_gc_fun)(R3, R2, R1) args: 8, res: 0, upd: 8;
}
},
```
The check for the pointer tagging bits being zero is not needed. I suspect that it might be hard to teach this to GHC, but it would be nice if it were able to figure this out.

---

**Issue #16831: Always inline code which would result in static indirections.** (2022-07-12T21:34:52Z, Andreas Klebinger)
https://gitlab.haskell.org/ghc/ghc/-/issues/16831

# Motivation
Certain code will always be compiled to a static indirection.
In particular this is true if all a binding does is coerce a value to a different type, like `b` in the example below.
This is bad since we have to follow the indirection at runtime, and can't short it out at compile time.
```haskell
module B where
import A (a)
newtype Foo = Foo Int
{-# NOINLINE b #-}
b = Foo a
------------------------------
module C where
import B
data C = C !Foo
c = C b
```
# Proposal
GHC in these cases should ignore the NOINLINE pragma and always force b to be inlined.
This would also help with avoiding untagged things ending up in strict fields. See #14677.

---

**Issue #16970: Introduce tag inference just before codegen.** (2022-06-15T18:44:19Z, Andreas Klebinger)
https://gitlab.haskell.org/ghc/ghc/-/issues/16970

## Motivation
Codegen-wise, GHC treats ALL lifted references the same,
independent of whether they represent:
* a case binder
* a value that has already been evaluated
* a reference which can be proven to be tagged.
The result is that for each of these cases we need to first check for the presence of a tag.
If no tag is present (only possible in case two!) we further need to enter the closure to get a
tagged reference directly to the value.
The end effect is wasted work checking for tags we can prove to be present, and wasted code size by generating branches which will never be taken.
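To make the cost concrete, here is a word-level model (hypothetical names, not GHC code) of the check the generated code performs on every scrutinee today: the low bits of the reference are inspected, and a zero tag forces an expensive closure entry.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word64)

-- Model of the tag check emitted for every case expression today:
-- on x86-64 the low three bits of a lifted reference hold the tag,
-- and tag 0 means "possibly a thunk, the closure must be entered".
needsEnter :: Word64 -> Bool
needsEnter ref = ref .&. 7 == 0
```

The pass proposed here would let codegen skip emitting this check wherever the reference is statically known to be tagged.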
## Proposal
The idea for a solution I am working on is rather simple. Right before we translate STG to Cmm we run a pass on the STG AST which tries to determine whether or not a given reference will be tagged at runtime.
For cases where we determine the reference to always be tagged we can skip checking for the presence of a tag and don't have to emit enter code.
For cases where we determine the reference to be always untagged we can directly enter a closure as well.
This seems rather simple: at least case binders and the contents of strict fields are guaranteed to have been evaluated, so it seems straightforward to assume that they are also tagged.
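A user-level illustration of the two guaranteed-evaluated cases (a hypothetical example, not from the ticket):

```haskell
data Pair = Pair !Int Int

-- 'p' is a case binder: within the branch it is known to be evaluated,
-- so scrutinising it a second time needs no tag check. Likewise the
-- first field is strict, so its contents were evaluated at construction.
sumTwice :: Pair -> Int
sumTwice x = case x of
  p@(Pair a _) -> a + (case p of Pair _ b -> b)
```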
### Strict fields
An important source of information about "has this been evaluated yet" are strict fields.
However while their contents are guaranteed to have been evaluated when casing on the constructor they are **NOT** guaranteed to be tagged.
See #15155, #14677, #14626. This fact also seems to block #14373.
They can instead contain indirections (to evaluated values) or absentError expressions generated by WW. This happens very rarely, but once is enough for a program to crash so this makes them useless for the purpose of inferring taggedness of references.
### The strict field invariant
Instead of accepting this fact we add a new invariant to GHC generated code: References to values stored in strict fields must be tagged.
The principle is again rather simple. When we would otherwise allocate a constructor with potentially untagged references in strict fields, we enter the referenced closure and instead store the result we get back.
This is safe since again, the references ultimately already lead to an evaluated result. It's only required to get rid of the indirection and/or to tag the reference.
Further, if we assume a strict field is allocated once and read once, we do no additional work: while we will enter the reference before allocation, we won't need to enter it when reading the strict field.
If we already have a tagged reference when allocating the constructor, or when the field is allocated once but accessed often, we remove redundant work.
#### Potential for regressions.
In order for this to be worse than the status quo it's required that:
* We allocate a constructor with strict fields.
* The constructor's strict fields are never accessed.
* We can't prove that the references stored in the strict fields are tagged.
A special case of the above are top level constructors currently containing untagged references in strict fields.
These need to be turned into thunks in order to allow tagging of the references before we return the constructor.
In practice this is quite rare and does not seem to meaningfully impact performance.
### The tricky parts
The tricky part here is that the analysis has to be good enough to ensure the potential regression case is rare.
However I have a working prototype which successfully does so, so this does not seem to be an issue.
## Impact
### Performance
The biggest impact this has is when traversing data structures with strict spines.
Currently these tend to check for a lot of (always present) tags.
These checks can be eliminated completely for these traversals with this approach.
Things like membership tests for Data.Set are speed up by >10% in their benchmark suite in preliminary benchmarks.
There are also benefits in certain cases where a function will return tagged results "hidden" behind fields, e.g. given the following function:
```haskell
func :: Int -> Either (Int,Double) a
func x =
  let !foo = f x
      !bar = g x
  in Left (foo, bar)
```
A consumer of the result accessing the Double value will not need to check for a tag on the `Tuple` or `Double` value!
### Other tickets
This will unblock #14373 which seems to depend on strict fields containing tagged references.
This will eliminate the possibility of untagged pointers being stored in strict fields: #15155.
This will eliminate many cases where we currently generate enter code for scrutinised values: #14626
This will NOT fix #14677; it will only work around it by entering untagged closures to make sure we only use
the tagged result. But fixing #14677 would benefit this work a lot, so I will likely fix it as well in the course of implementing this.
### Numbers
Even though this is mostly targeted at code making heavy use of strict fields, even nofib seems to benefit decently:
<details>
<summary>Click to expand</summary>
```
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS +0.2% 0.0% -0.0% -0.0% 0.0%
CSD +0.2% 0.0% -0.0% -0.0% -0.0%
FS +0.2% 0.0% -0.0% -0.0% -0.0%
S +0.2% 0.0% +0.0% +0.0% +0.0%
VS +0.2% 0.0% -0.0% -0.0% -0.0%
VSD +0.2% 0.0% -0.0% -0.0% -0.0%
VSM +0.2% 0.0% -0.0% -0.0% -0.0%
anna +0.2% 0.0% -0.0% -0.0% -0.0%
ansi +0.2% 0.0% -0.0% -0.0% -0.0%
atom +0.2% 0.0% -0.0% -0.0% -0.1%
awards +0.2% 0.0% -0.0% -0.0% -0.0%
banner +0.2% 0.0% -0.0% -0.0% -0.0%
bernouilli +0.2% 0.0% -2.0% +0.0% -3.2%
binary-trees +0.2% 0.0% -0.0% -0.0% -0.0%
boyer +0.2% 0.0% -0.4% -0.1% -0.7%
boyer2 +0.2% 0.0% +0.1% +0.3% -0.3%
bspt +0.2% 0.0% -0.0% -0.0% -0.0%
cacheprof +0.2% 0.0% -0.0% +0.0% -0.1%
calendar +0.2% 0.0% -0.0% -0.0% -0.0%
cichelli +0.2% 0.0% -0.0% -0.0% -0.0%
circsim +0.2% 0.0% -0.0% -0.0% -0.0%
clausify +0.2% 0.0% -0.0% -0.0% -0.0%
comp_lab_zift +0.2% 0.0% -0.4% -0.3% -0.8%
compress +0.2% 0.0% -0.0% -0.0% -0.0%
compress2 +0.2% 0.0% -0.0% -0.0% -0.0%
constraints +0.2% 0.0% -0.0% -0.0% -0.0%
cryptarithm1 +0.2% 0.0% -0.0% -0.0% -0.0%
cryptarithm2 +0.2% 0.0% -0.0% -0.0% -0.0%
cse +0.2% 0.0% -0.1% -0.1% -0.1%
digits-of-e1 +0.2% 0.0% -3.6% +0.4% -5.3%
digits-of-e2 +0.2% 0.0% -2.7% +0.1% -3.4%
dom-lt -0.1% 0.0% +0.0% +0.3% -0.6%
eliza +0.2% 0.0% -0.0% -0.0% -0.0%
event +0.2% 0.0% -0.0% -0.0% -0.0%
exact-reals +0.2% 0.0% -3.0% +0.1% -4.7%
exp3_8 +0.2% 0.0% -0.0% -0.0% -0.0%
expert +0.2% 0.0% -0.0% -0.0% -0.0%
fannkuch-redux +0.2% 0.0% -0.0% -0.0% -0.0%
fasta +0.2% 0.0% -0.3% +0.3% +0.4%
fem +0.0% 0.0% -0.3% -0.2% -0.5%
fft +0.3% 0.0% -2.9% -2.0% -6.0%
fft2 +0.3% 0.0% -2.0% -1.5% -5.1%
fibheaps +0.2% 0.0% -0.0% -0.0% -0.0%
fish +0.2% 0.0% -0.0% -0.0% -0.0%
fluid +0.2% 0.0% -2.1% -1.4% -4.3%
fulsom +0.2% 0.0% -1.4% -0.6% -2.4%
gamteb +0.2% 0.0% -2.7% -0.3% -3.2%
gcd +0.2% 0.0% -4.0% +0.6% -5.1%
gen_regexps +0.2% 0.0% -0.0% -0.0% 0.0%
genfft +0.2% 0.0% -0.0% -0.0% -0.0%
gg +0.2% 0.0% -0.5% -0.1% -0.6%
grep +0.2% 0.0% -0.0% -0.0% -0.0%
hidden +0.2% 0.0% -0.9% -0.5% -1.5%
hpg +0.2% 0.0% -0.3% -0.1% -0.4%
ida +0.2% 0.0% -0.1% -0.1% -0.1%
infer +0.2% 0.0% -0.0% -0.0% +0.0%
integer +0.2% 0.0% -0.8% -0.0% -0.8%
integrate +0.2% 0.0% -0.0% -0.0% -0.0%
k-nucleotide +0.2% 0.0% -0.0% -0.0% -0.0%
kahan +0.2% 0.0% -0.0% -0.0% -0.0%
knights +0.2% 0.0% -0.0% -0.0% -0.0%
lambda +0.2% 0.0% -0.0% -0.0% -0.0%
last-piece +0.1% 0.0% -4.0% -0.2% -4.7%
lcss +0.2% 0.0% -0.1% -0.1% -0.2%
life +0.2% 0.0% -0.0% -0.0% -0.0%
lift +0.2% 0.0% -0.0% -0.0% -0.0%
linear +0.2% 0.0% -0.2% -0.0% -0.2%
listcompr +0.2% 0.0% -0.0% -0.0% -0.0%
listcopy +0.2% 0.0% -0.0% -0.0% -0.0%
maillist +0.2% 0.0% -0.0% -0.0% -0.0%
mandel +0.3% 0.0% -2.4% -2.4% -5.4%
mandel2 +0.2% 0.0% -0.0% -0.0% -0.0%
mate +0.2% 0.0% -0.0% -0.0% -0.0%
minimax +0.2% 0.0% -0.0% -0.0% -0.0%
mkhprog +0.2% 0.0% -0.0% -0.0% -0.0%
multiplier +0.2% 0.0% -0.0% -0.0% -0.0%
n-body +0.2% 0.0% -0.0% -0.0% -0.0%
nucleic2 +0.3% 0.0% +0.0% +0.0% +0.0%
para +0.2% 0.0% -0.0% -0.0% -0.0%
paraffins +0.2% 0.0% -0.9% -0.8% -1.2%
parser +0.2% 0.0% -0.0% +0.0% -0.0%
parstof +0.2% 0.0% -0.0% -0.0% -0.0%
pic +0.2% 0.0% -2.2% -1.2% -3.7%
pidigits +0.2% 0.0% -0.2% -0.0% -0.1%
power +0.2% +3.5% +0.7% +3.7% +1.9%
pretty +0.2% 0.0% -0.0% -0.0% -0.0%
primes +0.2% 0.0% -0.0% -0.0% -0.0%
primetest +0.2% 0.0% -1.0% -0.1% -1.7%
prolog +0.2% 0.0% -0.0% -0.0% -0.0%
puzzle +0.2% 0.0% +0.1% +1.2% -0.4%
queens +0.2% 0.0% -0.0% -0.0% -0.0%
reptile +0.2% 0.0% -0.1% -0.0% -0.1%
reverse-complem +0.2% 0.0% -0.0% -0.0% -0.0%
rewrite +0.2% 0.0% -0.0% -0.0% -0.0%
rfib +0.2% 0.0% -0.0% -0.0% -0.0%
rsa +0.2% 0.0% -1.1% -0.2% -1.9%
scc +0.2% 0.0% -0.0% -0.0% -0.0%
sched +0.2% 0.0% -0.0% -0.0% -0.0%
scs +0.1% +0.0% -0.7% +0.1% -0.9%
simple -0.1% 0.0% -2.3% -1.0% -4.2%
solid +0.2% 0.0% -1.8% -1.0% -3.2%
sorting +0.2% 0.0% -0.0% -0.0% -0.0%
spectral-norm +0.2% 0.0% -0.0% -0.0% -0.0%
sphere +0.2% 0.0% -0.2% -0.0% -0.4%
symalg +0.2% 0.0% -0.2% -0.1% -0.4%
tak +0.2% 0.0% -0.0% -0.0% -0.0%
transform +0.2% 0.0% +0.0% +0.0% -0.0%
treejoin +0.2% 0.0% -0.1% -0.1% -0.2%
typecheck +0.2% 0.0% -0.0% -0.0% -0.0%
veritas +0.2% 0.0% -0.0% -0.0% -0.1%
wang +0.2% 0.0% -0.0% -0.0% -0.0%
wave4main +0.2% 0.0% -12.2% -10.4% -22.9%
wheel-sieve1 +0.2% 0.0% -0.0% -0.0% -0.0%
wheel-sieve2 +0.2% 0.0% -0.0% -0.0% -0.0%
x2n1 +0.2% 0.0% -0.3% -0.0% -3.9%
--------------------------------------------------------------------------------
Min -0.1% 0.0% -12.2% -10.4% -22.9%
Max +0.3% +3.5% +0.7% +3.7% +1.9%
Geometric Mean +0.2% +0.0% -0.5% -0.2% -0.9%
```
</details>
For containers, last I checked, the overall runtime reduction for the benchmarks was on the order of 4%, but I haven't run them on the current iteration yet.
See also: https://docs.google.com/document/d/19r5Jzk-qfLfUXtQKd0_ZuIs5LoOk0Jg0TUeOa5OvW2c/edit#

---

**Issue #16991: Use unused high bits for additional pointer tags on x64** (2020-08-10T12:08:22Z, Andreas Klebinger)
https://gitlab.haskell.org/ghc/ghc/-/issues/16991

This idea has come up at last ICFP, again at ZuriHac and again recently in a discussion.
So I think it warrants writing it down so that future discussion can be directed here.
## Motivation
We currently use the low bits for encoding constructor tags/arity, as all objects are aligned to words, which is very beneficial.
We can use the same thing for the high bits, as we never need the whole 64 bits for actual memory access.
If we can for example ensure that all objects are allocated below the 2 TB mark we would gain **23 additional tag bits**. Although we might not get (or need) as many.
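The arithmetic can be sketched at the word level (a simulation with made-up names, not GHC code): with all heap objects below the 2 TB mark, addresses fit in 41 bits, leaving the top 64 - 41 = 23 bits free.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- Assumed address ceiling: 2 TB = 2^41 bytes.
addrBits :: Int
addrBits = 41

-- Pack extra tag information into the otherwise-unused high bits.
packHigh :: Word64 -> Word64 -> Word64
packHigh tag addr = (tag `shiftL` addrBits) .|. addr

-- Recover (tag, address); masking off the high bits restores a
-- pointer that can be dereferenced as before.
unpackHigh :: Word64 -> (Word64, Word64)
unpackHigh w = (w `shiftR` addrBits, w .&. ((1 `shiftL` addrBits) - 1))
```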
## Benefits
The obvious thing to use this for is expanding the tag range, but we could also encode a host of other things which currently require us to read the info table like:
* Encode the closures type - Potentially avoiding the need to read the info table at all.
* Heap vs static object information - This check is fairly efficient already, but would get slightly faster still.
* Objects we can simply copy, and their size, allowing us to copy things like Int without touching the info table.
* Evacuatedness (currently requires reading the block descriptor)
* Or maybe other things.
As far as I know the JVM does similar things, so this ought to be possible.
However I do not plan to implement this any time soon.

---

**Issue #17004: Export LambdaFormInfo for externally visible bindings in interface files.** (2023-07-18T15:14:51Z, Andreas Klebinger)
https://gitlab.haskell.org/ghc/ghc/-/issues/17004

## Motivation
Currently if one module defines a value like this:
`foo = Just bar`
Then any reference to foo in other modules will be a direct *untagged* pointer to foo.
As a result, every time we scrutinize such a reference we need to enter its closure.
Quite a waste.
## Proposal
Export LambdaFormInfo to interface files, allowing us to tag imported ids in the same way we tag ids coming from the current module.
### Open question:
* [x] Does this establish the invariant that pointers to constructors are always tagged? This question was raised [here](https://gitlab.haskell.org/ghc/ghc/merge_requests/1692#note_245401).
Answer: It doesn't, as this is only enabled at -O1 and higher.

Assignee: Andreas Klebinger

---

**Issue #17079: Optimize dataToTag# for small constructor families.** (2023-11-02T14:39:11Z, Andreas Klebinger)
https://gitlab.haskell.org/ghc/ghc/-/issues/17079

## Motivation
Currently we might have some code `\x -> dataToTag# x :: T -> Int#`.
This evaluates the argument, and then executes a primop to get the tag of the argument.
This works and produces the Cmm code:
```haskell
c1g4: // global
call (I64[R1])(R1) returns to c1g3, args: 8, res: 8, upd: 8;
c1g3: // global
R1 = %MO_UU_Conv_W32_W64(I32[I64[R1 & (-8)] - 4]);
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
```
It follows the pointer to the closure, then follows the pointer to the info table and extracts the tag.
However, for types with few data constructors we don't need to: after evaluating the value we are guaranteed that the pointer will be tagged. This means we can construct the tag from the pointer alone.
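That observation can be sketched at the word level (a simulation of the proposed small-family shortcut, not the real primop): for a family small enough that every evaluated pointer carries constructor index + 1 in its low three bits, the tag falls out of pure arithmetic.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word64)

-- Assumes the value is already evaluated and its constructor family is
-- small enough that the pointer tag is always (constructor index + 1).
-- No memory access needed, unlike the generic dataToTag# path above.
tagFromPtr :: Word64 -> Word64
tagFromPtr p = (p .&. 7) - 1
```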
## Proposal
GHC instead should check if the type is a type for which we can reconstruct the tag from the pointer. And do so if possible.
Possibly via a rewrite rule to a dataToTagSmall# primop or similar. This would save two memory accesses for dataToTag#.

Milestone: 9.2.1

---

**Issue #17701: Refer to preallocated closures for top level INTLIKE constructors.** (2020-01-18T17:24:25Z, Andreas Klebinger)
https://gitlab.haskell.org/ghc/ghc/-/issues/17701

## Motivation
GHC has static closures for Int's in the range `MIN_INTLIKE .. MAX_INTLIKE`
When binding a value such as `foo = 1 :: Int` inside a function we already refer to these static closures.
During code gen for constructor applications we simply check for the `Int` case, and then compute a `LambdaFormInfo` which tells use sites to refer to the precomputed Int closure. This avoids a static or a dynamic version of the `I#` application, instead reusing the precomputed one.
This currently does not work for top-level closures. The reason is that we don't export LambdaFormInfo across modules.
A top level definition `foo = I# 1# :: Int` will be referenced by other modules by resolving the symbol foo_closure.
For this reason we are forced to emit a static Int closure with the label foo_closure which will be resolved by the linker.
A (worse) alternative would be to instead emit an indirection to the preallocated closures. But the indirection would be just as large as the Int closure itself, so this would not be beneficial.
## Proposal
After the implementation of #17004 we will have the ability to export `LambdaFormInfo`, which contains information about how we reference a given `Id`. If we make the `LambdaFormInfo` a required export for each top level Id, we could then omit generating static Int closures in the `MIN_INTLIKE .. MAX_INTLIKE` range, and instead export a reference to the preallocated ones.
As a side note, the same approach would work for solving #16831, which should allow us to remove all cases of STATIC_IND I can think of.
## Benefits
This would have two benefits:
* Reduced code size. If I remember correctly GHC has a few KB worth of static Int closures which could be avoided this way.
* We would have fewer memory locations representing the same value, resulting in better data locality.

---

**Issue #18879: Cmm output should exploit Unf=OtherCon []** (2020-10-28T11:38:25Z, AndrasKovacs)
https://gitlab.haskell.org/ghc/ghc/-/issues/18879

It appears to me that generated Cmm does not take into account `Unf=OtherCon []`, which as far as I know indicates that something is known to be already forced. For example:
```haskell
data S a = S {unS :: !a}
data Tree = Node !Tree !Tree | Leaf !Int
inc :: S Tree -> Tree
inc (S t) = case t of
  Leaf n   -> Leaf (n + 1)
  Node l r -> Node (inc (S l)) (inc (S r))
```
With GHC 8.8.4 and 8.10.2, I get the following worker function:
```
$winc :: Tree -> Tree
$winc
  = \ (ww_s1iY
         :: Tree
         Unf=OtherCon []) ->
      case ww_s1iY of {
        Node l_a137 r_a138 ->
          case $winc l_a137 of dt_X13T { __DEFAULT ->
          case $winc r_a138 of dt1_X13V { __DEFAULT ->
          Node dt_X13T dt1_X13V
          }
          };
        Leaf dt_d1hC -> Leaf (+# dt_d1hC 1#)
      }
```
I expected that `$winc` would not check for thunks, but it does:
```
[$winc_entry() // [R2]
...
c1kC:
I64[Sp - 8] = c1kt;
R1 = R2;
Sp = Sp - 8;
if (R1 & 7 != 0) goto c1kt; else goto c1ku;
c1ku:
call (I64[R1])(R1) returns to c1kt, args: 8, res: 8, upd: 8;
c1kt:
```
It's quite common to use records with strict fields, and in all such cases we get worker functions with superfluous forcing like the above.

---

**Issue #19444: Make code generated by `dataToTag#` more compact (and efficient, hopefully)** (2021-02-26T14:27:03Z, Sebastian Graf)
https://gitlab.haskell.org/ghc/ghc/-/issues/19444

I was looking into the code generated by `dataToTag#`. I defined `blue = dataToTag#` and was surprised how unstraightforward the Cmm I got is:
```
c19O: // global
if ((Sp + -8) < SpLim) (likely: False) goto c19P; else goto c19Q;
c19P: // global
R2 = R2;
R1 = Lib.blue_closure;
call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
c19Q: // global
_c19D::P64 = R2 & 7;
if (_c19D::P64 == 0) (likely: False) goto c19N; else goto c19M;
c19N: // global
I64[Sp - 8] = c19G;
R1 = R2;
Sp = Sp - 8;
if (R1 & 7 != 0) goto c19G; else goto c19H;
c19H: // global
call (I64[R1])(R1) returns to c19G, args: 8, res: 8, upd: 8;
c19G: // global
_c19E::I64 = %MO_UU_Conv_W32_W64(I32[I64[R1 & (-8)] - 4]);
goto c19L;
c19M: // global
if (_c19D::P64 == 7) (likely: False) goto c19K; else goto c19J;
c19K: // global
Sp = Sp - 8;
_c19E::I64 = %MO_UU_Conv_W32_W64(I32[I64[R2 & (-8)] - 4]);
goto c19L;
c19J: // global
Sp = Sp - 8;
_c19E::I64 = _c19D::P64 - 1;
goto c19L;
c19L: // global
R1 = _c19E::I64;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
```
That's an awful lot of code for such an (allegedly) simple function. I get that it cares about large constructor families, but note this is the critical path of the efficient case, where the arg is tagged < 7 (down to generated ASM):
```
_blk_c19O:
leaq -8(%rbp),%rax
cmpq %r15,%rax
jb _blk_c19P
_blk_c19Q:
movq %r14,%rax
andl $7,%eax
testq %rax,%rax
je _blk_c19N
_blk_c19M:
cmpq $7,%rax
je _blk_c19K
_blk_c19J:
addq $-8,%rbp
decq %rax
_blk_c19L:
movq %rax,%rbx
addq $8,%rbp
jmp *(%rbp)
```
1. The stack check is unnecessary, I think. The hot path shouldn't need any stack space.
2. The growing and the shrinking of the stack are unnecessary and cancel each other.
3. There are two other conditional jumps. Hard to improve here.
4. There is some shuffling of local variables going on the register allocator doesn't eliminate completely. It appears it fails to propagate the register constraint `%rbx` on the value representing the tag.
(1) and (2) are easily solved by extracting the slow "have to do evaluation" path into a shared, separate definition. (3) could be improved by encoding the "tag too large" case in tag == 1, so we only get a single `cmpq $2,%rax; jb (or ja? AT&T is confusing) 1`. I imagine (4) is a side-effect of the register allocator getting thrown-off by the unlikely code paths. In summary, I'd really like to get the following code inlined unconditionally at call sites:
```
_blk_c19Q:
movq %r14,%rbx
andl $7,%ebx // btw.: Is relying on andl not clobbering the upper bits of %rbx a good idea?
cmpq $2,%rbx
jb shared_eval_case // or ja, not sure
subq $2, %rbx // 2, because of (3) above
jmp *(%rbp) // will not actually be present if inlined at call site, I suppose...
```
Sooo much better.

---

**Issue #19445: (Derived) Ord instances generate terrible code** (2022-11-22T09:59:00Z, Sebastian Graf)
https://gitlab.haskell.org/ghc/ghc/-/issues/19445

Consider these attempts at coming up with more compact `compare` functions:
```hs
{-# OPTIONS_GHC -O2 -fforce-recomp #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}

module Lib (foo, bar, baz) where

import GHC.Exts
import GHC.Classes

data U = U Int deriving Eq

instance Ord U where
  -- so that this function isn't WW'd
  compare ~(U a) ~(U b) = lazy $ compare a b
  {-# NOINLINE compare #-}

data T
  = T0 U
  | T1 U
  | T2 U
  | T3 U
  | T4 U
  | T5 U
  deriving (Eq, Ord)

foo :: T -> T -> Ordering
foo = compare @T

bar :: T -> T -> Ordering
bar !_ r | !_ <- dataToTag# r, id False = undefined
bar (T0 a1) (T0 a2) = compare a1 a2
bar (T1 a1) (T1 a2) = compare a1 a2
bar (T2 a1) (T2 a2) = compare a1 a2
bar (T3 a1) (T3 a2) = compare a1 a2
bar (T4 a1) (T4 a2) = compare a1 a2
bar (T5 a1) (T5 a2) = compare a1 a2
bar l r
  | isTrue# (dataToTag# l <# dataToTag# r) = LT
  | otherwise = GT

baz :: T -> T -> Ordering
baz l r = compareInt# (dataToTag# l) (dataToTag# r) <> go l r
  where
    go (T0 a1) (T0 a2) = compare a1 a2
    go (T1 a1) (T1 a2) = compare a1 a2
    go (T2 a1) (T2 a2) = compare a1 a2
    go (T3 a1) (T3 a2) = compare a1 a2
    go (T4 a1) (T4 a2) = compare a1 a2
    go (T5 a1) (T5 a2) = compare a1 a2
```
`bar` speculates `dataToTag#` on the second argument, so that its result is shared among all case branches. `baz` goes a step further: It simply compares tags and only compares fields when the tags match.
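As a sanity check that the tag-first ordering used by `baz` coincides with the derived instance, here is a small standalone reduction (a hypothetical module, not part of the ticket); `Ordering`'s `<>` is lazy in its right argument, so `go` is only forced on a tag tie:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), dataToTag#)

data T' = A Int | B Int | C Int deriving (Eq, Ord, Show)

tagOf :: T' -> Int
tagOf x = I# (dataToTag# x)

-- baz-style comparison: tags first, fields only when the tags match.
cmpTagFirst :: T' -> T' -> Ordering
cmpTagFirst l r = compare (tagOf l) (tagOf r) <> go l r
  where
    go (A a) (A b) = compare a b
    go (B a) (B b) = compare a b
    go (C a) (C b) = compare a b
    go _     _     = error "unreachable: only forced on a tag tie"
```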
Here's the generated Core for the `T3` case:
```
foo a b_aLh = case a of {
T3 a1_aLp ->
case dataToTag# @T b_aLh of b#_aLq { __DEFAULT ->
case <# b#_aLq 3# of {
__DEFAULT ->
case b_aLh of {
__DEFAULT -> GHC.Types.LT;
T3 b1_aLr -> Lib.foo_$ccompare a1_aLp b1_aLr
};
1# -> GHC.Types.GT
}
}
}
bar a b_aLh =
case dataToTag# @T b_aLh of ds1_dU3 { __DEFAULT ->
case a of {
T3 a1_aJb ->
case b_aLh of {
__DEFAULT ->
case <# 3# ds1_dU3 of {
__DEFAULT -> GHC.Types.GT;
1# -> GHC.Types.LT
};
T3 a2_aJc -> Lib.foo_$ccompare a1_aJb a2_aJc
};
}
}
baz l_aJj r_aJk =
case dataToTag# @T r_aJk of wild_X1 { __DEFAULT ->
case dataToTag# @T l_aJj of wild1_X2 { __DEFAULT ->
case <# wild1_X2 wild_X1 of {
1# -> GHC.Types.LT;
__DEFAULT ->
case ==# wild1_X2 wild_X1 of {
__DEFAULT -> GHC.Types.GT;
1# ->
case l_aJj of {
T3 a1_aJs ->
case r_aJk of {
__DEFAULT -> Lib.baz1; -- pattern match error thunk
T3 a2_aJt -> Lib.foo_$ccompare a1_aJs a2_aJt
};
}
}
}
```
That is I think about the Core I'd expect, with one exception: I'd hoped that GHC would detect that the pattern match error in `baz` is impossible, which it doesn't.
Although `baz` looks rather big, I expected most of it to lower to very simple instructions. Well, `nm --print-size --size-sort test.o` reveals:
```
0000000000001180 000000000000032f T Lib_bar_info
0000000000000b00 0000000000000373 T Lib_foo_info
00000000000014c8 000000000000038f T Lib_bazz_info
```
So `bar` is a bit smaller than `foo` (the stock derived function), while `baz` is larger. All of them are quite large: They range between 0x300 (768) and 0x400 (1024) bytes!
I had a cursory look at the Cmm, and I was lost almost instantly. That was due to #19444, I think.
It's surely possible to generate more compact code for derived `Ord` instances. It's such a common operation, yet so much less efficient than the equivalent in C!

----
**https://gitlab.haskell.org/ghc/ghc/-/issues/19523 — Implement UnliftedDatatypes** (Sebastian Graf, 2021-06-01)

_**Note:** f8c6fce4a09762adea6009540e523c2b984b2978 and !5440 wrongly refer to this ticket instead of #19623. If you got here by following the reference in the commit message, see #19623 instead._

----
[GHC proposal #265](https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0265-unlifted-datatypes.rst) defines the `UnliftedDatatypes` extension.
This ticket tracks the implementation of said extension.

----
**https://gitlab.haskell.org/ghc/ghc/-/issues/19595 — Remove aBSENT_(SUM_)ERROR_ID, come up with better boxed rubbish value instead** (Sebastian Graf, 2021-05-16)

In https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5145#note_341693, I became aware of `Note [aBSENT_ERROR_ID]`, esp. the intricacies regarding stable unfoldings which make it necessary to drop `seq`s on expressions that force absent errors.
#19133 and the long discussion in !5145 made it increasingly clear to me that it might be worthwhile to get rid of absent errors altogether and always use boxed rubbish values instead.
What are the drawbacks? As `Note [Absent fillers]` of !5145 explains, it's a big loss in debugging experience for compiler developers should an absent filler turn out to be relevant in the program after all (as it was in #11126 or #14285). Today, we'd get a nice panic message immediately upon evaluation. If we pick a rubbish value (which currently amounts to pointing to the `()` constructor for boxed types), we might not even get a panic and keep on executing the program with malformed inputs; see "Modes of failure" in `Note [Rubbish values]` (in !5145).
So the feasibility of removing absent errors crucially depends on finding a better boxed rubbish value.
How about we simply have a pointer to an unallocated memory page and tag it properly (e.g. with tag 1), so that a `seq` will never enter it? Heck, we could even literally use the pointer `1`, which is `(char*)NULL+1`, for a very low-tech solution. Then we would already have gained much: unlike the hacks described in `Note [aBSENT_ERROR_ID]`, rubbish values are actually evaluated and will be treated as such in Core. And even if we `seq` such a rubbish value (something that #16970/!1472 is interested in doing), there is no harm, because the pointer is already tagged and the `seq` will finish immediately. We would panic as promptly as with the current absent error mechanism (which is the most important gain) and not propagate the error possibly indefinitely; only the error message would suffer. But the error message of absent errors is almost worthless in more complex settings anyway, as #11126 showed. This was the panic message:
```
Main: Oops! Entered absent arg arr2 Array D DIM1 Double
```
I believe the ticket was open and unresolved for so long because it's impossible to act on this message, given there are about 20 libraries imported and what not. We only fixed it "by accident", by fixing https://gitlab.haskell.org/ghc/ghc/-/issues/18638. I don't think we would have ever made progress on #11126 without the smaller reproducer from #18638. In fact, we aren't sure whether #11126 is fixed to this day, simply because it's hard to reproduce.
Granted, the panic message is hugely improved these days, because it includes the file name of the absent error. A more high-tech solution would be to point to one memory page per occurrence of a boxed rubbish value; then we can even install an exception handler that translates the address of the faulted memory page back to the occurrence of the rubbish value, should code access it.
Here is a list of tickets that I think would benefit from this change (feel free to add):
- #16970
- #15155
- #14285
What do you think? @simonpj @AndreasK

----
**https://gitlab.haskell.org/ghc/ghc/-/issues/19766 — Worker/wrapper still makes error thunks for strict constructor fields.** (Andreas Klebinger, 2021-11-01)

While working on tag inference I found that GHC still seems to put error thunks into strict fields at times.
In particular we have this source code:
```haskell
fail_tycon global_env ty_con =
  let pprov = case lookupGRE_Name global_env (tyConName ty_con) of
                Just gre -> nest 2 (pprNameProvenance gre)
                Nothing -> empty
  in failWithTc (term_level_tycons ty_con $$ pprov)
```
After W/W, this comes out to the following relevant Core:
```haskell
join {
-- Essentially takes a unboxed name as argument
$w$j_soEE [InlPrag=[2], Dmd=MCM(CM(CM(CM(CM(CM(CM(CM(L))))))))]
:: GHC.Types.Name.NameSort
-> NameSpace
-> ghc-prim:GHC.Prim.Int#
-> ghc-prim:GHC.Prim.Int#
-> ghc-prim:GHC.Prim.ByteArray#
-> GHC.Data.FastString.FastZString
-> ghc-prim:GHC.Prim.Int#
-> Name
-> GHC.Utils.Ppr.Doc
[LclId[JoinId(8)],
Arity=8,
Str=<L><SL><L><L><L><LP(L,L,L)><L><LP(L,LP(L,LP(L,L,L,LP(L,L,L))),L,A)>]
$w$j_soEE (w_soEo :: GHC.Types.Name.NameSort)
(ww_soEu [Dmd=SL]
:: NameSpace
Unf=OtherCon [])
(ww_soEx :: ghc-prim:GHC.Prim.Int#)
(ww_soEy :: ghc-prim:GHC.Prim.Int#)
(ww_soEz :: ghc-prim:GHC.Prim.ByteArray#)
(ww_soEA [Dmd=LP(L,L,L)] :: GHC.Data.FastString.FastZString)
(w_soEq :: ghc-prim:GHC.Prim.Int#)
(w_soEs [Dmd=LP(L,LP(L,LP(L,L,L,LP(L,L,L))),L,A)] :: Name)
= ... -- Some lets removed for clarity
let {
w_soEr [Dmd=A] :: SrcSpan
[LclId]
w_soEr
= ghc-prim:GHC.Prim.Panic.absentError
@SrcSpan
"Arg: w_soEr\n\
\Type: SrcSpan\n\
\Id: w_soEr\n\
\In output file `_build/stage1/compiler/build/GHC/Tc/Gen/Head.o'"# } in
let {
ds1_ajuu [Dmd=SP(SL,SP(L,L,L,LP(L,L,L))), OS=OneShot] :: OccName
[LclId]
ds1_ajuu = w_soEp } in
let {
ds2_ajuw [Dmd=A, OS=OneShot] :: SrcSpan
[LclId, Unf=OtherCon []]
ds2_ajuw = w_soEr } in
let {
wild_ajus [Dmd=LP(L,LP(L,LP(L,L,L,LP(L,L,L))),L,A), OS=OneShot]
:: Name
[LclId,
Unf=Unf{Src=InlineStable, TopLvl=False, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=False)
Tmpl= GHC.Types.Name.Name ds_ajut ds1_ajuu dt_ajuv ds2_ajuw}]
wild_ajus = w_soEs } in
case ds1_ajuu of
{ GHC.Types.Name.Occurrence.OccName ww_amZ3 [Dmd=SL]
ww1_amZ4 [Dmd=SP(L,L,L,LP(L,L,L))] ->
case ww1_amZ4 of
{ GHC.Data.FastString.FastString ww2_amZ7 ww3_amZ8 ww4_amZ9
ww5_amZa [Dmd=LP(L,L,L)] ->
let {
d1_snVy [Dmd=SC1(L)] :: SDocContext -> GHC.Utils.Ppr.Doc
d1_snVy
= ... -- not important
case GHC.Types.Name.Reader.$wlookupGRE_Name_OccName
global_env_adne
wild_ajus
ww_amZ3
ww2_amZ7
ww3_amZ8
ww4_amZ9
ww5_amZa
of {
-- Not important
}
}
} } in
...
```
The absentError we enter is the RHS of `w_soEr`, `w_soEr` also being bound to `ds2_ajuw`. They both have a demand of plain 'A' (Absent).
This is a problem for WW, which relies on them being absent *and* strict if they might end up in a strict field.
I think they get their demand as follows: `w_soEr` gets its demand from `ds2_ajuw`.
`ds2_ajuw` appears as the second argument to `$wlookupGRE_Name_OccName`, which in turn
has this demand: `Str=<1L><LP(L,LP(L,LP(L,L,L,LP(L,L,L))),L,A)><SL><L><L><L><LP(L,L,L)>`.
This puts the demand `<LP(L,LP(L,LP(L,L,L,LP(L,L,L))),L,A)>` onto `wild_ajus`, which then results in `ds2_ajuw` getting just `A`.
Now this poses a problem. What happens down the line is we rebox the `Name` argument by inlining `wild_ajus`
like this:
```
$wlookupGRE_Name_OccName global_env_adne
(GHC.Types.Name.Name ds_ajut ds1_ajuu dt_ajuv ds2_ajuw)
```
Now `ds2_ajuw`, our error thunk, is stored in a strict field, but does not have a strict demand.
I wonder if this could simply be solved by giving every argument of type T a demand of at
least `C_10` for its strict components.
In a simple example like this:
```haskell
data T = MkT Char !(Bool,Bool)
foo :: Int -> T -> _
foo n x = n `seq`
  case x of
    MkT c _ -> (c, undefined)
```
Instead of giving the current demand `Str=<S,1*H><S,1*U(U,A)>`, we would
then give the demand `Str=<S,1*H><S,1*U(U,B)>`.
I'm not sure if there would be problems with this approach. Currently I think we
already do so during W/W, but only in an ad-hoc fashion using `addDataConStrictness`.
As a result I think the current way of marking these bindings as strict falls short for this purpose. (Milestone: 9.4.1)

----
**https://gitlab.haskell.org/ghc/ghc/-/issues/21083 — Consider running tag inference before byte code generation.** (Andreas Klebinger, 2022-10-18)

Currently it's not part of the "regular" stg2stg pipeline and is also not run before we generate bytecode. But it is always run when generating Cmm.
I think the interpreter currently upholds the strict field invariant (by always using constructor wrappers) so it doesn't need to run it. But it might be desirable to run it before bytecode gen just for consistency. It shouldn't make a big perf difference one way or another.
I might look into this in the future but it's not high priority at this point.
------------
We decided not to do this for the time being. But we need to assert that the byte code generator doesn't see constructor workers/strict workers, to avoid this biting us in the future.

----
**https://gitlab.haskell.org/ghc/ghc/-/issues/21193 — Should evaluating a function return a tagged pointer.** (Andreas Klebinger, 2023-10-04)

I recently came across code like this. Note that this is from an unoptimized build, and I think it's a rare issue in optimized builds.
```
case GHC.Utils.Outputable.ftext of conrep32_sPya [Occ=Once1] {
```
However the `conrep32_sPya` turned out to have no tag bits set despite being evaluated.
This was from a debug build so we had no LFInfo about `ftext` and therefore evaluated it via an unknown call.
```
R1 = GHC.Utils.Outputable.ftext_closure;
P64[Sp] = _sPy9::P64;
Sp = Sp - 8;
unwind Sp = Just Sp + 272;
call stg_ap_0_fast(R1) returns to c1jJX, args: 8, res: 8, upd: 8;
```
However this doesn't actually tag the result, because `stg_ap_0_fast` ends up calling the `ENTER` macro, which is defined as follows:
```
#define ENTER_(ret,x) \
again: \
W_ info; \
LOAD_INFO(ret,x) \
/* See Note [Heap memory barriers] in SMP.h */ \
prim_read_barrier; \
switch [INVALID_OBJECT .. N_CLOSURE_TYPES] \
(TO_W_( %INFO_TYPE(%STD_INFO(info)) )) { \
case \
IND, \
IND_STATIC: \
{ \
x = StgInd_indirectee(x); \
goto again; \
} \
case \
FUN, \
FUN_1_0, \
FUN_0_1, \
FUN_2_0, \
FUN_1_1, \
FUN_0_2, \
FUN_STATIC, \
BCO, \
PAP: \
{ \
ret(x); \
} \
default: \
{ \
x = UNTAG_IF_PROF(x); \
jump %ENTRY_CODE(info) (x); \
} \
}
```
That is, for functions/PAPs we simply return the argument as-is.
--------------------
The obvious downside is that we are less likely to be able to tell the arity of functions without inspecting their info table. However it makes the "entry" code of functions really trivial.
If we were to change this we would merely have to extract the arity from the info table in the various FUN_* branches and tag the pointer with it.
However, in an optimized build almost all references to functions should come into existence already tagged, so I don't think this is likely to significantly change runtime performance.
Maybe I will look at changing this at some point, but given its low impact I don't think it's a priority.

----
**https://gitlab.haskell.org/ghc/ghc/-/issues/21207 — Rethink dataToTag# in light of tag inference.** (Andreas Klebinger, 2024-02-27)

Simon in https://gitlab.haskell.org/ghc/ghc/-/issues/17240#note_414000 suggested some principally good changes to how we deal with `dataToTag#`, which I will reproduce in full below.
--------
**Background**
* `dataToTag#` (a primop) currently evaluates its argument, via some magic in the code generator -- see `Note [dataToTag# magic]` in ConstantFold.hs
* As you point out, this makes the evaluated-ness of the argument invisible to the subsequent code; an explicit `case` expression would be better.
* We moved from an explicit case expression to making the eval part of `dataToTag#` during the long saga of #15696.
* That saga has lots of resonances from the "every strict data constructor field has a properly tagged pointer" debate; the canonical reference is https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/haskell-execution/pointer-tagging, which (alas) has not been updated following your tag-inference pass landing (could you fix that? esp. the "strict fields containing..." section). #15696 even has the same containers issue; [see here](https://gitlab.haskell.org/ghc/ghc/-/issues/15696#note_160886).
**Opportunity**
* Mark `dataToTag#` as having a `CbvMark` on its argument.
* Add a `case` expression in `GHC.Base.getTag` to do the evaluation.
* Now the evals will be visible in the equality code; but the machinery you have recently added will ensure that every `dataToTag#` call really will get a properly tagged argument to look at.
* Remove the eval magic for `dataToTag#` in the code generator
Result: simpler compiler, better code for comparisons. By the latter I mean that we can return to [this idea](https://gitlab.haskell.org/ghc/ghc/-/issues/17240#note_392048) without it being defeated by your observation that "the eval dataToTag# does is kinda hidden in it's implementation".
Does that make sense? Another payoff for the work you did on `CbvMarks`.
-----------
We should do this! However in order to get there we need to iron out a few wrinkles:
* Currently we don't use `getTag` in the deriving code (despite what its docs say). But that's easy to change.
* In `case a of a' { _ -> dataToTag# a' }`, the simplifier would remove the "redundant" case on `a`.
* If we want to mark `dataToTag#` as call by value and don't have it evaluate its argument itself we have to either:
* Run tag inference before bytecode gen: https://gitlab.haskell.org/ghc/ghc/-/issues/21083
* Split `dataToTag#` into a worker (with the call-by-value argument) and a wrapper. Unoptimized code would call the wrapper, while optimized code would call the worker (and would run tag inference). The W/W approach seems very reasonable and would only result in minimal compile time overhead for code currently using `dataToTag#`, since the wrapper would be inlined.
* When tag inference ends up evaling a variable to ensure it's tagged, it doesn't try to replace further occurrences of the evaluated variable with the new (tagged) binder. It's simple to add this logic, and iirc there is already a TODO about it. I didn't include it initially because tag inference always runs when compiling (even at -O0) and I wanted to avoid a potential compile time hit. But it shouldn't be hard to make this gated by a flag/optimization level to avoid the cost at -O0.
Overall I think this would be a good change. Maybe something to look at for 9.6.

----
**https://gitlab.haskell.org/ghc/ghc/-/issues/21497 — Clarify and document the CBV-convention** (Simon Peyton Jones, 2023-12-09)

I'd like to clarify the role of
* `CbvMarks`
* `OtherCon []` unfoldings
* Strictness signatures
* Strict data constructors
for a strict function. Relevant tickets
* #21472, !8148
* #14626
* #14677
* #15155
* #21496
* #14626: this relates to whether a top-level constructor is exported properly-tagged.
* #15696: how untagged pointers can end up in strict constructor fields
* #16559
* #14677
* #20749
* #22475
## The olden days
Until a few months ago, even if a function was strict, it could not
guarantee that the caller would pass it an evaluated argument, or
that (even if it did) that value would be properly tagged.
For example, `reverse` is strict in its argument, but we might have
a call
```
f A xs = []
f B xs = reverse xs
f C xs = reverse (take 2 xs)
```
Now `f` is not strict in `xs` (see the `A` case), and our long-standing convention is that
`f` can just pass `xs` on to `reverse` in the `B` case. But in the `C` case we will use
call-by-value to avoid building a thunk, thus
```
f C xs = case take 2 xs of xs' -> reverse xs'
```
*Bottom line:* `reverse` cannot guarantee to have an evaluated argument. It might
be passed a thunk.
## Passing properly tagged pointers
See
* `Note [Tag Inference]` in GHC.Stg.InferTags
* `Note [How untagged pointers can end up in strict fields]` in GHC.Stg.InferTags, and #15696
* `Note [Strict Worker Ids]` in GHC.Types.Id.Info
* `Note [Attaching CBV Marks to ids]` in GHC.Core.Tidy
In the work on tag inference (pointers above), we allow a function to
establish a new calling convention: **that it must be given a properly
tagged pointer (to a value) as its argument**. Let's call this the **CBV-convention
for a particular argument**. Notice that
* A function is marked strict in an argument if the function itself is sure to evaluate the argument: strictness says something about the *function's* semantics.
* A function that uses a CBV-convention for an argument requires that argument to be passed as a properly-tagged value: it is a requirement on the *caller*.
Strict data constructors (or more precisely the worker for a strict
data constructor) have, and must have, a CBV convention for their strict arguments.
It is possible for a strict data constructor not to have a properly-tagged
argument; see `Note [How untagged pointers can end up in strict fields]` in GHC.Stg.InferTags, and #15696. To establish the invariant that struid data constructors *always* have properly-tagged arguments, the code generator adds extra evals around a strict data constructor, when necessary. This is done in `GHC.Stg.InferTags.Rewrite.rewriteTopBinds`.
## `OtherCon []` unfoldings
What does an `OtherCon []` unfolding mean? **It means that at every use site of the binder, the binder will represent a value in WHNF.**
This makes it very tempting to give workers an `OtherCon []` unfolding for lambda binders which represent strict fields,
since it seems reasonable to assume the field has been evaluated before constructor allocation.
This would allow a `seq` on that argument to be dropped.
This is an attractive idea -- after all, the caller should be guaranteeing that it is called by value! But it proves to be unsustainable, as we will see.
However that `seq` might be the only thing that makes that function strict.
e.g.
```
f x = x `seq` Just x
```
If we decide to give `f` a CBV-convention, we might transform to
```
f x{-Unf=OtherCon []-} = x `seq` Just x
```
and then drop the `seq`, giving
```
f x{-Unf=OtherCon []-} = Just x
```
That is fine provided all callers *actually* use call-by-value. But #21496 highlights that this is a very problematic assumption
and #21460 established that this indeed can go wrong in practice!
So we decided to remove `OtherCon` from all lambda binders.
The `OtherCon` invariant then rather becomes:
**For a binder with an `OtherCon` unfolding, at every use site the binder will represent a value in WHNF.**
**We may not give lambda binders an `OtherCon` unfolding, because it's impractical to uphold this invariant across function calls.** See the proposals below for details.
## Why do we need CbvMarks at all?
Why do we need `CbvMark`s on any functions at all? Here's an example:
```
data T = MkT ![Int] Bool
f (MkT xs b) = MkT xs <big-expression>
```
Just before demand analysis we have this:
```
f t = case t of MkT xs b -> $WMkT xs <big>
```
(Remember, we have not yet inlined data constructor wrappers.)
Then demand analysis does w/w:
```
$wf xs b = $WMkT xs <big>
f t = case t of MkT xs b -> $wf xs b
```
Now we inline that `$WMkT`:
```
$wf xs b = case xs of xs' -> MkT xs <big>
```
Now, alas, we have no way to eliminate that `case xs`, even though
that argument of `$wf` came straight out of the incoming `MkT`.
Sad. And this really happens in code that takes apart and re-assembles
data structures with strict fields.
So we want to make `$wf` into a CBV-convention function (like the worker
`MkT` itself), so that
* Inside `$wf` we know that we'll be passed a non-bottom, properly tagged value, so we can make the `case xs` a no-op at runtime.
* The callers of `$wf` are responsible for guaranteeing that claim.
Something similar happens in SpecConstr. Suppose we have
```
foo :: T -> blah
foo t = rhs
baz = ...foo (MkT p q)...
```
Now SpecConstr specialise `foo`, thus
```
$sfoo xs b = let t = MkT xs b in rhs
RULE forall xs b. foo (MkT xs b) = $sfoo xs b
```
Here again, the `xs` argument to `$sfoo` comes straight from a `MkT`,
and we don't want to do extra evals on it in the body of `$sfoo`.
(I can't quite cook up a specific example here, but I'd like to see one. `Note [SpecConstr and evaluated unfoldings]`
doesn't give an actual example.)
### CbvMarks and StrictWorkerIds
Currently the notion of a CBV-parameter is very squishy. `Note [Attaching CBV Marks to ids]` in GHC.Core.Tidy says:
* Before tidy, function arguments which have a call-by-value semantics are identified
by having an `OtherCon[]` unfolding.
* During tidy, we transform this information into CBV (call-by-value)
marks. The marks themselves then are put onto the function id itself.
So we have two quite different ways to signal CBV-parameters.
When the simplifier inlines a call to a function with a CBV-parameter
(either imported, in which case it'll be signalled in the CbvMarks, or
local, in which case it'll be signalled in the unfolding), it must be careful
not to lose the "eval". I'm far from certain that we are careful.
In some way this can lead to a crash: see #21472. Sebastian [gives a reproducer here](https://gitlab.haskell.org/ghc/ghc/-/issues/21472#note_427231).
Putting `OtherCon` unfoldings on a lambda binder is extremely flaky, *even if the call site
is passing something that comes directly from a strict constructor field*.
Suppose
```
data T = MkT ![Int] Bool
f :: T -> T
f (MkT x b) = MkT x True
```
So (assuming we do a w/w split) we will end up with
```
$wf x* = $WMkT x True
f t = case t of MkT x _ -> $wf x
```
where `x*` means "has an `OtherCon []` unfolding", and `$WMkT` is the wrapper for `MkT` that does the eval.
(I'm assuming here that strict constructors have wrappers, as today.)
Now we inline `$WMkT` and optimise, dropping the evals (because of that flaky `OtherCon []`):
```
$wf x* = MkT x True
```
Now consider this call to `f`:
```
f ($WMkT ex eb)
{inline $WMkT} --> case ex of x -> f (MkT x eb)
{inline f} --> case ex of x -> case MkT x eb of MkT x _ -> $wf x
{case of known} --> case ex of x -> $wf x
{eval-of-strict} --> $wf ex
{inline $wf} --> MkT ex True
```
We have lost the eval. It will get put back by the code generator, but all our faffing with `OtherCon []` bought us nothing.
-----------------------------------
# Proposals
So much for the backgound. Here is our proposed new design.
NB: parts of it are the same as the current design, but we re-state it here (signalling that it's the status quo) for
completeness.
## Strict data constructors: no change
We propose to make no change in the way that strict data constructors are handled. Consider this example:
```
data T = MkT ![Int] Bool
```
Currently we do this:
* Make a wrapper
```
$WMkT x y = case x of x' -> MkT x' y
```
* `MkT`'s worker (the data constructor itself) is *lazy*, not strict, in all its arguments. The evals are done by the wrapper; the worker is passive. This is important; if we don't do this, the simplifier will drop the `case x` in `$WMkT` because `x'` is used strictly by `MkT`.
* GHC.CoreToStg.Prep can use eta-expansion to ensure that all strict data constructor
applications are either saturated, or applied to no arguments at all (`maybeSaturate`).
(Side note: `Note [CorePrep Overview]` point (1) claims that Prep saturates all data
constructor and primop applications, but that isn't true.)
* GHC.CoreToStg.Prep also adds a binding
```
MkT = \x y. MkT x y
```
so that the "no arguments at all" case has something to call (see `mkDataConWorkers`).
* In the code generator (or, rather, late STG), for every call to the worker `MkT`, we wrap an eval
for the first argument *if we can't statically see that the first argument is evaluated
and properly tagged*. This is done by `GHC.Stg.InferTags.Rewrite.rewriteTopBinds`.
While there is no change here, it does mean that we have a somewhat-subtle
Core invariant that we should document (#20749):
* In every Core program, it should be the case that replacing every constructor
application `(MkT xs b)` with `case xs of xs' -> MkT xs' b` does not change
the meaning of the program; that is, `xs` is non-bottom. (In general, add
a `case` for each strict field of the constructor.)
We establish the invariant by only introducing `MkT` via a call to its
wrapper `$WMkT`, which evaluates the argument. That ensures that the
strict field is non-bottom; and all further optimisation should
maintain that semantics.
Although the invariant is easy to establish, it is (sadly) not possible for Lint
to check it. In principle, a buggy Core-to-Core pass could break the invariant,
by introducing a call to the worker `MkT` that does not satisfy the invariant.
That is a shortcoming of the plan described here.
An attractive-seeming alternative would be to
make the constructor worker strict in its strict argument(s).
It's attractive because it might seem that the evals in the wrapper are mutually redundant
with the evals added by the code generator. But this approach had serious
problems; see ["Failed idea: no wrappers for strict data constructors"](#failed-idea-no-wrappers-for-strict-data-constructors) below.
## Proposal 0: `OtherCon []` unfoldings
Proposal: **Never put an `OtherCon []` unfolding on a binder.** Indeed **only let-binders have an Unfolding; no other binders have an unfolding at all**. Rather `OtherCon []` is added by the simplifier to *occurrences*, when in a
scope where that occurrence is known to be evaluated. Example:
```
data TD = A | B | C
case x of y
{ A -> ...(case y of { A -> e1; _ -> e2 })...
; _ -> ...(y `seq` blah)... }
```
* In the `A` branch, `y`'s occurrences have an unfolding of `A`; that allows us to simplify the inner `case y`.
* In the `_` branch, `y`'s occurrences have an unfolding of `OtherCon [A]` saying that `y` is not `A`; that allows us to drop the `seq`.
We use this for strict data constructors too:
```
data T = MkT ![Int] | A | B | C
foo x = case x of y
MkT p -> rhs1
_ -> rhs2
```
Here,
* In the `_` branch, `y` gets an `OtherCon [MkT]` unfolding.
* In the `MkT` branch, `y` gets an unfolding of `MkT p`; and `p` gets an `OtherCon []` unfolding.
But no *binder* has an `OtherCon []` unfolding; they are purely for occurrences.
## Proposal 1: CBV arguments
* **Completely get rid of `StrictWorkerId` and `CbvMarks` in the simplifier.**
* In the code generator, establish the following calling convention: **for "CBV-Ids", every strict, lifted argument is passed as an evaluated, properly-tagged pointer**.
See below for what a "CBV-Id" is.
* Whatever we do, we must ensure that the decisions we make about top-level functions are exposed in the interface file, **regardless of optimisation level**. If a function uses CBV, the caller **must** honour that calling convention. It's not optional! See #21676
* `GHC.Stg.InferTags.Rewrite.rewriteTopBinds` ensures that all calls meet this calling convention. Suppose `f` is a CBV-Id, with arity 2 and CbvMarks `[MarkedCbv, NotMarkedCbv]`. Then the rewriter
* Adds an eval to a saturated call `(f x y)`, thus `(case x of x2 -> f x2 y)` unless the rewriter can prove that `x` is properly tagged.
* Eta-expand unsaturated calls to `f`; thus `f` becomes `(\x y. case x of x2 -> f x2 y)`.
* Calls of non-CBV-Ids are unaffected, as are CBV-Ids that have no strict boxed arguments.
* When generating code for
* `case x of { p1 -> e1 ... }`: no need to eval if `x` is known to be evaluated and properly tagged
* `case x of y -> rhs`: do nothing at all if `x` is known to be evaluated and properly tagged
* I am no longer sure of the role of the tag inference pass. (To discuss.)
* **What Ids should be "CBV-Ids"?**. NB: this is a free choice. Everything will work correctly if we say "all" or "none" or anything in between.
One possibility is "all Ids are CBV-Ids". That is simple. But Andreas (in the dim and distant past) tried (something like) this and found that he got a lot of code bloat from eta-expansion of CBV-Ids.
So instead we propose these Ids as CBV-Ids:
* Constructor workers: this one is actually vital to maintain the invariant that the argument of a strict constructor worker is properly tagged.
* Join points; these are always saturated, so no eta-expansion worries
* Worker-like Ids: workers from worker/wrapper, specialised functions from SpecConstr. Again always saturated --- at least when they are born.
And that's about it. It's simple, and catches all the cases that we are currently worried about.
The second bullet of "when generating code" effectively drops redundant cases altogether; but we only
do it at this late stage, not driven by flaky `OtherCon []` unfoldings.
## Proposal 2: absent fillers for strict fields
Now that we have removed `OtherCon` from lambda binders (see "Proposal 1: CBV arguments" above), we risk producing code which fails with `absentError` panics at times.
This is because W/W currently uses `OtherCon` as a way to determine that a binder represents a strict field; see
https://gitlab.haskell.org/ghc/ghc/-/commit/eee498bfce3cce03ba1017b65d559c79d5c2eb60#f4421b4e35592648510c877ecf55b1af2b96dcee_1036_991 and `Note [Absent fillers]` (2).
Obviously this approach is no longer possible once `OtherCon` is no longer present on these binders! Here is the proposal:
**In worker/wrapper, use non-bottom absent fillers for absent, banged fields**.
Despite the current note in W/W saying this is really nasty, it turned out to be not hard at all.
Taking the code from `Note [Absent fillers]`, this means that for this code:
```
data T = MkT [Int] [Int] ![Int] -- NB: last field is strict
f :: T -> Int# -> blah
f ps w = case ps of MkT xs ys zs -> <body mentioning xs>
```
Then `f` gets a strictness signature of `<S(L,A,A)><A>`. But we thread through the bangs of the fields, so we make a worker `$wf`:
```
-- RHS size: {terms: 25, types: 21, coercions: 0, joins: 0/2}
$wf_s14m [InlPrag=NOINLINE] :: [Int] -> Int#
[LclId[StrictWorker([])], Arity=1, Str=<SL>]
$wf_s14m
= \ (ww_s14f [Dmd=SL] :: [Int]) ->
case let {
ps_awV [Dmd=S!P(SL,A,A)] :: T
[LclId]
ps_awV
= M.MkT ww_s14f
(GHC.Prim.Panic.absentError @[Int] ...)
(RUBBISH(LiftedRep) @[Int])
} in
let {
w_awW [Occ=Dead, Dmd=A] :: Int#
[LclId]
w_awW = RUBBISH('IntRep) @Int# } in
case ps_awV of
{ MkT xs_awX [Dmd=SL] ys_awY [Dmd=A] zs_awZ [Dmd=A] ->
case GHC.List.$wlenAcc @Int xs_awX 0# of ww1_a14b { __DEFAULT ->
GHC.Types.I# ww1_a14b
}
}
of ww_s14k
{ __DEFAULT ->
case ww_s14k of wild_00 { I# ww_s14n -> ww_s14n }
}
```
The trick here is that when we make the filler for `zs`, we check whether there was a bang on this field; if so, we create `RUBBISH(LiftedRep) @[Int]` instead of an `absentError` as for the lazy field `ys`.
This means we can still drop the original `zs` early, potentially freeing memory retained by it, and we don't have to pass it as an argument either.
The downside of this approach is that if absence analysis is actually wrong, then `zs` won't fall over with a panic; instead we get segfaults/incorrect results.
I will implement this as part of https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8148
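The field-by-field decision described above can be sketched like this (the helper names `mkAbsentFiller`, `mkLitRubbish` and `mkAbsentErrorApp` are illustrative, not necessarily the real GHC API):

```haskell
-- Sketch: choosing the filler for an absent field in worker/wrapper.
-- A strict (banged) field is known to be evaluated, so a rubbish
-- literal is safe and lets the original value be dropped early; a lazy
-- field keeps absentError so a wrong absence analysis panics cleanly.
mkAbsentFiller :: StrictnessMark -> Type -> CoreExpr
mkAbsentFiller str ty
  | isMarkedStrict str = mkLitRubbish ty      -- e.g. RUBBISH(LiftedRep) @[Int]
  | otherwise          = mkAbsentErrorApp ty  -- GHC.Prim.Panic.absentError
```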
## Proposal 3a: add `seq`s in the worker
Consider
```
f :: T -> Maybe T
f t = case t of MkT xs b -> Just (MkT xs (g b))
```
Strictness analysis will worker/wrapper to
```
$wf xs b = let t = MkT xs b in case t of MkT xs b -> Just (MkT xs (g b))
f t = case t of MkT xs b -> $wf xs b
```
We optimise `$wf` to
```
$wf xs b = Just (MkT xs <blah>)
```
but alas now
* `$wf` is no longer strict in `xs` (remember `MkT` is not strict; and in any case the `Just` makes it lazy).
* The code generator will add an eval for `xs` around the call to `MkT`.
**Proposal**: during w/w, add an eval, in the worker, for used arguments of strict data constructors:
```
$wf xs b = case xs of xs' ->
let t = MkT xs' b in
case t of MkT xs b -> Just (MkT xs <blah>)
```
Now after optimising we get
```
$wf xs b = case xs of xs' -> Just (MkT xs' <blah>)
```
Now `$wf` is visibly strict in `xs`, so:
* the code generator will not add an eval around `MkT`, since there is an enclosing eval
* the code generator will make `$wf` into a CBV-Id; and hence will drop the `case xs` in `$wf` because the caller will guarantee it is evaluated. See "Proposal: CBV arguments".
## Proposal 3b: SpecConstr: `seq`s in the specialised function
We should add extra evals for strict arguments in SpecConstr, just as in worker/wrapper.
```
f t = ....(case t of MkT xs b -> blah)....
```
Suppose SpecConstr decides to make the following RULE
```
RULE forall xs b. f (MkT xs b) = $sf xs b
```
Then the specialised function should have the extra evals:
```
$sf xs b = case xs of xs' ->
              let t = MkT xs' b in
...(case t of MkT xs b -> blah)....
```
(This is sound because in the call `f (MkT xs b)` we know that `xs` is non-bottom.)
This `$sf` optimises to
```
$sf xs b = case xs of xs -> ...(blah)....
```
and again any uses of `xs` in `blah` can see that `xs` is evaluated. Again, `$sf` is strict
so (see "Proposal: CBV arguments") we'll use a CBV convention, and end up dropping the `case`.
## Proposal 3c: `seq`s in join points arising from case alternatives
Exactly the same deal for join points: add extra evals for strict arguments. Suppose we have
```
case e of MkT xs y -> rhs
```
and we want to make a join point:
```
join $j xs y = case xs of xs -> rhs in
case e of MkT xs y -> $j xs y
```
-------------------------
## Failed ideas
### Failed idea: no wrappers for strict data constructors
```
data T = MkT ![Int]
```
At one point I proposed, for such a declaration, to:
* Not have a wrapper `$WMkT` at all
* Mark the worker `MkT` as strict in its first argument. More precisely, in demand analysis change `dmdTransformDataConSig` to add a 1A demand on strict fields.
Having a wrapper to add evals seems redundant, because the code generator will add them anyway.
But it isn't! Suppose we do the case-of-known-constructor transformation:
```
case MkT x of MkT p -> True ==> True
```
If `MkT` is strict in `x`, then the LHS is strict in `x`; but the RHS is not.
We'd have to change the transformation to add evals. Similarly case-to-let
```
case MkT x of r -> Just r ==> let r = MkT x in Just r
```
Again, the LHS is strict in `x` but the RHS is not.
There is similar knock-on effect on `exprIsHNF`, `exprOkForSpeculation` etc.
In the end, the no-wrapper simplicity seems well outweighed by the extra complexity
in multiple other transformations. The fact that constructors are lazy
is rather deeply built into GHC!
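For reference, the wrapper that this failed idea would have removed looks roughly like this (a sketch, not actual generated Core):

```haskell
-- For: data T = MkT ![Int]
-- The wrapper evaluates the strict field; the worker MkT itself stays
-- lazy, so transformations like case-of-known-constructor and
-- case-to-let remain valid as stated.
$WMkT :: [Int] -> T
$WMkT = \xs -> case xs of xs' { __DEFAULT -> MkT xs' }
```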
### Alternative `absentError` Proposal: **Give constructor workers always at least a used demand on their strict fields**
Taking the code from `Note [Absent fillers]`, this means that for this code:
```
data T = MkT [Int] [Int] ![Int] -- NB: last field is strict
f :: T -> Int# -> blah
f ps w = case ps of MkT xs ys zs -> <body mentioning xs>
```
Then `f` gets a strictness signature of `<S(L,A,L)><A>`.
Since the strict field will no longer be absent, we don't generate a filler at all and instead just use the actual field content. This means we will produce a worker like this:
```
-- RHS size: {terms: 25, types: 21, coercions: 0, joins: 0/2}
$wf_s14m [InlPrag=NOINLINE] :: [Int] -> [Int] -> Int#
[LclId[StrictWorker([])], Arity=2, Str=<SL><L>]
$wf_s14m
= \ (ww_s14f [Dmd=SL] :: [Int])
(ww_zs [Dmd=L] :: [Int]) ->
case let {
ps_awV [Dmd=S!P(SL,A,A)] :: T
[LclId]
ps_awV
= M.MkT ww_s14f
(GHC.Prim.Panic.absentError ...)
ww_zs
} in
let {
w_awW [Occ=Dead, Dmd=A] :: Int#
[LclId]
w_awW = RUBBISH('IntRep) @Int# } in
case ps_awV of
{ MkT xs_awX [Dmd=SL] ys_awY [Dmd=A] zs_awZ [Dmd=A] ->
case GHC.List.$wlenAcc @Int xs_awX 0# of ww1_a14b { __DEFAULT ->
GHC.Types.I# ww1_a14b
}
}
of ww_s14k
{ __DEFAULT ->
case ww_s14k of wild_00 { I# ww_s14n -> ww_s14n }
}
```
After some reflection I think this isn't a great idea because:
* We will retain `zs` for longer.
* We pass more arguments to the worker for no good reason.
* It makes the strictness signature less precise.
The upside of this option is that if absence analysis is wrong, we are more likely to actually get an `absentError` instead of a segfault.
If we *really* want to ensure we get an `absentError` in case absence analysis of a strict field goes wrong, we can implement the alternative `absentError` proposal in the future.

Milestone: 9.6.1

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/21617: Wrapping unlifted datatypes without pointer indirection?!** (2023-03-23, Simon Jakobi)

Say you have an unlifted datatype `U`
```haskell
type U :: UnliftedType
data U = U1 | U2 <params> | ...
```
For better ergonomics, typeclass instances, etc., you also want a lifted wrapper `L`
```haskell
data L = L U
```
Unfortunately it seems that this wrapper would incur an additional pointer indirection, somewhat reducing the benefits of using an unlifted type in the first place.
Is there a way that GHC could provide such lifted wrappers but _without_ incurring a pointer indirection?!
(Background: In `unordered-containers` we're considering using `UnliftedDatatypes` for `HashMap`: https://github.com/haskell-unordered-containers/unordered-containers/issues/463)

Assignee: Andreas Klebinger

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/21729: Fix sanity tag check logic in GHC.StgToCmm.TagCheck** (2022-07-12, Andreas Klebinger)

I used `isLiftedRuntimeRep` when I should have used `isBoxedType`.
So the check never actually checks anything.

Assignee: Andreas Klebinger

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/21954: Tag inference: Unconditionally retain tag info on variables.** (2022-09-16, Andreas Klebinger)

For an expression like:
```
case x of y
Con z -> z
```
We currently only retain tag info for `x`, as it's known to be scrutinised. But we should *also* retain it for `z`, since a RHS of just a variable implicitly represents eval+return. So retaining the tag info can sometimes avoid the eval step.

Milestone: 9.6.1

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/22042: ghc-9.4.1 runghc illegal hardware instruction** (2022-10-18, Shayne Fletcher, shayne@shaynefletcher.org)

## Summary
a program calling `setSGR` from the `System.Console.ANSI` package executed via `runghc` exhibits an illegal hardware instruction error.
see this [commercial-stack issue](https://github.com/commercialhaskell/stack/issues/5823) for further insights.
## Steps to reproduce
execute
```bash
cat > Main.hs <<EOF
import System.Console.ANSI
main = do
setSGR [SetColor Foreground Dull Red]
EOF
~/ghc-9.4.1/bin/runghc -hide-all-packages -package --ghc-arg="base-4.17.0.0" -package --ghc-arg="colour-2.3.6" -package --ghc-arg="ansi-terminal-0.11.3" --ghc-arg=-package-db --ghc-arg=$HOME/.cabal/store/ghc-9.4.1/package.db Main.hs
```
and observe abnormal termination and the bash exit code `132`. (the equivalent 9.2.4 command
```
~/ghc-9.2.4/bin/runghc -hide-all-packages -package --ghc-arg="base-4.16.3.0" -package --ghc-arg="colour-2.3.6" -package --ghc-arg="ansi-terminal-0.11.3" --ghc-arg=-package-db --ghc-arg=$HOME/.cabal/store/ghc-9.2.4/package.db Main.hs
```
terminates normally. also, a "raw" ghc command
```
~/ghc-9.4.1/bin/ghc -o test -hide-all-packages -package "base-4.17.0.0" -package "colour-2.3.6" -package "ansi-terminal-0.11.3" -package-db $HOME/.cabal/store/ghc-9.4.1/package.db Main.hs&&./test
```
terminates normally too.)
## Expected behavior
terminate normally with no output and bash exit code `0`
## Environment
* GHC version used: ghc-9.4.1
Optional:
* Operating System: macOS monterey
* System Architecture: x86_64

Milestone: 9.4.2. Assignee: Andreas Klebinger

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/22066: -dtag-inference-checks flag is broken on the unregisterized backend.** (2023-02-07, Andreas Klebinger)

See https://gitlab.haskell.org/ghc/ghc/-/jobs/1145537
The issue seems to be that the RTS function barf isn't in scope and we end up with:
```
/tmp/ghc317867_0/ghc_1.hc:1519:32: error:
error: ‘barf’ undeclared (first use in this function); did you mean ‘nanf’?
ghcFunPtr = ((void (*)(void *))barf);
^~~~
nanf
```
Not sure if we should just emit a dummy prototype or add the appropriate include to the C code. Probably just do whatever we do for `stg_gc_fun` and friends. But it's fairly low priority so for now I just open a ticket instead.

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/22192: Consider moving TagSigs from IdInfo to LFInfo.** (2022-09-20, Andreas Klebinger)

@simonpj would prefer TagSigs to be stored inside LFInfo rather than IdInfo and argues this will make things simpler.
I'm not yet fully convinced but might look into that in the future. This ticket is here both so that I won't forget and
in case someone else wants to look into this.

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/22840: Tag inference failure with -fprefer-byte-code** (2023-08-04, Krzysztof Gogolewski)

Compiling the following three modules fails the tag inference check (and causes a segfault in a larger application).
It seems to be an interaction between fat interface files and tag inference. Reverting d6ea8356b7 fixes the bug.
A.hs
```haskell
module A where
data A = MkA { getA :: !(Maybe Bool) }
```
B.hs
```haskell
{-# OPTIONS_GHC -fwrite-if-simplified-core #-}
module B where
import A
theA :: A
theA = MkA (Just True)
```
C.hs
```haskell
{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -fwrite-if-simplified-core -fbyte-code-and-object-code -fprefer-byte-code #-}
module C where
import A
import B
$(case getA theA of { MkB x -> pure [] })
```
ghci.script
```
:load A.hs B.hs
import A
import B
let !x = getA theA
```
To reproduce either build C.hs
```
rm *.hi *.o
_build/stage1/bin/ghc -dtag-inference-checks -fforce-recomp C
[1 of 3] Compiling A ( A.hs, A.o )
[2 of 3] Compiling B ( B.hs, B.o, interpreted )
[3 of 3] Compiling C ( C.hs, C.o, interpreted )
ghc: internal error: Tag inference failed on:TagCheck failed on entry in A - value:ds1_sH8 _sH8::P64
```
Or execute ghci.script in ghci *after having built B.hs first*:
```
andi@horzube:~/ghc_infer_bytecode/tmp$ ../_debug/stage1/bin/ghc -dtag-inference-checks -fforce-recomp B -ddump-to-file -ddump-stg-final
[1 of 2] Compiling A ( A.hs, A.o )
[2 of 2] Compiling B ( B.hs, B.o )
andi@horzube:~/ghc_infer_bytecode/tmp$ ../_debug/stage1/bin/ghc -dtag-inference-checks --interactive < ghci.script
GHCi, version 9.7.20230130: https://www.haskell.org/ghc/ :? for help
ghci> Ok, two modules loaded.
ghci> ghci> ghci> ghci> ghci> <interactive>: internal error: Tag inference failed on:TagCheck failed on entry in A - value:ds1_sAP _sAP::P64
(GHC version 9.7.20230130 for x86_64_unknown_linux)
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
Aborted (core dumped)
```

Milestone: 9.6.1. Assignee: Andreas Klebinger

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/22935: Clarify semantics of evaluate/seq#** (2024-02-11, Matthew Craven, clyring@gmail.com)

Forked from a discussion at #15226.
There are two aspects of the behavior of `evaluate` and its underlying primop `seq#` that aren't so clear.
1. Should the strictness of `evaluate` when executed be visible to demand analysis?
2. Should `evaluate` be considered to throw a precise exception?
Question 1 mainly affects programs like `f x = someSideEffect >> evaluate x`. If the strictness of `evaluate` is visible to demand analysis, then GHC may eagerly evaluate the argument of `f` at a call-site to avoid producing a thunk, meaning `x` gets evaluated before `someSideEffect` is actually performed. Since `evaluate` is the only tool we provide for sequencing evaluation relative to side-effects, I think this behavior would be very undesirable.
A user who wants strictness can ask for it as easily as `evaluate $! x`. But unfortunately we rewrite `case x of x' { _ -> seq# x' s }` to `case x of x' { _ -> (# s, x' #) }` so `evaluate $! x` is equivalent to `pure $! x` with current GHC, which is a nasty surprise. (This rewrite rule is also the reason we may sometimes evaluate `y` before `x` in `evaluate x >> evaluate y`, even though `evaluate` is currently lazy.)
Question 2 mainly affects programs like `a = evaluate err1 >> err2`. If `evaluate` is considered to throw a precise exception, then execution of `a` is lazy in `err2` and can only throw `err1`, while if `evaluate` is not considered to throw a precise exception then execution of `a` is strict in `err2` and may throw either `err1` or `err2`. I don't feel strongly about this question.
cc @sgraf812 @treeowl @simonpj

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/23173: Make the enter code for small data types panic** (2023-03-30, Sebastian Graf)

Think of this as a counter proposal to https://gitlab.haskell.org/ghc/ghc/-/issues/21792, the other end of a spectrum of which we currently reside in the middle.
The suggestion of @clyring in https://github.com/ghc-proposals/ghc-proposals/pull/530 to retain the panicking enter code for `ByteArray#` and friends and instead give it proper, non-zero tags gave me pause. I tried hard to follow my intuition and make it break, but wasn't able to (we'll revisit my attempts below). Thinking about what is so special about `ByteArray#`, I concluded "nothing" and propose the following:
**Make the enter code of any taggable `BoxedRep` normal form crash.**
(Where "taggable" refers to the architecture-specific distinction of "small constructor families", e.g., those that can be given a tag.)
If this is possible, then we'll have a straight win in terms of debugging; if not, we can adapt our reproducer to prove that under Proposal 530 `ByteArray#` could be legitimately entered.
Here's my attempt to break the suggestion:
```hs
data B = T | F deriving Show
b :: B
b = <churn>
main = b `seq` print b
```
Assuming nothing is inlined, the `seq` will need to enter `b`, so there *are* closures of type `B` that need to be enterable. But it turns out that we only need *thunk* or *indirection* closures to be enterable, not the values `T` or `F`. When we enter the thunk for `churn`, we'll run its code and ultimately (call another thunk/indirection/function returning a properly tagged value, or) "see", in the code for `churn`, that the result is plain `T` or `F`, for which the code in `churn` can be generated in a way that doesn't need to enter `T` or `F` and make sure that the proper tags are set.
I'm a bit hazy about indirectee pointers, especially static ones and perhaps blackholes. Consider the example above; after `b` has been `seq`'d, it will be evaluated by `print` again. By then, `b` no longer points to the thunk object for `churn` but to an `IND`irection closure, the payload pointer (*indirectee* pointer) of which must absolutely be properly tagged. Is that always the case?

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/23231: Nullary Data Con Wrappers are not tagged** (2023-09-13, Rodrigo Mesquita)

This ticket is concerned with the tagging logic of nullary data con *wrappers*. Concretely, it seems we are not tagging nullary data con wrappers at their usage site. The immediate consequences of not tagging these wrappers must be considered in at least two distinct scenarios:
The data type is lifted:
* If we pass the datacon wrapper as an argument to a function untagged (tag=0), despite knowing exactly what the constructor is, in the function body we will treat the argument as a thunk (which means calling at least an extra function -- the data con _con_entry() -- which will in turn return a tagged pointer)
The data type is unlifted:
* When we pass the unlifted datacon wrapper untagged (tag=0) to a function (that must be expecting an unlifted argument), we will get a segmentation fault as in #23146. The reason is functions expect their unlifted arguments to be evaluated and tagged, and therefore do not handle the case in which the unlifted argument has tag=0.
This ticket is closely related to #23146 and I intend to fix both simultaneously.
This one discusses the tagging issue in its full generality as opposed to the segfault particular to the unlifted case.
----
# Examples
I've got a good example of this for each case.
There are two subtleties about them. Nonetheless, they showcase the bug well and we must handle this correctly regardless:
(1) Why do we even have a wrapper for a nullary data con?
Answer: because of the unlifted equality coercion the worker takes as an argument (despite this argument not being runtime relevant)
(2) The wrapper *doesn't* inline. (Why not?).
### Unlifted case
This example will segfault when run, and it's the reproducer from #23146.
```haskell
-- A.hs
{-# LANGUAGE GADTs, UnliftedDatatypes #-}
-- Unlifted, Nil worker has 1 Zero-Width argument
import GHC.Exts
import Data.Kind
import R2
fieldsSam :: NP xs -> NP xs -> Bool
fieldsSam (x' ::* xs) (y' ::* ys) = fieldsSam xs ys
fieldsSam UNil UNil = True
main :: IO ()
main = print (fieldsSam UNil UNil)
-- B.hs
{-# LANGUAGE UnliftedDatatypes #-}
module R2 where
import GHC.Exts
type NP :: [UnliftedType] -> UnliftedType
data NP xs where
UNil :: NP '[]
(::*) :: x -> NP xs -> NP (x ': xs)
```
### Lifted case
This example will not segfault, but inspecting the cmm will show the wrappers being passed untagged and the logic to handle untagged arguments (tag == 0)
```haskell
-- A.hs
{-# LANGUAGE GADTs, UnliftedDatatypes #-}
-- Lifted, 1 Zero-Width argument
import GHC.Exts
import Data.Kind
import S2
fieldsSam :: NP xs -> NP xs -> Bool
fieldsSam (x' ::* xs) (y' ::* ys) = fieldsSam xs ys
fieldsSam UNil UNil = True
main :: IO ()
main = print (fieldsSam UNil UNil)
-- B.hs
{-# LANGUAGE UnliftedDatatypes #-}
module S2 where
import Data.Kind
import GHC.Exts
type NP :: [Type] -> Type
data NP xs where
UNil :: NP '[]
(::*) :: x -> NP xs -> NP (x ': xs)
```
CC @simonpj @bgamari @AndreasK

Assignee: Rodrigo Mesquita

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/23404: GHCi segfault with GHC 9.4.5** (2023-12-19, Teo Camarasu)

## Summary
I've been looking at a segfault that seems to be triggered by some interpreted code calling into object code.
This started appearing after upgrading from GHC 9.2.5 to 9.4.5.
I've managed to extract an independent reproducer that I've linked below.
## Steps to reproduce
Checkout the reproducer from: https://gitlab.haskell.org/teo/ghci-segfault-9.4
Then
```
nix develop
cabal repl ghci-segfault-test
:main
```
## Expected behavior
Not seg fault
## Environment
* GHC version used: 9.4.5

Milestone: 9.4.6. Assignee: Zubin

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/23549: Compiled program throws ‘Non-exhaustive patterns in case’** (2023-07-03, Ryan Hendrickson)

## Summary
It's hard to say what exactly causes this bug; it's very fragile and requires a lot of newer GHC type features. But, with the below combination of unlifted data types, type data, type families, GADTs, and less interesting circumstantial details, GHC produces a program that crashes with `Non-exhaustive patterns in case`.
This crash is incorrect, as the case expression in question is exhaustive given the types of the GADT constructors. Furthermore, GHC doesn't warn about this at compile time with `-Wincomplete-patterns`. Changing minor details, like inlining the contents of the `BugLib` module into `Main` or replacing the `$` in the definition of `shouldBeExhaustive` with parentheses, results in a program that doesn't crash.
## Steps to reproduce
```hs
{-# LANGUAGE GADTs, UnliftedDatatypes #-}
module BugLib where
import GHC.Exts (UnliftedType)
import GHC.Types (Type)
data UnliftedGADTProxy :: Type -> UnliftedType where
UGPInt :: UnliftedGADTProxy Int
UGPBool :: UnliftedGADTProxy Bool
```
```hs
{-# LANGUAGE GADTs, TypeData, TypeFamilies #-}
{-# OPTIONS_GHC -Wincomplete-patterns -Werror #-}
module Main where
import GHC.Types (Type)
import BugLib
type data TDWrapper = TDWrapperCon Type
type family UnwrapTDW tdw where
UnwrapTDW (TDWrapperCon a) = a
data ProxyBox tdw = ProxyBox (UnliftedGADTProxy (UnwrapTDW tdw))
shouldBeExhaustive :: ProxyBox (TDWrapperCon Int) -> ()
shouldBeExhaustive pb = id $ case pb of ProxyBox UGPInt -> ()
main :: IO ()
main = let
pb :: ProxyBox (TDWrapperCon Int)
pb = ProxyBox UGPInt
!_ = shouldBeExhaustive pb
in putStrLn "works"
```
## Expected behavior
It should compile to a program that prints `works`.
## Environment
* GHC version used: 9.6.2
Optional:
* Operating System: Linux (NixOS)
* System Architecture: x86_64

Assignee: Jaro Reinders

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/23665: Tag top-level bindings even when optimizations are disabled** (2023-08-14, Jaro Reinders)

One complication in implementing top-level unlifted bindings (#23637) concerns missing tags on imported bindings. For example if we have this Core:
```haskell
b :: B Bool
b = B a
```
Where `B :: Type -> UnliftedType` and `a :: Bool`. Currently, if `a` is imported from a module that has been compiled with optimizations disabled, then the STG tag inference pass will rewrite `b` to:
```
b :: B Bool = \r [] case a of a_tCP { __DEFAULT -> B [a_tCP]; };
```
Which means `b` is not fully evaluated, but it is also unlifted so there is no thunk that can postpone that evaluation. This is a problem.
To avoid this, [SPJ proposes a new invariant](https://gitlab.haskell.org/ghc/ghc/-/issues/23637#note_512996):
> **All toplevel values are properly tagged**, so that `x` will never be eval'd by the STG-rewrite pass.
This can be done cheaply, so it shouldn't add that much extra time and space usage to unoptimized builds.
However, we should still be careful. @simonmar [warned](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/2842#note_257975) that exporting more information in interface files can cause more recompilation:
> Similar arguments about recompilation apply to `LFInfo` - we should be careful to minimize the impact this might have on recompilation. So long as we only expose the tag I think we should be OK, the only impact will be changing something from a constructor to non-constructor or vice-versa while keeping its type unchanged, which I imagine is rare.
SPJ also suggested that @clyring might have already done some work in this direction. Is that right?

Assignee: Jaro Reinders

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/23848: STG creates thunks from local strict datacon bindings** (2023-11-17, Jaro Reinders)

@Mikolaj discovered unwanted thunks appearing in strict data constructors in his [horde-ad](https://github.com/Mikolaj/horde-ad) package.
![image](/uploads/27e2229017a59ac4eceff78acdefe5e0/image.png)
The relevant code snippet is [here on github](https://github.com/Mikolaj/horde-ad/blob/1bbe06831e24bcbb3e8a3dbac9ca83043d909632/src/HordeAd/Core/AstFreshId.hs#L66C7-L66C37):
```haskell
astRegisterFun
:: (GoodScalar r, KnownNat n)
=> AstRanked s r n -> AstBindings (AstRanked s)
-> (AstBindings (AstRanked s), AstRanked s r n)
{-# NOINLINE astRegisterFun #-}
astRegisterFun !r !l | astIsSmall True r = (l, r)
astRegisterFun !r !l = unsafePerformIO $ do
!freshId <- unsafeGetFreshAstId
let !r2 = AstVar (shapeAst r) $ AstVarName $ astIdToAstVarId freshId
!d = DynamicExists $ AstRToD r
return ((freshId, d) : l, r2)
```
In the STG-from-core we find the following let bindings of data constructors (the identifiers have different uniques than the eventlog2html screenshot above because the program was recompiled in the meantime):
```haskell
let {
sat_smDI [Occ=Once1] :: HordeAd.Core.Ast.AstDynamic s_akFF r_akFD
[LclId] =
CCCS 0 HordeAd.Core.Ast.AstRToD! [$dKnownNat_smDv r1_smDw]; } in
let {
sat_smDJ [Occ=Once1]
:: HordeAd.Core.Types.DynamicExists
(HordeAd.Core.Ast.AstDynamic s_akFF)
[LclId] =
CCCS 0 HordeAd.Core.Types.DynamicExists! [$dGoodScalar_smDu
sat_smDI];
} in
```
But these are turned into thunks in STG-final:
```haskell
let {
sat_smDI [Occ=Once1] :: HordeAd.Core.Ast.AstDynamic s_akFF r_akFD
[LclId] =
{$dKnownNat_smDv, r1_smDw} \r []
case r1_smDw of r1_tnhn {
__DEFAULT -> HordeAd.Core.Ast.AstRToD #0 [$dKnownNat_smDv r1_tnhn];
}; } in
let {
sat_smDJ [Occ=Once1]
:: HordeAd.Core.Types.DynamicExists
(HordeAd.Core.Ast.AstDynamic s_akFF)
[LclId] =
{$dGoodScalar_smDu, sat_smDI} \r []
case sat_smDI of sat_tnhp {
__DEFAULT ->
HordeAd.Core.Types.DynamicExists #0 [$dGoodScalar_smDu sat_tnhp];
};
```
A simple reproducer from below:
```haskell
{-# OPTIONS_GHC -ddump-stg-final #-}
module T23848 where
data SBox a = SBox !a
idd :: a -> a
{-# OPAQUE idd #-}
idd x = x
astRegisterFun :: Bool -> Maybe (SBox Bool)
astRegisterFun !b | idd b = undefined
astRegisterFun !b = Just $! SBox b
```

-----

**Issue https://gitlab.haskell.org/ghc/ghc/-/issues/23890: Use unlifted types in Core to replace the STG tag-invariant-rewrite pass** (2024-03-25, Matthew Craven, clyring@gmail.com)

See also:
* #24271: record evaluated-ness info in results of CPR
## Background
Since ghc-9.4, we insist that a call-by-value calling convention is used for the strict lifted arguments to Cbv-functions and DataCon workers.
Specifically, these arguments must...
1. ...refer directly (no indirections) to an evaluated heap object
2. ...be properly-tagged
Notice that these are exactly the same invariants that we demand of
* case-binders
* variables of boxed unlifted type
We exploit this calling convention to elide evaluations of these arguments when generating code for Cbv-functions and likewise when scrutinizing a variable extracted from a strict field of a data constructor, producing smaller and more efficient code. (The latter is extremely valuable for traversals of spine-strict data structures like `Data.Set.Set`.)
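As an illustration of that last point, consider a simplified spine-strict tree in the style of `Data.Set` (a sketch for illustration, not the real container):

```haskell
-- Because the subtrees are strict fields, a scrutinee like `l` below is
-- known to be evaluated and properly tagged, so the generated code can
-- branch directly on the pointer tag instead of entering the closure.
data Set a = Bin !(Set a) a !(Set a) | Tip

size :: Set a -> Int
size Tip         = 0
size (Bin l _ r) = 1 + size l + size r
```

Every recursive call in `size` scrutinises a strict field, so eliding the entry check pays off on every node of the traversal.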
Our Core-to-Core pipeline discards "redundant" evals with great enthusiasm, in many ways. (See also `Note [How untagged pointers can end up in strict fields]` and #15696 and #20749 and #21741, among probably many other discussions.)
So, today, to actually enforce this calling convention, we:
* eta-expand these functions in CorePrep and
* insert evals of strict arguments at their call sites in the STG tag-invariant-rewrite pass (`Stg.InferTags.Rewrite`) just before code generation (StgToWhatever).
* We also perform a simple analysis to avoid inserting evals of arguments that already satisfy 1 and 2.
But it would really be much simpler and more direct to represent these evaluations and Cbv calling conventions in Core. But then how do we avoid the discarded-redundant-evals problem? Simple.
# Proposal
Represent the call-by-value convention of Cbv-functions and strict DataCon workers in their type!
More specifically, introduce a wired-in type `Strict# :: Type -> UnliftedType` with the semantics that the possible values of type `Strict# a` are exactly the possible values of type `a` that satisfy conditions 1 and 2 above. Then, given a declaration `data T x y = MkT x !y`, the DataCon worker for `MkT` can have type `forall x y. x -> Strict# y -> T x y`. It is impossible for Core-to-Core to discard an eval on the second field without angering lint because `y` and `Strict# y` are different types, with different kinds.
We must explicitly convert between `y` and `Strict# y`, perhaps with operations like these:
* `toStrict# :: a -> Strict# a` evaluates its argument and returns the result.
* `fromStrict# :: Strict# a -> a` is a no-op.
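At the lifted level, the intended semantics of these two operations can be sketched with an ordinary newtype. This is purely illustrative (the names are made up): the real `Strict#` is an unlifted wired-in type and cannot be expressed in source Haskell.

```haskell
-- Boxed-level simulation of the proposed primops (illustrative only).
newtype StrictBox a = StrictBox a

-- Like toStrict#: evaluate the argument, then return the result.
toStrictBox :: a -> StrictBox a
toStrictBox x = x `seq` StrictBox x

-- Like fromStrict#: a no-op at runtime (newtype unwrapping is free).
fromStrictBox :: StrictBox a -> a
fromStrictBox (StrictBox x) = x
```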
The main difficulty is that we must optimize well in the presence of these operations, or else their use will incur unwanted indirect overheads. For example:
```
let y = Left x in
case toStrict# y of y' { __DEFAULT ->
... case fromStrict# y' of { Left p -> e1 ; Right q -> e2 } ...
}
==> case-of-known-constructor should not be blocked by toStrict#/fromStrict# noise
let y = Left x in
case toStrict# y of y' { __DEFAULT ->
... e1[x/p] ...
}
```
-----
The other thing the tag inference pass gives us is information about which components of the result of a function call must satisfy conditions 1 and 2. We can keep this by letting worker-wrapper-for-CPR create `Strict#` components of unboxed tuple results where appropriate. Conveniently, CPR analysis already gathers this information because it is needed for nested CPR.
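For example (a Core-like sketch, not valid source Haskell; the worker name and shapes are hypothetical):

```
f :: Tree -> (Int, Tree)    -- suppose CPR analysis finds the Tree component
                            -- of f's result is always evaluated

-- Worker/wrapper for CPR can then expose that fact in the worker's type:
$wf :: Tree -> (# Int#, Strict# Tree #)
f = \t -> case $wf t of (# i, s #) -> (I# i, fromStrict# s)
```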
-----
!10252 implements the proposed `Strict#` data type. But:
* It currently needs to be rebased, as does !10247 on which it is based.
* It needs a lot of work to improve optimization in the presence of `toStrict#`/`fromStrict#`.
* It implements strict DataCon fields using `Strict#` as described above, but does not yet touch worker/wrapper and replace the Cbv-function mechanism.

https://gitlab.haskell.org/ghc/ghc/-/issues/24135 Implement Call-by-value convention via `ArgRep`/`LambdaFormInfo` 2024-03-25T08:07:12Z Sebastian Graf

## Summary
This ticket tracks a proposal to retire tag inference in favor of a mechanism integrating levity into the calling convention.
## Motivation
In #23848 we were investigating ways in which we could prevent tag inference from generating thunks to insert field seqs on strictly used binders. The workaround of passing `-fworker-wrapper-cbv` can lead to regressions because of rewrite rules (see https://gitlab.haskell.org/ghc/ghc/-/issues/20364) and is generally an unsatisfying mechanism.
Furthermore, "tag inference" is not some optional analysis that only runs with `-O1`; at this point *it is an integral part of the code generation pipeline*, even with `-O0`. Compile a client module without running tag inference and you get crashes. In light of this, it would be better named `StgPrep` or some such.
## Clarification: CbV, levity and types
Note that when a function is call-by-value, that expresses a *precondition*: Callers of the function are *required* to evaluate lifted arguments before calling the function, and the callee is *permitted* to assume that the argument has been evaluated.
Evaluating a lifted argument before passing it is effectively like passing an *unlifted* pointer as argument.
We ultimately hope to make this precondition precise in STG's and Core's type system (#19530, #19767, #23890), but that is not a concern of this issue. Rather, what we discuss here is necessary groundwork to work around the annoyances in our current tag inference situation, because unlifted pointers are always properly tagged.
## Proposal
@simonpj and I came to the conclusion (see https://gitlab.haskell.org/ghc/ghc/-/issues/23848#note_531525) that a holistic solution would be to treat call-by-value as the calling convention it really is; as part of GHC's ABI it must be manifest in interface files through exported `LambdaFormInfo` (not `IdDetails` like the current CbV mechanism) and any client must adhere to it (-O0 or not, GHC or ... the next version of GHC). Essentially, we want to do the same for levity mismatch of arguments as we do for arity mismatches: Whenever we encounter a mismatch, generate a slow call that will do the necessary PAP'ing/eval'ing.
From my very rough understanding of everything in `StgTo*` and below, this is the minimum number of steps required to integrate call-by-value/unlifted arg reps:
1. Change `GHC.StgToCmm.ArgRep.ArgRep`: Split `P` into `L` and `U` for lifted and unlifted GCd pointers, respectively.
2. That directly prompts a change for `slowCallPattern`, which is the interface between GHC and the generic apply functions generated by `utils/genapply`.
3. Change `utils/genapply/Main.hs:ArgRep` in the same way as (1). The entry code for a `U` should simply seq the argument.
4. Find the right adjustment to `utils/genapply/Main.hs:applyTypes` and `utils/genapply/Main.hs:stackApplyTypes` that catches 99% of all call sites in reasonable space. To get things running, no change should be needed (but the resulting code will either be huge or slow). Fine-tuning the exact set of apply patterns informed by benchmarks is the bulk of the change, I think, and deserves its own section below.
5. Fill in `slowCallPattern` so that it matches `applyTypes`. Also fix other functions in `GHC.StgToCmm.ArgRep`, the most important being `toArgRep` which turns a `PrimRep` into an `ArgRep`. We must split the `BoxedRep` case in two: One for `Just Lifted` and one for `Just Unlifted`. Furthermore, it is no longer enough to look at
6. Follow references of today's `P` and see which need fixes. `GHC.StgToCmm.Layout`: `argBits` (`U` is also GCd), `stdPattern`. Also some in the ByteCode interpreter.
7. Now I'd work from `GHC.StgToCmm.Expr.cgIdApp` downwards. Specifically, we need a new `LambdaFormInfo` concept of CbV to communicate the information to `GHC.StgToCmm.Closure.getCallMethod`. The latter needs to emit a slow call not only on arity mismatch, but also when one of the args is not unlifted but goes into an unlifted parameter of the callee. Slow call will go through genapply funs IIRC which will do the proper seqs.
We can tell whether an arg is unlifted by looking at the `idDemandInfo` of the `StgArg`.
We must be able to tell whether a parameter of the callee is expected unlifted by means of its `ReEntrant :: ... -> LambdaFormInfo`.
Just as that constructor lists the callee's arity, we must augment it with the function's "CbV marks". A good place to do so would be the `ArgDescr`; in fact, for the usual apply patterns, the info is already present in `ArgSpec`. For the `ArgGen` case, instead of a `type Liveness = [Bool]` bitmap, we must embed `data Stuff = LiftedP | UnliftedP | Dead; type Liveness = [Stuff]` (TODO: better name).
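A standalone sketch of the proposed bitmap change (using the ticket's own placeholder names; `isGcPtr` is a hypothetical helper added for illustration):

```haskell
-- Each slot in the liveness map now records whether it holds a lifted
-- pointer, an unlifted (and hence properly tagged) pointer, or nothing.
data Stuff = LiftedP | UnliftedP | Dead
  deriving (Eq, Show)

type Liveness = [Stuff]

-- Hypothetical helper: both kinds of pointers must be traced by the GC,
-- which is the sense in which "U is also GCd" in step 6 above.
isGcPtr :: Stuff -> Bool
isGcPtr Dead = False
isGcPtr _    = True
```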
## Effect
With these changes, we will generate a slow call for a function like this:
```
module A where
data T a = T !a
g :: T Int -> Int
g (T n) = n -- g is strict, hence expects its arg unlifted
module B where
f :: T Int -> Bool -> Int
f a True = g a -- a is not used strictly, hence it is lifted. This will generate a slow call to g
f _ False = 0
```
Compiling `A` with -O1, `g` is detected strict and expects its argument unlifted.
Compiling `B` with -O0, we see that the caller `f` is not strict in `a` (even with `-O1`) but that it passes `a` on to `g`.
Since `a` is lifted, there is a levity mismatch with the `LambdaFormInfo` of `g` (the exact info is in the imported `ArgSpec ARG_U :: ArgDescr`, presumably) -- just like an arity mismatch, this forces a slow call to `g`.
Now, tag inference would simply insert a seq prior to calling `g`, thus guaranteeing compatible unliftedness and eliminating the slow call.
~~Note that there are no thunks involved whatsoever! No #23848.~~ (Edit: This is actually untrue in case `g` is a strict DataCon worker like `T` and let bound. However there are no avoidable thunks such as in #23848.) In fact this transformation is so simple that it could be carried out in CorePrep already, or perhaps as part of `cgIdApp`. Kind of like eta expansion, but for levity (and without introducing new closures).
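At the source level, the seq that tag inference (or CorePrep, as suggested) inserts before the call corresponds roughly to the following, using the `f`/`g` from the example above:

```haskell
data T a = T !a

g :: T Int -> Int
g (T n) = n   -- strict, so it expects its argument evaluated (unlifted)

-- Forcing `a` before the call makes it compatible with g's unlifted
-- parameter, so a fast direct call can replace the slow generic-apply call.
f :: T Int -> Bool -> Int
f a True  = a `seq` g a
f _ False = 0
```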
## Generating generic apply functions
As outlined in (5) above, we need to re-run benchmarks to find a good set of generic apply patterns.
What a generic apply function is and the process to find good candidates has been described in Section 6.2 and 8.2 of https://www.microsoft.com/en-us/research/publication/make-fast-curry-pushenter-vs-evalapply/. Perhaps it's a good idea to extend the benchmark suite beyond NoFib for this purpose, as it displays a relative poverty of unboxed and unlifted types.