# GHC issues
https://gitlab.haskell.org/ghc/ghc/-/issues

---

**#20313 — Should we inline constructor wrappers into boring contexts?** (Andreas Klebinger, 2021-09-02)
https://gitlab.haskell.org/ghc/ghc/-/issues/20313

While working on other things I came across this constructor wrapper:
```
-- RHS size: {terms: 7, types: 6, coercions: 0, joins: 0/0}
GHC.Unit.Types.$WRealUnit [InlPrag=INLINE[final] CONLIKE]
:: forall uid. Definite uid %1 -> GenUnit uid
[GblId[DataConWrapper],
Arity=1,
Caf=NoCafRefs,
Str=<SL>,
Cpr=1,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (@uid_a2UY)
(conrep_a3pk [Occ=Once1] :: Definite uid_a2UY) ->
case conrep_a3pk of conrep_X0 [Occ=Once1] { __DEFAULT ->
GHC.Unit.Types.RealUnit @uid_a2UY conrep_X0
}}]
GHC.Unit.Types.$WRealUnit
= \ (@uid_a2UY) (conrep_a3pk [Occ=Once1] :: Definite uid_a2UY) ->
case conrep_a3pk of conrep_X0 [Occ=Once1] { __DEFAULT ->
GHC.Unit.Types.RealUnit @uid_a2UY conrep_X0
}
```
There are a few cases where this kind of wrapper does not get inlined. For example here:
```
case ds8_saQo sat_saQs GHC.Prim.void# of {
Solo# ipv12_saQv [Occ=Once1] ->
let {
sat_saQw [Occ=Once1]
:: GHC.Unit.Types.GenUnit GHC.Unit.Types.UnitId
[LclId] =
{ipv12_saQv} \u [] GHC.Unit.Types.$WRealUnit ipv12_saQv;
} in Solo# [sat_saQw];
```
This is all expected (see the boring_ok=False attribute). I do wonder if it's the right thing to do.
In the example above we will generate a CMM function for the binding `sat_saQw`. This means we pay the overhead of two function calls when evaluating `sat_saQw` but save a bit in code size/compile time.
Things are different for wrappers which evaluate multiple arguments: these can become non-trivial in code size and probably *shouldn't* be inlined into boring contexts.
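For concreteness, here is a hypothetical example (not from the ticket) of the multi-argument case: a constructor with two strict fields, whose generated wrapper must `case` on both arguments before applying the real worker constructor — so, unlike `$WRealUnit` above, it contains two case expressions.

```haskell
-- Hypothetical type: both fields are strict, so GHC's wrapper $WPair2
-- evaluates both arguments before building the constructor. Inlining
-- it therefore duplicates two case expressions at every use site.
data Pair2 = Pair2 !Int !Bool

sumPair2 :: Pair2 -> Int
sumPair2 (Pair2 n b) = n + fromEnum b
```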
It seems like there are pros and cons for either choice. We could also make it dependent on the number of cases inside the wrapper. But currently there is no reason given in any of the notes for doing it one way or another. So I'm opening this ticket in case anyone wonders about the current choice or wants to evaluate it going forward.

---

**#20145 — Let LLVM and Unregisterized lower greater than native sized primops** (John Ericson, 2021-08-18)
https://gitlab.haskell.org/ghc/ghc/-/issues/20145

When using the LLVM or Unregisterized backends and compiling primops for prim types exceeding the native width, we should avoid our C stubs and let LLVM/C lower them more efficiently in-line.
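For intuition, a sketch of the kind of operation involved: on a 32-bit target the multiplication below goes through a wider-than-native primop, which the NCG lowers via a call into C stubs borrowed from the C compiler, whereas the LLVM and C backends could emit the arithmetic inline.

```haskell
import Data.Word (Word64)

-- On a 32-bit target this is a greater-than-native-width operation:
-- the NCG currently compiles it to a call into a C stub, while
-- LLVM/C could lower it in-line.
wideMul :: Word64 -> Word64 -> Word64
wideMul = (*)
```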
NCG needs to call slow FFI functions where we "borrow" the C compiler's implementation, but there is no reason why we need to do that for LLVM or C.

Assignee: John Ericson

---

**#16891 — Implement escape analysis for stack allocation** (Andreas Klebinger, 2021-04-23)
https://gitlab.haskell.org/ghc/ghc/-/issues/16891

# Motivation
Reduce GC pressure.
# Proposal
Figure out if a function stores (parts of) its arguments in its return value.
With this information we can look at let bindings. Any binding which is only used as non-escaping argument can then be allocated on the stack instead and implicitly freed on return.
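A minimal (hypothetical) illustration of the kind of binding the analysis would catch:

```haskell
-- `p` is only consumed locally -- no part of it is stored in the
-- result -- so under the proposed analysis it could be allocated on
-- the stack and freed implicitly when `sumPair` returns, instead of
-- being heap-allocated and collected later.
sumPair :: Int -> Int -> Int
sumPair x y =
  let p = (x, y)
  in fst p + snd p
```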
This is fairly well understood and should mostly carry over to Haskell. The benefits might be different due to the lazy nature of Haskell though.

---

**#16888 — Remove (or update the comments for) stg_ap_1_upd_info** (Ömer Sinan Ağacan, 2019-07-07)
https://gitlab.haskell.org/ghc/ghc/-/issues/16888

Found this while reading the code but couldn't find an issue for it:
```
/* stg_ap_1_upd_info is a bit redundant, but there appears to be a bug
* in the compiler that means stg_ap_1 is generated occasionally (ToDo)
*/
INFO_TABLE(stg_ap_1_upd,1,0,THUNK_1_0,"stg_ap_1_upd_info","stg_ap_1_upd_info")
(P_ node)
{
TICK_ENT_DYN_THK();
STK_CHK_NP(node);
UPD_BH_UPDATABLE(node);
LDV_ENTER(node);
push (UPDATE_FRAME_FIELDS(,,stg_upd_frame_info, CCCS, 0, node)) {
ENTER_CCS_THUNK(node);
jump stg_ap_0_fast
(StgThunk_payload(node,0));
}
}
```
I tried removing this and it caused linker failures because we really generate
references to `stg_ap_1_upd_info`.
I think this is for thunks of the form
```
let x = y
```
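Such a thunk can arise, for example, from a source binding like the one below (a sketch; in practice the simplifier usually substitutes these bindings away before STG, which is what makes the surviving references puzzling):

```haskell
-- A binding of the shape `let x = y`: entering the thunk for `x`
-- merely enters `y` -- which is exactly what stg_ap_1_upd implements.
f :: Int -> (Int, Int)
f y = let x = y in (x, x)
```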
The comment suggests that these could be eliminated, and `stg_ap_1_upd_info`
could be removed. This issue is to either do this or update the comments
explaining why we can't remove `let x = y` thunks and really need
`stg_ap_1_upd_info`.

---

**#15560 — Full laziness destroys opportunities for join points** (Andreas Klebinger, 2024-02-27)
https://gitlab.haskell.org/ghc/ghc/-/issues/15560

Even if we already know a binding is a join point we STILL float it to the top and turn it into a function.
The simple example below results in a join point after the first simplifier run. Then we run the float out pass immediately undoing this by making it a top level binding.
It then stays at the top till we are done, resulting in the core I've put in the comments.
```haskell
data T = A | B | C | D | E | F | G
{-# NOINLINE n #-}
n :: T -> T
n A = B
n B = C
n _ = A
f :: Int -> T -> T -> T
f sel x y =
-- function large enough to avoid being simply inlined
let j z = n . n . n . n . n . n $ z
in case sel of
-- j is always tailcalled
0 -> j x
_ -> j y
-- j is floated to top level instead of ending up as joinpoint.
-- T.f_j
-- = \ (eta_B1 [OS=OneShot] :: T) -> n (n (n (n (n (n eta_B1)))))
-- -- RHS size: {terms: 14, types: 6, coercions: 0, joins: 0/0}
-- f :: Int -> T -> T -> T
-- f = \ (sel_aYP :: Int) (x_aYQ :: T) (y_aYR :: T) ->
-- case sel_aYP of { GHC.Types.I# ds_d2fL ->
-- case ds_d2fL of {
-- __DEFAULT -> T.f_j y_aYR;
-- 0# -> T.f_j x_aYQ
-- }
-- }
```
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------------ |
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (CodeGen) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | #14287 |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
## Current plan (as of Dec. 19th 2023)
### A brief summary
"destroys" is an apt description but the mechanism is this:
- some join points get lifted to the top level
- because they get lifted they are no longer join points and instead become top level functions, with all the consequences of top level functions, as [this](https://gitlab.haskell.org/ghc/ghc/-/issues/15560#note_159111) comment points out.
- But these previous join point functions have some nice properties:
- They are never exported because they were floated out
- All call sites to them are known and saturated, again because they began life as a join point
### The Conceptual Plan
So the plan is to _not_ focus on join points but _instead_ optimize _all_ top level functions that are local (i.e., not exported) and whose call sites are all known. Optimizing these functions will, in effect, also optimize the previously-join-point-now-top-level functions.
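As a sketch of the class of functions targeted (names hypothetical): imagine the code below lives in a module that exports only `entry`. Then `go` is a top level function that is local and whose call sites are all known and saturated — exactly what a floated-out join point becomes.

```haskell
-- Pretend only `entry` is exported. Every call to `go` is known and
-- saturated, so under the plan its info table and closure could be
-- elided (optimization 1 below), and being tail recursive it would
-- also qualify for optimization 2.
go :: Int -> Int
go 0 = 0
go n = go (n - 2)

entry :: Int -> Int
entry x = go (2 * x)
```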
### The Optimizations
SPJ provides a nice overview in [this](https://gitlab.haskell.org/ghc/ghc/-/issues/15560#note_159253) comment:
1) Elide the info-table and closure generation for top level functions that are local and whose call sites are all known and saturated. This should reduce the code size increase that occurs from the `join point -> top level function` conversion.
2) Elide the stack overflow check for top level functions that have the properties of (1) _and_ are tail recursive, call other top level functions with the same properties, or are not recursive.
3) Eliminate Heap Checks by absorbing the checks into the caller of the function. This is an orthogonal optimization to this ticket and is currently not done for join points nor top level functions. Please see Simon's comment linked above.
### The Implementation Plan
Implement the optimizations in order:
For (1):
- Add a phase to Stg called `CgPrep` for `code gen prep`.
- Absorb the `InferTags` pass into `CgPrep`. InferTags already defines a CgPrep pass and a `CgInfo` record that is passed to the code generator for each backend, so this item is just a refactoring.
- Add a pass to `CgPrep` that detects and records all `Id`s that are functions and whose call sites are all known and saturated. Pass this set of `Id`s to `CgInfo`.
- For a backend `b`, use the new field in `CgInfo` to elide the code generation for info-tables and closures for top-level local functions whose `Id`s are elements of the new field.
The strategy here is to keep inspection of call sites at `Stg` instead of `StgToCmm` so that different backends (for example the JS backend) can implement (1) in their own code generator.
For (2):
- Todo
For (3):
- Should be tracked in another ticket (Todo)

Assignee: doyougnu (jmy6342@gmail.com)

---

**#14226 — Common Block Elimination pass doesn't eliminate common blocks** (Ben Gamari, 2021-09-07)
https://gitlab.haskell.org/ghc/ghc/-/issues/14226

In #14222 it was noted that something appears to be broken in `CmmCommonBlockElim`. Consider the program from that ticket,
```hs
module T14221 where
import Data.Text as T
isNumeric :: Text -> Bool
isNumeric t =
T.all isNumeric' t && T.any isNumber t
where
isNumber c = '0' <= c && c <= '9'
isNumeric' c = isNumber c
|| c == 'e'
|| c == 'E'
|| c == '.'
|| c == '-'
|| c == '+'
```
This program produces six copies of a block of the form,
```
c6JT:
R2 = I64[R1 + 7];
R1 = P64[Sp + 8];
Sp = Sp + 16;
call $wloop_all_s6CQ_info(R2, R1) args: 8, res: 0, upd: 8;
```
in the `-ddump-opt-cmm` output, which are manifest in the assembler as,
```asm
block_c6JT_info:
_c6JT:
movq 7(%rbx),%r14
movq 8(%rbp),%rbx
addq $16,%rbp
jmp $wloop_all_s6CQ_info
```
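For intuition, the core of common-block elimination is deduplicating blocks by their body and redirecting the duplicate labels to one representative — a toy sketch, nothing like GHC's actual `CmmCommonBlockElim` (which hashes blocks and iterates to a fixed point while rewriting jump targets):

```haskell
import qualified Data.Map.Strict as M

-- Given (label, body) pairs, return a substitution mapping each
-- duplicate label to the first label seen with an identical body.
dedupBlocks :: [(String, String)] -> M.Map String String
dedupBlocks = snd . foldl step (M.empty, M.empty)
  where
    step (seen, subst) (lbl, body) =
      case M.lookup body seen of
        Just rep -> (seen, M.insert lbl rep subst)  -- duplicate body
        Nothing  -> (M.insert body lbl seen, subst) -- first occurrence
```

Running this on the six identical `c6JT`-style blocks above would map five of the labels onto the first one.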
CBE really ought to be catching these.

Assignee: Michal Terepeta

---

**#11261 — Implement DWARF debugging on powerpc64** (Peter Trommler <ptrommler@acm.org>, 2021-09-07)
https://gitlab.haskell.org/ghc/ghc/-/issues/11261

debug:
```
ghc-stage2: panic! (the 'impossible' happened)
(GHC version 7.11.20151219 for powerpc64-unknown-linux):
dwarfReturnRegNo: Unsupported platform!
CallStack (from ImplicitParams):
error, called at compiler/nativeGen/Dwarf/Constants.hs:224:19 in ghc:Dwarf.Constants
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
```
Provide DWARF constants for registers.
Still TODO:
- [ ] add unwinding information to `StgCRun` to ensure that the unwinder can unwind from Haskell into C
- [ ] to support the RTS unwinder: add support to the initial register callback `set_initial_registers` in `rts/Libdw.c`
- [ ] Valid unwind records in `stg_stop_thread` (defined in `rts/StgStartup.cmm`)
- [ ] Support in the native code generator (by implementing the `extractUnwindPoints` field of `NcgImpl`)
- [ ] Unwinding support in `libdw`
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------------ |
| Version | 7.11 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (CodeGen) |
| Test case | debug, T10667 |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Assignee: Peter Trommler <ptrommler@acm.org>

---

**#8949 — Deprecate -msse2 and -msse flags** (errge, 2021-09-07)
https://gitlab.haskell.org/ghc/ghc/-/issues/8949

I propose msse2 to be on by default. I guess the default was chosen way back, when Pentium III support was still relevant.
Nowadays we don't really win on CPU support, because e.g. https://github.com/tibbe/hashable/blob/master/hashable.cabal is built by default with SSE2 at the injected-C-code level. And hashable has a lot of reverse dependencies, so on an end-user system (RedHat or Debian) a user with a Pentium III CPU is most probably out of luck anyway.
Flipping this default would also fix the excess precision problem for most users of GHC on the i686 platform.
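The excess-precision problem referred to is the classic x87 issue: without SSE2, `Double` intermediates may be computed at 80-bit precision in x87 registers, so results depend on when values get spilled to memory. A sketch of an expression whose value differs between the two modes:

```haskell
big :: Double
big = 2 ^ 60

-- With strict 64-bit Double rounding (SSE2), big + 1 rounds back to
-- big (1 is below half an ulp at this magnitude), so delta is 0.0.
-- With 80-bit x87 intermediates on i686 without -msse2, the sum can
-- be held exactly and delta may come out as 1.0 instead.
delta :: Double
delta = (big + 1) - big
```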
GHC should provide a -mno-sse2 flag for the cases when this needs to be disabled.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------------ |
| Version | 7.9 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (CodeGen) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | simonmar |
| Operating system | |
| Architecture | |
</details>