- Feb 17, 2024
-
-
First step towards fixing #24331. Replace foreign prim imports with real primops.
-
-
- Feb 14, 2024
-
-
This commit fixes two wasm unreg regressions caught by a nightly pipeline:
- Unknown `stg_scheduler_loopzh` symbol when compiling scheduler.cmm
- Invalid `_hs_constructor(101)` function name when handling a ctor
-
The strings in IPE events may be of unbounded length. Limit the lengths of these fields to 64k characters to ensure that we don't exceed the maximum event length.
-
Previously many of the `post*` implementations would first compute the length of the event's strings in order to determine the event length, and then we would compute the length yet again in `postString`. Now we instead pass the string length to `postStringLen`, avoiding the repeated work.
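As a rough illustration of the two entries above (the 64k cap and computing the length only once), here is a hedged C sketch; the constant name and the signature assumed for `postStringLen` are illustrative, not the actual RTS code.

```c
#include <stdint.h>
#include <string.h>

#define MAX_IPE_STRING_LEN 0x10000u        /* assumed 64k cap, per the entry above */

typedef struct EventsBuf_ EventsBuf;       /* opaque here */

/* Assumed signature; the real postStringLen lives in rts/eventlog/EventLog.c. */
void postStringLen(EventsBuf *eb, const char *str, uint32_t len);

static void postIpeString(EventsBuf *eb, const char *str)
{
    /* Compute the length once, cap it at 64k, and pass it down so that
     * postStringLen never has to call strlen() a second time. */
    size_t len = strlen(str);
    if (len > MAX_IPE_STRING_LEN)
        len = MAX_IPE_STRING_LEN;
    postStringLen(eb, str, (uint32_t)len);
}
```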
-
- Feb 13, 2024
-
-
This commit cleans up how we include the xxhash.h header: we now only define XXH_INLINE_ALL, which is sufficient to inline the xxHash functions without symbol collisions.
-
This commit enables the XXH3_64bits hash to be used on all 64-bit platforms. Previously it was only enabled on x86_64, so platforms like aarch64 silently fell back to XXH32, which degrades the quality of the hash function.
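A minimal sketch of the include pattern and the wider 64-bit dispatch described in the two entries above, assuming a generic word-size check rather than whatever GHC's build system actually uses:

```c
#include <stdint.h>

#define XXH_INLINE_ALL      /* inline the xxHash functions; no extern symbols */
#include "xxhash.h"

static inline uint64_t hashBytes(const void *p, size_t len)
{
#if UINTPTR_MAX == UINT64_MAX      /* any 64-bit platform, not just x86_64 */
    return XXH3_64bits(p, len);
#else
    return XXH32(p, len, 0);
#endif
}
```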
-
-
This commit adds rts/ghc-internal logic to support the wasm backend's JSFFI functionality.
-
The pure Haskell implementation causes an i386 regression in unrelated work; it can be fixed by using a C-based atomic increment. See the added comment for details.
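A hedged sketch of what a C-based atomic increment looks like, using GCC/Clang builtins; the counter and function names are illustrative, not those in the commit.

```c
#include <stdint.h>

static uint64_t counter;   /* illustrative; not the actual variable in the commit */

/* A single atomic read-modify-write, so no thread can observe a torn or
 * stale intermediate value of the counter. */
static uint64_t atomicInc64(void)
{
    return __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
}
```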
-
- Feb 12, 2024
-
-
Add support for heap profiling while using the nonmoving collector. We greatly simplify the implementation by disabling concurrent collection for GCs when heap profiling is enabled. This entails that the marked objects on the nonmoving heap are exactly the live objects. Note that we match the behaviour for live bytes accounting by taking the size of objects on the nonmoving heap to be that of the segment's block rather than the object itself. Resolves #22221
-
Support linking C sources with the JS output of the JavaScript backend. See the added documentation in the users guide. The implementation simply extends the JS linker to use the objects (.o) that were already produced by the emcc compiler and which were previously filtered out. I've also added some options to control linking with C functions (see the documentation about pragmas). With this change I've successfully compiled the direct-sqlite package, which embeds the sqlite.c database code. Some wrappers are still required (see the documentation about wrappers), but everything generic enough to be reused for other libraries has been integrated into rts/js/mem.js.
-
Some Haskell code unsafely casts a StablePtr to a Ptr in order to compare it against NULL. E.g. in direct-sqlite: `if castStablePtrToPtr aggStPtr /= nullPtr then ...` where `aggStPtr` is initially read (`peek`) from zeroed memory. We fix this by giving these StablePtrs the same representation as other null pointers. It's safe because the StablePtr at offset 0 is unused (for this exact reason).
-
- Feb 10, 2024
-
-
The eras profiling mode is useful for tracking the lifetime of closures. When a closure is written, the current era is recorded in the profiling header. This records the era in which the closure was created.
* Enable with -he
* User mode: use the functions in the ghc-experimental module GHC.Profiling.Eras to modify the era
* Automatically: --automatic-era-increment increases the user era on major collections
* The first era is era 1
* -he<era> can be used with other profiling modes to select a specific era
If you just want to record the era but not perform heap profiling, you can use `-he --no-automatic-heap-samples`. https://well-typed.com/blog/2024/01/ghc-eras-profiling/ Fixes #24332
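A small, hedged sketch of the mechanism in the first sentences above (stamping the profiling header with the current era); the type and field names are illustrative, not GHC's actual identifiers.

```c
#include <stdint.h>

typedef struct {
    uint64_t era;                   /* era in which the closure was created */
    /* ... other profiling-header fields ... */
} ProfHeader;

static uint64_t current_user_era = 1;   /* the first era is era 1 */

/* Conceptually runs whenever a closure is written: stamp its profiling
 * header with the era that is current at that moment. */
static void stampClosure(ProfHeader *hdr)
{
    hdr->era = current_user_era;
}
```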
-
- Feb 08, 2024
-
-
Here we move a good deal of the implementation of `base` into a new package, `ghc-internal`, such that it can be evolved independently from the user-visible interfaces of `base`. While we want to isolate implementation from interfaces, naturally, we would like to avoid turning `base` into a mere set of module re-exports. However, this is a non-trivial undertaking for a variety of reasons:

* `base` contains numerous known-key and wired-in things, requiring corresponding changes in the compiler
* `base` contains a significant amount of C code and corresponding autoconf logic, which is very fragile and difficult to break apart
* `base` has numerous import cycles, which are currently dealt with via carefully balanced `hs-boot` files
* We must not break existing users

To accomplish this migration, I tried the following approaches:

* [Split-GHC.Base]: Break apart the GHC.Base knot to allow incremental migration of modules into ghc-internal. This knot is simply too intertwined to be easily pulled apart, especially given the rather tricky import cycles that it contains.
* [Move-Core]: Move the "core" connected component of base (roughly 150 modules) into ghc-internal. While the Haskell side of this seems tractable, the C dependencies are very subtle to break apart.
* [Move-Incrementally]:
  1. Move all of base into ghc-internal
  2. Examine the module structure and begin moving obvious modules (e.g. leaves of the import graph) back into base
  3. Examine the modules remaining in ghc-internal, refactoring as necessary to facilitate further moves
  4. Go to (2); iterate until the cost/benefit of further moves is insufficient to justify continuing
  5. Rename the modules moved into ghc-internal to ensure that they don't overlap with those in base
  6. For each module moved into ghc-internal, add a shim module to base with the declarations which should be exposed and any requisite Haddocks (thus guaranteeing that base will be insulated from changes in the export lists of modules in ghc-internal)

Here I am using the [Move-Incrementally] approach, which is empirically the least painful of the unpleasant options above.

Bumps haddock submodule.

Metric Decrease:
    haddock.Cabal
    haddock.base
Metric Increase:
    MultiComponentModulesRecomp
    T16875
    size_hello_artifact
-
- Feb 01, 2024
-
-
-
-
Previously we would use an atomic load to ensure acquire ordering. However, we now have `ACQUIRE_FENCE_ON`, which allows us to express this more directly.
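A hedged C11 analogue of the two styles; GHC's RTS uses its own macros (such as `ACQUIRE_FENCE_ON`), whose definitions are not reproduced here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Before: acquire ordering obtained from an acquire load. */
uint64_t loadAcquire(_Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/* After: a relaxed load followed by an explicit acquire fence, which
 * states the intended ordering more directly at the use site. */
uint64_t loadThenFence(_Atomic uint64_t *p)
{
    uint64_t v = atomic_load_explicit(p, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return v;
}
```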
-
Full sequential consistency is not needed here.
-
When changing the dirty/clean state of a mutable object we needn't have any particular ordering.
-
-
-
-
-
This only affects an assertion in the debug RTS and only needs relaxed ordering.
-
We now use a release barrier whenever we update a thread's blocking state. This required widening StgTSO.why_blocked as AArch64 does not support atomic writes on 16-bit values.
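A hedged C11 analogue of the release-store discipline described above; the struct layout and constants are stand-ins for the real StgTSO definitions.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { NotBlocked = 0, BlockedOnMVar = 1 };   /* illustrative constants */

struct tso {
    void *block_info;                 /* what the thread is blocked on */
    _Atomic uint32_t why_blocked;     /* widened from 16 to 32 bits */
};

void blockOn(struct tso *t, void *obj)
{
    t->block_info = obj;
    /* Release store: everything written above must be visible to any
     * thread that observes the new blocking state. */
    atomic_store_explicit(&t->why_blocked, BlockedOnMVar,
                          memory_order_release);
}
```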
-
This is a semantics-preserving refactoring.
-
-
-
-
In hand-written Cmm it can sometimes be necessary to atomically load from memory deep within an expression (e.g. see the `CHECK_GC` macro). This MachOp provides a convenient way to do so without breaking the expression into multiple statements.
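A hedged C11 analogue of the convenience this gives: the atomic load can appear directly inside a larger expression rather than being hoisted into a separate statement. Names are illustrative, and this is C, not Cmm syntax.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t hp_lim;   /* illustrative heap-limit word */

bool needsGC(uint64_t hp, uint64_t bytes)
{
    /* The atomic load sits inside the comparison expression itself. */
    return hp + bytes > atomic_load_explicit(&hp_lim, memory_order_relaxed);
}
```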
-
- Jan 31, 2024
-
-
- Jan 24, 2024
-
-
The wasm backend didn't properly make use of all Cmm global registers due to #24347. Now that it is fixed, this patch re-enables full register mapping for wasm32, and we can now generate smaller & faster wasm modules that don't always spill arguments onto the stack. Fixes #22460 #24152.
-
- Jan 19, 2024
-
-
Fixes #24171.
-
- Jan 16, 2024
-
-
We were missing the extra_comma from the calculation of the size of the payload of postIPE. This was causing assertion failures when the event would overflow the buffer by one byte, as `ensureRoomForVariableEvent` would report that there was enough space for `n` bytes but then we would write `n + 1` bytes into the buffer. Fixes #24287
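A hedged sketch of the sizing bug; the helper name and the payload layout are simplified stand-ins, not the real postIPE code.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: the real IPE payload has more fields than two strings. */
static size_t ipePayloadSize(const char *table_name, const char *closure_desc)
{
    size_t extra_comma = 1;   /* the separator byte that was being forgotten */
    /* Omitting extra_comma here made the reserved space one byte smaller
     * than what was subsequently written into the buffer. */
    return strlen(table_name) + extra_comma + strlen(closure_desc);
}
```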
-
- Dec 20, 2023
-
-
Previously we attempted to ensure soundness of concurrent thunk update by synchronizing on the access of the thunk's info table pointer field. This was believed to be sufficient since the indirectee (which may expose a closure allocated by another core) would not be examined until the info table pointer update is complete.

However, it turns out that this can result in data races in the presence of multiple threads racing to update a single thunk. For instance, consider this interleaving under the old scheme:

         Thread A                    Thread B
         ---------                   ---------
    t=0  Enter t
      1  Push update frame
      2  Begin evaluation
      4  Pause thread
      5  t.indirectee=tso
      6  Release t.info=BLACKHOLE
      7  ... (e.g. GC)
      8  Resume thread
      9  Finish evaluation
     10  Relaxed t.indirectee=x
     11                              Load t.info
     12                              Acquire fence
     13                              Inspect t.indirectee
     14  Release t.info=BLACKHOLE

Here Thread A enters thunk `t` but is soon paused, resulting in `t` being lazily blackholed at t=6. Then, at t=10 Thread A finishes evaluation and updates `t.indirectee` with a relaxed store. Meanwhile, Thread B enters the blackhole. Under the old scheme this would introduce an acquire-fence, but this would only synchronize with Thread A at t=6. Consequently, the result of the evaluation, `x`, is not visible to Thread B, introducing a data race.

We fix this by treating the `indirectee` field as we do all other mutable fields. This means we must always access this field with acquire-loads and release-stores. See #23185.
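A hedged C11 analogue of the acquire/release discipline now applied to the `indirectee` field; GHC's real code manipulates closure headers via RTS macros rather than plain structs like the one below.

```c
#include <stdatomic.h>

struct thunk {
    _Atomic(const void *) info;        /* info table pointer */
    _Atomic(void *)       indirectee;  /* result once evaluated */
};

extern const void *BLACKHOLE_info;     /* stand-in for the RTS info table */

/* Updating thread: release-store the result so that a thread which
 * acquire-loads `indirectee` also sees the fully constructed value. */
void updateThunk(struct thunk *t, void *result)
{
    atomic_store_explicit(&t->indirectee, result, memory_order_release);
    atomic_store_explicit(&t->info, BLACKHOLE_info, memory_order_release);
}

/* Entering thread: acquire-load the indirectee before inspecting it. */
void *followIndirection(struct thunk *t)
{
    return atomic_load_explicit(&t->indirectee, memory_order_acquire);
}
```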
-
- Dec 13, 2023