- Feb 17, 2024
-
-
First step towards fixing #24331. Replace foreign prim imports with real primops.
-
-
- Feb 14, 2024
-
-
This commit fixes two wasm unreg regressions caught by a nightly pipeline:
- Unknown `stg_scheduler_loopzh` symbol when compiling scheduler.cmm
- Invalid `_hs_constructor(101)` function name when handling a ctor
-
The strings in IPE events may be of unbounded length. Limit the lengths of these fields to 64k characters to ensure that we don't exceed the maximum event length.
-
Previously many of the `post*` implementations would first compute the length of the event's strings in order to determine the event length, and then we would compute the length yet again in `postString`. Now we instead pass the string length to `postStringLen`, avoiding the repeated work.
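As a rough illustration of the two entries above (the 64k cap and computing the length only once), here is a hedged C sketch; the constant name and the signature assumed for `postStringLen` are illustrative, not the actual RTS code.

```c
#include <stdint.h>
#include <string.h>

#define MAX_IPE_STRING_LEN 0x10000u        /* assumed 64k cap, per the entry above */

typedef struct EventsBuf_ EventsBuf;       /* opaque here */

/* Assumed signature; the real postStringLen lives in rts/eventlog/EventLog.c. */
void postStringLen(EventsBuf *eb, const char *str, uint32_t len);

static void postIpeString(EventsBuf *eb, const char *str)
{
    /* Compute the length once, cap it at 64k, and pass it down so that
     * postStringLen never has to call strlen() a second time. */
    size_t len = strlen(str);
    if (len > MAX_IPE_STRING_LEN)
        len = MAX_IPE_STRING_LEN;
    postStringLen(eb, str, (uint32_t)len);
}
```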
-
- Feb 13, 2024
-
-
This commit cleans up how we include the xxhash.h header: we now only define XXH_INLINE_ALL, which is sufficient to inline the xxHash functions without symbol collisions.
-
This commit enables the XXH3_64bits hash to be used on all 64-bit platforms. Previously it was only enabled on x86_64, so platforms like aarch64 silently fell back to XXH32, which degrades the quality of the hash function.
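A minimal sketch of the include pattern and the wider 64-bit dispatch described in the two entries above, assuming a generic word-size check rather than whatever GHC's build system actually uses:

```c
#include <stdint.h>

#define XXH_INLINE_ALL      /* inline the xxHash functions; no extern symbols */
#include "xxhash.h"

static inline uint64_t hashBytes(const void *p, size_t len)
{
#if UINTPTR_MAX == UINT64_MAX      /* any 64-bit platform, not just x86_64 */
    return XXH3_64bits(p, len);
#else
    return XXH32(p, len, 0);
#endif
}
```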
-
-
This commit adds rts/ghc-internal logic to support the wasm backend's JSFFI functionality.
-
The pure Haskell implementation causes an i386 regression in unrelated work; it can be fixed by using a C-based atomic increment. See the added comment for details.
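A hedged sketch of what a C-based atomic increment looks like, using GCC/Clang builtins; the counter and function names are illustrative, not those in the commit.

```c
#include <stdint.h>

static uint64_t counter;   /* illustrative; not the actual variable in the commit */

/* A single atomic read-modify-write, so no thread can observe a torn or
 * stale intermediate value of the counter. */
static uint64_t atomicInc64(void)
{
    return __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
}
```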
-
- Feb 12, 2024
-
-
Add support for heap profiling while using the nonmoving collector. We greatly simplify the implementation by disabling concurrent collection for GCs when heap profiling is enabled. This entails that the marked objects on the nonmoving heap are exactly the live objects. Note that we match the behaviour for live bytes accounting by taking the size of objects on the nonmoving heap to be that of the segment's block rather than the object itself. Resolves #22221
-
Support linking C sources with the JS output of the JavaScript backend. See the added documentation in the users guide. The implementation simply extends the JS linker to use the objects (.o) that were already produced by the emcc compiler and which were previously filtered out. I've also added some options to control linking with C functions (see the documentation about pragmas). With this change I've successfully compiled the direct-sqlite package, which embeds the sqlite.c database code. Some wrappers are still required (see the documentation about wrappers), but everything generic enough to be reused for other libraries has been integrated into rts/js/mem.js.
-
Some Haskell code unsafely casts a StablePtr to a Ptr in order to compare it against NULL. E.g. in direct-sqlite: `if castStablePtrToPtr aggStPtr /= nullPtr then ...` where `aggStPtr` is initially read (`peek`) from zeroed memory. We fix this by giving these StablePtrs the same representation as other null pointers. It's safe because the StablePtr at offset 0 is unused (for this exact reason).
-
- Feb 10, 2024
-
-
The eras profiling mode is useful for tracking the lifetime of closures. When a closure is written, the current era is recorded in the profiling header. This records the era in which the closure was created.
* Enable with -he
* User mode: use the functions in the ghc-experimental module GHC.Profiling.Eras to modify the era
* Automatically: --automatic-era-increment increases the user era on major collections
* The first era is era 1
* -he<era> can be used with other profiling modes to select a specific era
If you just want to record the era but not perform heap profiling, you can use `-he --no-automatic-heap-samples`. https://well-typed.com/blog/2024/01/ghc-eras-profiling/ Fixes #24332
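A small, hedged sketch of the mechanism in the first sentences above (stamping the profiling header with the current era); the type and field names are illustrative, not GHC's actual identifiers.

```c
#include <stdint.h>

typedef struct {
    uint64_t era;                   /* era in which the closure was created */
    /* ... other profiling-header fields ... */
} ProfHeader;

static uint64_t current_user_era = 1;   /* the first era is era 1 */

/* Conceptually runs whenever a closure is written: stamp its profiling
 * header with the era that is current at that moment. */
static void stampClosure(ProfHeader *hdr)
{
    hdr->era = current_user_era;
}
```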
-
- Feb 08, 2024
-
-
Here we move a good deal of the implementation of `base` into a new package, `ghc-internal`, such that it can be evolved independently from the user-visible interfaces of `base`. While we want to isolate implementation from interfaces, naturally, we would like to avoid turning `base` into a mere set of module re-exports. However, this is a non-trivial undertaking for a variety of reasons:

* `base` contains numerous known-key and wired-in things, requiring corresponding changes in the compiler
* `base` contains a significant amount of C code and corresponding autoconf logic, which is very fragile and difficult to break apart
* `base` has numerous import cycles, which are currently dealt with via carefully balanced `hs-boot` files
* We must not break existing users

To accomplish this migration, I tried the following approaches:

* [Split-GHC.Base]: Break apart the GHC.Base knot to allow incremental migration of modules into ghc-internal. This knot is simply too intertwined to be easily pulled apart, especially given the rather tricky import cycles that it contains.
* [Move-Core]: Move the "core" connected component of base (roughly 150 modules) into ghc-internal. While the Haskell side of this seems tractable, the C dependencies are very subtle to break apart.
* [Move-Incrementally]:
  1. Move all of base into ghc-internal
  2. Examine the module structure and begin moving obvious modules (e.g. leaves of the import graph) back into base
  3. Examine the modules remaining in ghc-internal, refactoring as necessary to facilitate further moves
  4. Go to (2); iterate until the cost/benefit of further moves is insufficient to justify continuing
  5. Rename the modules moved into ghc-internal to ensure that they don't overlap with those in base
  6. For each module moved into ghc-internal, add a shim module to base with the declarations which should be exposed and any requisite Haddocks (thus guaranteeing that base will be insulated from changes in the export lists of modules in ghc-internal)

Here I am using the [Move-Incrementally] approach, which is empirically the least painful of the unpleasant options above.

Bumps haddock submodule.

Metric Decrease:
    haddock.Cabal
    haddock.base
Metric Increase:
    MultiComponentModulesRecomp
    T16875
    size_hello_artifact
-
- Feb 01, 2024
-
-
-
-
Previously we would use an atomic load to ensure acquire ordering. However, we now have `ACQUIRE_FENCE_ON`, which allows us to express this more directly.
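A hedged C11 analogue of the two styles; GHC's RTS uses its own macros (such as `ACQUIRE_FENCE_ON`), whose definitions are not reproduced here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Before: acquire ordering obtained from an acquire load. */
uint64_t loadAcquire(_Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/* After: a relaxed load followed by an explicit acquire fence, which
 * states the intended ordering more directly at the use site. */
uint64_t loadThenFence(_Atomic uint64_t *p)
{
    uint64_t v = atomic_load_explicit(p, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return v;
}
```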
-
Full sequential consistency is not needed here.
-
When changing the dirty/clean state of a mutable object we needn't have any particular ordering.
-
-
-
-
-
This only affects an assertion in the debug RTS and only needs relaxed ordering.
-
We now use a release barrier whenever we update a thread's blocking state. This required widening StgTSO.why_blocked as AArch64 does not support atomic writes on 16-bit values.
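A hedged C11 analogue of the release-store discipline described above; the struct layout and constants are stand-ins for the real StgTSO definitions.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { NotBlocked = 0, BlockedOnMVar = 1 };   /* illustrative constants */

struct tso {
    void *block_info;                 /* what the thread is blocked on */
    _Atomic uint32_t why_blocked;     /* widened from 16 to 32 bits */
};

void blockOn(struct tso *t, void *obj)
{
    t->block_info = obj;
    /* Release store: everything written above must be visible to any
     * thread that observes the new blocking state. */
    atomic_store_explicit(&t->why_blocked, BlockedOnMVar,
                          memory_order_release);
}
```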
-
This is a semantics-preserving refactoring.
-
-
-
-
In hand-written Cmm it can sometimes be necessary to atomically load from memory deep within an expression (e.g. see the `CHECK_GC` macro). This MachOp provides a convenient way to do so without breaking the expression into multiple statements.
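A hedged C11 analogue of the convenience this gives: the atomic load can appear directly inside a larger expression rather than being hoisted into a separate statement. Names are illustrative, and this is C, not Cmm syntax.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t hp_lim;   /* illustrative heap-limit word */

bool needsGC(uint64_t hp, uint64_t bytes)
{
    /* The atomic load sits inside the comparison expression itself. */
    return hp + bytes > atomic_load_explicit(&hp_lim, memory_order_relaxed);
}
```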
-
- Jan 31, 2024
-
-
- Jan 24, 2024
-
-
The wasm backend didn't properly make use of all Cmm global registers due to #24347. Now that it is fixed, this patch re-enables full register mapping for wasm32, and we can now generate smaller & faster wasm modules that don't always spill arguments onto the stack. Fixes #22460 #24152.
-
- Jan 19, 2024
-
-
Fixes #24171.
-
- Jan 16, 2024
-
-
We were missing the extra_comma from the calculation of the size of the payload of postIPE. This was causing assertion failures when the event would overflow the buffer by one byte, as `ensureRoomForVariableEvent` would report that there was enough space for `n` bytes but then we would write `n + 1` bytes into the buffer. Fixes #24287
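A hedged sketch of the sizing bug; the helper name and the payload layout are simplified stand-ins, not the real postIPE code.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: the real IPE payload has more fields than two strings. */
static size_t ipePayloadSize(const char *table_name, const char *closure_desc)
{
    size_t extra_comma = 1;   /* the separator byte that was being forgotten */
    /* Omitting extra_comma here made the reserved space one byte smaller
     * than what was subsequently written into the buffer. */
    return strlen(table_name) + extra_comma + strlen(closure_desc);
}
```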
-
- Dec 20, 2023
-
-
Previously we attempted to ensure soundness of concurrent thunk update by synchronizing on the access of the thunk's info table pointer field. This was believed to be sufficient since the indirectee (which may expose a closure allocated by another core) would not be examined until the info table pointer update is complete.

However, it turns out that this can result in data races in the presence of multiple threads racing to update a single thunk. For instance, consider this interleaving under the old scheme:

         Thread A                    Thread B
         ---------                   ---------
    t=0  Enter t
      1  Push update frame
      2  Begin evaluation
      4  Pause thread
      5  t.indirectee=tso
      6  Release t.info=BLACKHOLE
      7  ... (e.g. GC)
      8  Resume thread
      9  Finish evaluation
     10  Relaxed t.indirectee=x
     11                              Load t.info
     12                              Acquire fence
     13                              Inspect t.indirectee
     14  Release t.info=BLACKHOLE

Here Thread A enters thunk `t` but is soon paused, resulting in `t` being lazily blackholed at t=6. Then, at t=10 Thread A finishes evaluation and updates `t.indirectee` with a relaxed store. Meanwhile, Thread B enters the blackhole. Under the old scheme this would introduce an acquire-fence, but this would only synchronize with Thread A at t=6. Consequently, the result of the evaluation, `x`, is not visible to Thread B, introducing a data race.

We fix this by treating the `indirectee` field as we do all other mutable fields. This means we must always access this field with acquire-loads and release-stores. See #23185.
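A hedged C11 analogue of the acquire/release discipline now applied to the `indirectee` field; GHC's real code manipulates closure headers via RTS macros rather than plain structs like the one below.

```c
#include <stdatomic.h>

struct thunk {
    _Atomic(const void *) info;        /* info table pointer */
    _Atomic(void *)       indirectee;  /* result once evaluated */
};

extern const void *BLACKHOLE_info;     /* stand-in for the RTS info table */

/* Updating thread: release-store the result so that a thread which
 * acquire-loads `indirectee` also sees the fully constructed value. */
void updateThunk(struct thunk *t, void *result)
{
    atomic_store_explicit(&t->indirectee, result, memory_order_release);
    atomic_store_explicit(&t->info, BLACKHOLE_info, memory_order_release);
}

/* Entering thread: acquire-load the indirectee before inspecting it. */
void *followIndirection(struct thunk *t)
{
    return atomic_load_explicit(&t->indirectee, memory_order_acquire);
}
```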
-
- Dec 13, 2023