Commits · 228dcae6a0efbe5289add5330c46f581780dd96c · Reinier Maas / GHC

May 24, 2024

cmm: add word <-> double/float bitcast · bdcc0f37

jeffrey young authored 10 months ago and

Marge Bot committed 10 months ago

- closes: #25331

This is the last step in the project plan described in #25331. This
commit:

- adds bitcast operands for x86_64, LLVM, aarch64
- For PPC and i386 we resort to using the cmm implementations
- renames conversion MachOps from Conv to Round|Truncate
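
To illustrate what a bitcast means here, as opposed to a rounding or truncating conversion, the following is a minimal C sketch. The helper names are hypothetical and this is not the code the commit adds; it assumes an IEEE 754 `double` of the same size as `uint64_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: double and uint64_t have the same size. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Reinterpret the bits of a 64-bit word as a double (no numeric conversion). */
static double word64_to_double_sketch(uint64_t w)
{
    double d;
    memcpy(&d, &w, sizeof d);   /* well-defined way to copy the bit pattern */
    return d;
}

/* Reinterpret the bits of a double as a 64-bit word. */
static uint64_t double_to_word64_sketch(double d)
{
    uint64_t w;
    memcpy(&w, &d, sizeof w);
    return w;
}
```

A conversion such as `(uint64_t)1.0` yields 1, whereas the bitcast of `1.0` yields its IEEE 754 encoding.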

bdcc0f37

May 23, 2024
- PPC: Support ELF v2 on powerpc64 big-endian · ead75532
  Peter Trommler authored 3 years ago and Marge Bot committed 10 months ago
  
  Detect ELF v2 on PowerPC 64-bit systems. Check for `_CALL_ELF` preprocessor macro. Fixes #21191
  ead75532
May 22, 2024
- rts: Fix size of StgOrigThunkInfo frames · 6d7e6ad8
  Ben Gamari authored 10 months ago and Marge Bot committed 10 months ago
  
  Previously the entry code of the `stg_orig_thunk` frame failed to account for the size of the profiling header as it hard-coded the frame size. Fix this. Fixes #24809.
  6d7e6ad8
- Reverse arguments to stgCallocBytes (fix #24828) · 6838a7c3
  Sylvain Henry authored 10 months ago and Marge Bot committed 10 months ago
  
  6838a7c3
May 17, 2024

rts: fix I/O manager compilation errors for win32 target · 710665bd

Cheng Shao authored 10 months ago and

Marge Bot committed 10 months ago

This patch fixes I/O manager compilation errors for win32 target
discovered when cross-compiling to win32 using recent clang:

```
rts/win32/ThrIOManager.c:117:7: error:
     error: call to undeclared function 'is_io_mng_native_p'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
      117 |   if (is_io_mng_native_p ()) {
          |       ^
    |
117 |   if (is_io_mng_native_p ()) {
    |       ^

1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)

rts/fs.c:143:28: error:
     error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
      143 | int setErrNoFromWin32Error () {
          |                            ^
          |                             void
    |
143 | int setErrNoFromWin32Error () {
    |                            ^

1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)

rts/win32/ConsoleHandler.c:227:9: error:
     error: call to undeclared function 'interruptIOManagerEvent'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
      227 |         interruptIOManagerEvent ();
          |         ^
    |
227 |         interruptIOManagerEvent ();
    |         ^

rts/win32/ConsoleHandler.c:227:9: error:
     note: did you mean 'getIOManagerEvent'?
    |
227 |         interruptIOManagerEvent ();
    |         ^

rts/include/rts/IOInterface.h:27:10: error:
     note: 'getIOManagerEvent' declared here
       27 | void *   getIOManagerEvent  (void);
          |          ^
   |
27 | void *   getIOManagerEvent  (void);
   |          ^

1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)

rts/win32/ConsoleHandler.c:196:9: error:
     error: call to undeclared function 'setThreadLabel'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
      196 |         setThreadLabel(cap, t, "signal handler thread");
          |         ^
    |
196 |         setThreadLabel(cap, t, "signal handler thread");
    |         ^

rts/win32/ConsoleHandler.c:196:9: error:
     note: did you mean 'postThreadLabel'?
    |
196 |         setThreadLabel(cap, t, "signal handler thread");
    |         ^

rts/eventlog/EventLog.h:118:6: error:
     note: 'postThreadLabel' declared here
      118 | void postThreadLabel(Capability    *cap,
          |      ^
    |
118 | void postThreadLabel(Capability    *cap,
    |      ^

1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
```

710665bd

rts: Allocate non-moving segments with megablocks · b38dcf39

Teo Camarasu authored 1 year ago and

Marge Bot committed 10 months ago

Non-moving segments are 8 blocks long and need to be aligned.
Previously we serviced allocations by grabbing 15 blocks, finding
an aligned 8 block group in it and returning the rest.
This led to high levels of fragmentation, as de-allocating a segment
caused an 8 block gap to form, and this could not be reused for allocation.

This patch introduces a segment allocator based around using entire
megablocks to service segment allocations in bulk.

When there are no free segments, we grab an entire megablock and fill it
with aligned segments. As the megablock is free, we can easily guarantee
alignment. Any unused segments are placed on a free list.

It only makes sense to free segments in bulk when all of the segments in
a megablock are freeable. After sweeping, we grab the free list, sort it,
and find all groups of segments where they cover the megablock and free
them.
This introduces a period of time when free segments are not available to
the mutator, but the risk that this would lead to excessive allocation
is low. Right after sweep, we should have an abundance of partially full
segments, and this pruning step is relatively quick.
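
The pruning step described above can be sketched roughly as follows. This is a simplified illustration, not the RTS implementation: the sizes are hypothetical (the real constants, and the number of usable segments per megablock, may differ), segments are represented only by their start addresses, and the free list is assumed to contain no duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes, for illustration only. */
#define MBLOCK_SIZE_SKETCH     ((uintptr_t)1 << 20)
#define SEGMENT_SIZE_SKETCH    ((uintptr_t)8 * 4096)
#define SEGS_PER_MBLOCK_SKETCH (MBLOCK_SIZE_SKETCH / SEGMENT_SIZE_SKETCH)

static int cmp_addr_sketch(const void *a, const void *b)
{
    uintptr_t x = *(const uintptr_t *)a, y = *(const uintptr_t *)b;
    return (x > y) - (x < y);
}

/* Sort the free list by address, then drop every group of free segments that
 * covers a whole megablock. Retained segments are compacted to the front of
 * the array; the return value is how many were retained. *freed_mblocks
 * counts the megablocks that could be handed back. */
static size_t prune_free_segments_sketch(uintptr_t *segs, size_t n,
                                         size_t *freed_mblocks)
{
    assert(segs != NULL || n == 0);
    qsort(segs, n, sizeof segs[0], cmp_addr_sketch);

    size_t kept = 0, i = 0;
    *freed_mblocks = 0;
    while (i < n) {
        uintptr_t mblock = segs[i] & ~(MBLOCK_SIZE_SKETCH - 1);
        size_t j = i;
        while (j < n && (segs[j] & ~(MBLOCK_SIZE_SKETCH - 1)) == mblock) {
            j++;
        }
        if (j - i == SEGS_PER_MBLOCK_SKETCH) {
            (*freed_mblocks)++;              /* every segment is free */
        } else {
            for (size_t k = i; k < j; k++) { /* partially used: retain */
                segs[kept++] = segs[k];
            }
        }
        i = j;
    }
    return kept;
}
```

Sorting first is what makes the grouping a single linear pass: all segments of one megablock end up adjacent in the array.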

In implementing this we drop the logic that kept NONMOVING_MAX_FREE
segments on the free list.

We also introduce an eventlog event to log the amount of pruned/retained
free segments.

See Note [Segment allocation strategy]

Resolves #24150

-------------------------
Metric Decrease:
    T13253
    T19695
-------------------------

b38dcf39

rts: do not prefetch mark_closure bdescr in non-moving gc when ASSERTS_ENABLED · 886ab43a

Cheng Shao authored 1 year ago and

Marge Bot committed 10 months ago

This commit fixes a small oversight in !12148: the prefetch logic
in non-moving GC may trap in debug RTS because it calls Bdescr() for
mark_closure which may be a static one. It's fine in non-debug RTS
because even if invalid bdescr addresses are prefetched, they will not
cause segfaults, so this commit implements the most straightforward
fix: don't prefetch mark_closure bdescr when assertions are enabled.

886ab43a

May 10, 2024

IPE: Eliminate dependency on Read · ab840ce6

Ben Gamari authored 10 months ago and

Marge Bot committed 10 months ago

Instead of encoding the closure type as decimal string we now simply
represent it as an integer, eliminating the need for `Read` in
`GHC.Internal.InfoProv.Types.peekInfoProv`.

Closes #24504.

-------------------------
Metric Decrease:
    T24602_perf_size
    size_hello_artifact
-------------------------

ab840ce6

May 02, 2024

GHCi: support inlining breakpoints (#24712) · b85b1199

Sylvain Henry authored 11 months ago and

Marge Bot committed 10 months ago

When a breakpoint is inlined, its context may change (e.g. tyvars in
scope). We must take this into account and not use the breakpoint tick
index as its sole identifier. Each instance of a breakpoint (even with
the same tick index) now gets a different "info" index.

We also need to distinguish modules:
- tick module: module with the break array (tick counters, status, etc.)
- info module: module having the CgBreakInfo (info at occurrence site)

b85b1199

STM: Be more optimistic when validating in-flight transactions. · 917ef81b

Andreas Klebinger authored 1 year ago and

Marge Bot committed 10 months ago

* Don't lock tvars when performing non-committal validation.
* If we encounter a locked tvar don't consider it a failure.

This means in-flight validation will only fail if committing at the
moment of validation is *guaranteed* to fail.

This prevents in-flight validation from failing spuriously if it happens in
parallel on multiple threads or parallel to a thread committing.
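
The rule above (skip locked tvars, fail only on a definite mismatch) might be sketched like this. All types and names are hypothetical stand-ins; the real STM code works on RTS structures and uses atomic operations that this single-threaded sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a TVar and a transaction-record entry. */
typedef struct {
    int  value;    /* current value of the variable */
    bool locked;   /* true while some thread is committing to it */
} TVarSketch;

typedef struct {
    const TVarSketch *tvar;
    int expected;  /* value the transaction saw when it read the tvar */
} TRecEntrySketch;

/* Non-committal validation: takes no locks, and a locked tvar is not treated
 * as a failure because the outcome of the other commit is not yet known. */
static bool validate_in_flight_sketch(const TRecEntrySketch *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const TVarSketch *tv = entries[i].tvar;
        if (tv->locked) {
            continue;                 /* unknown outcome: stay optimistic */
        }
        if (tv->value != entries[i].expected) {
            return false;             /* committing now would certainly fail */
        }
    }
    return true;
}
```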

917ef81b

STM: Remove (unused)coarse grained locking. · ac9c5f84
Andreas Klebinger authored 1 year ago and Marge Bot committed 10 months ago
```
The STM code had a coarse grained locking mode guarded by #defines that was unused.
This commit removes the code.
```
ac9c5f84

Apr 21, 2024

JS: Stubs for code without actual implementation detected by Google Closure Compiler (fixes #24602) · 5962fa52
Serge S. Gulin authored 1 year ago
```
These errors were fixed just by introducing stubbed functions with throw for further implementation.
```
5962fa52

JS: thread.js requires h$fds and h$fdReady to be declared for static code analysis, minimal · a45a5712

Serge S. Gulin authored 1 year ago

code copied from GHCJS (fixes #24602)

I've just copied some old pieces of GHCJS from publicly available sources (see https://github.com/Taneb/shims/blob/a6dd0202dcdb86ad63201495b8b5d9763483eb35/src/io.js#L607).
I also didn't fill in the details of h$fds; I kept it minimal and left only its object initialization: `var h$fds = {};`

a45a5712

JS: trivial checks for variable presence (fixes #24602) · 3db54f9b
Serge S. Gulin authored 1 year ago

3db54f9b

JS: fix typos and namings (fixes #24602) · c70b9ddb

Serge S. Gulin authored 1 year ago

You may have noticed that I've also changed the terms in

```
, global "h$vt_double" ||= toJExpr IntV
```

See "IntV"

and

```
  WaitReadOp  -> \[] [fd] -> pure $ PRPrimCall $ returnS (app
"h$waidRead" [fd])
```

See "h$waidRead"

c70b9ddb

Apr 17, 2024

rts: Ignore EINTR while polling in timerfd itimer implementation · 3a0642ea

Ben Gamari authored 11 months ago and

Marge Bot committed 11 months ago

While the RTS does attempt to mask signals, it may be that a foreign
library unmasks them. This previously caused benign warnings which we
now ignore.

See #24610.

3a0642ea

Apr 12, 2024
- rts: Improve tracing message when nursery is resized · c3489547
  Matthew Pickering authored 11 months ago and Marge Bot committed 11 months ago
  
  It is sometimes more useful to know how much bigger or smaller the nursery got when it is resized. In particular I am trying to investigate situations where we end up with fragmentation due to the nursery (#24577)
  c3489547
- rts: Implement set_initial_registers for AArch64 · b0fbd181
  Ben Gamari authored 11 months ago and Marge Bot committed 11 months ago
  
  Fixes #23680.
  b0fbd181
- RTS: Emit warning when -M < -H · 23c3e624
  Andreas Klebinger authored 1 year ago and Marge Bot committed 11 months ago
  
  Fixes #24487
  23c3e624
Apr 10, 2024

rts: Make addDLL a wrapper around loadNativeObj · dcfaa190

Rodrigo Mesquita authored 1 year ago and

Marge Bot committed 11 months ago

Rewrite the implementation of `addDLL` as a wrapper around the more
principled `loadNativeObj` rts linker function. The latter should be
preferred while the former is preserved for backwards compatibility.

`loadNativeObj` was previously only available on ELF platforms, so this
commit further refactors the rts linker to transform loadNativeObj_ELF
into loadNativeObj_POSIX, which is available in ELF and MachO platforms.

The refactor made it possible to remove the `dl_mutex` mutex in favour
of always using `linker_mutex` (rather than a combination of both).

Lastly, we implement `loadNativeObj` for Windows too.

dcfaa190

linker: Avoid linear search when looking up Haskell symbols via dlsym · e008a19a

Alexis King authored 1 year ago and

Marge Bot committed 11 months ago


See the primary Note [Looking up symbols in the relevant objects] for a
more in-depth explanation.

When dynamically loading a Haskell symbol (typical when running a splice or
GHCi expression), before this commit we would search for the symbol in
all dynamic libraries that were loaded. However, this could be very
inefficient when too many packages are loaded (which can happen if there are
many package dependencies) because the time to look up the symbol would be
linear in the number of packages loaded.

This commit drastically improves symbol loading performance by
introducing a mapping from units to the handles of corresponding loaded
dlls. These handles are returned by dlopen when we load a dll, and can
then be used to look up in a specific dynamic library.

Looking up a given Name is now much more precise because we can
look up its unit in the mapping and look up the symbol solely in the
handles of the dynamic libraries loaded for that unit.

In one measurement, the wait time before the expression was executed
went from about 38 seconds down to about 2 seconds.

This commit also includes Note [Symbols may not be found in pkgs_loaded],
explaining the fallback to the old behaviour in case no dll can be found
in the unit mapping for a given Name.

Fixes #23415

Co-authored-by: Rodrigo Mesquita <(@alt-romes)>

e008a19a

rts: free error message before returning · dd530bb7
Rodrigo Mesquita authored 1 year ago and Marge Bot committed 11 months ago
```
Fixes a memory leak in rts/linker/PEi386.c
```
dd530bb7

Apr 03, 2024

Conditionally ignore some GCC warnings · 83a74d20

Duncan Coutts authored 1 year ago and

Marge Bot committed 11 months ago

Some GCC versions don't know about some warnings, and they complain
that we're ignoring unknown warnings. So we try to ignore the warning
based on the GCC version.

83a74d20

waitRead# / waitWrite# do not work for win32-legacy I/O manager · 8023bad4

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

Previously it was unclear that they did not work because the code path
was shared with other I/O managers (in particular select()).

Following the code carefully shows that what actually happens is that
the calling thread would block forever: the thread will be put into the
blocked queue, but no other action is scheduled that will ever result in
it getting unblocked.

It's better to just fail loudly in case anyone accidentally calls them;
it also makes the code less confusing.

8023bad4

Include the default I/O manager in the +RTS --info output · c7d3e3a3
Duncan Coutts authored 1 year ago and Marge Bot committed 11 months ago
```
Document the extra +RTS --info output in the user guide
```
c7d3e3a3

Add tracing for the main I/O manager actions · 9c51473b

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago


Using the new tracer class.

Note: The unconditional definition of showIOManager should be
compatible with the debugTrace change in 7c7d1f66.

Co-authored-by: Pi Delport <pi@well-typed.com>

9c51473b

The select() I/O manager does have some global initialisation · 877a2a80
Duncan Coutts authored 2 years ago and Marge Bot committed 11 months ago
```
It's just to make sure an exception CAF is a GC root.
```
877a2a80

Make struct CapIOManager be fully opaque · aaa294d0

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

Provide an opaque (forward) definition in Capability.h (since the cap
contains a *CapIOManager) and then only provide a full definition in
a new file IOManagerInternals.h. This new file is only supposed to be
included by the IOManager implementation, not by its users. So that
means IOManager.c and individual I/O manager implementations.
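
The opaque-struct pattern described above can be sketched in a single file as follows. The `*Sketch` names and fields are hypothetical; in the real layout the first part would live in a widely included header and the second part in the internals header and the implementation files.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What users of the API see (cf. the forward definition) ---- */
struct CapIOManagerSketch;                 /* incomplete type: layout hidden */

typedef struct {
    unsigned no;
    struct CapIOManagerSketch *iomgr;      /* pointer to incomplete type is fine */
} CapabilitySketch;

struct CapIOManagerSketch *iomgr_sketch_new(void);
void     iomgr_sketch_enqueue(struct CapIOManagerSketch *m);
unsigned iomgr_sketch_pending(const struct CapIOManagerSketch *m);
void     iomgr_sketch_free(struct CapIOManagerSketch *m);

/* ---- What only the implementation sees (cf. the internals header) ---- */
struct CapIOManagerSketch {
    unsigned pending;                      /* hypothetical field */
};

struct CapIOManagerSketch *iomgr_sketch_new(void)
{
    struct CapIOManagerSketch *m = calloc(1, sizeof *m);
    assert(m != NULL);
    return m;
}

void iomgr_sketch_enqueue(struct CapIOManagerSketch *m) { m->pending++; }

unsigned iomgr_sketch_pending(const struct CapIOManagerSketch *m)
{
    return m->pending;
}

void iomgr_sketch_free(struct CapIOManagerSketch *m) { free(m); }
```

Code that only sees the first part cannot touch `pending` directly, so every access has to go through the functions.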

The posix/Signals.c still needs direct access, but that should be
eliminated. Anything that needs direct access either needs to be clearly
part of an I/O manager (e.g. the select() one) or go via a proper API.

aaa294d0

Select an I/O manager early in RTS startup · 3be6d591
Duncan Coutts authored 2 years ago and Marge Bot committed 11 months ago
```
We need to select the I/O manager to use during startup before the
per-cap I/O manager initialisation.
```
3be6d591

Add I/O manager API notifyIOManagerCapabilitiesChanged · 94a87d21

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

Used in setNumCapabilities.

It only does anything for MIO on Posix.

Previously it always invoked Haskell code, but that code only did
anything on non-Windows (and non-JS), and only threaded. That currently
effectively means the MIO I/O manager on Posix.

So now it only invokes it for the MIO Posix case.

94a87d21

Add an IOManager API for scavenging TSO blocked_info · 4161f516

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

When the GC scavenges a TSO it needs to scavenge the tso->blocked_info
but the blocked_info is a big union and what lives there depends on the
tso->why_blocked, which for I/O-related reasons is something that in
principle is the responsibility of the I/O manager and not the GC. So
the right thing to do is for the GC to ask the I/O manager to scavenge
the blocked_info if it encounters any I/O-related why_blocked reasons.

So we add scavengeTSOIOManager in IOManager.{h,c} with the usual style.

Now as it happens, right now, there is no special scavenging to do, so
the implementation of scavengeTSOIOManager is a fancy no-op. That's
because the select I/O manager uses only the fd and target members,
which are not GC pointers, and the win32-legacy I/O manager _ought_ to
be using GC-managed heap objects for the StgAsyncIOResult but it is
actually using the C heap, so again no GC pointers. If the win32-legacy
were doing this more sensibly, then scavengeTSOIOManager would be the
right place to do the GC magic.

Future I/O managers will need GC heap objects in the tso->blocked_info
and will make use of this functionality.

4161f516

Tidy up a couple things in Select.{h,c} · d30c6bc6

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

Use the standard #include {Begin,End}Private.h style rather than
RTS_PRIVATE on individual decls.

And conditionally build the code for the select I/O manager based on
the new CPP IOMGR_ENABLED_SELECT rather than on THREADED_RTS.

d30c6bc6

Rename awaitEvent in select and win32 I/O managers · 5ad4b30f

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

These are now just called from IOManager.c and are the per-I/O manager
backend impls (whereas previously awaitEvent was the entry point).

Follow the new naming convention in the IOManager.{h,c} of
awaitCompletedTimeoutsOrIO, with the I/O manager's name as a suffix:
so awaitCompletedTimeoutsOrIO{Select,Win32}.

5ad4b30f

Move awaitEvent into a proper IOManager API · 4f9e9c4e

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

and have the scheduler use it.

Previously the scheduler called awaitEvent directly, and awaitEvent was
implemented directly in the RTS I/O managers (select, win32). This
relied on the old scheme where there's a single active I/O manager for
each platform and RTS way.

We want to move that to go via an API in IOManager.{h,c} which can then
call out to the active I/O manager.

Also take the opportunity to split awaitEvent into two. The existing
awaitEvent has a bool wait parameter, to say if the call should be
blocking or non-blocking. We split this into two separate functions:
pollCompletedTimeoutsOrIO and awaitCompletedTimeoutsOrIO. We split them
for a few reasons: they have different post-conditions (specifically the
await version is supposed to guarantee that there are threads runnable
when it completes). Secondly, it is also anticipated that in future I/O
managers the implementations of the two cases will be simpler if they
are separated.
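
The difference in post-conditions between the two entry points could be sketched as below. Only the two function names come from the message above; the types, fields, backend helpers and the `_sketch` suffixes are hypothetical, and blocking is simulated by moving one in-flight completion to the ready state.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { IOMGR_SKETCH_SELECT, IOMGR_SKETCH_OTHER } IOManagerTypeSketch;

typedef struct {
    int ready;      /* completions available without waiting */
    int in_flight;  /* completions that need a (simulated) blocking wait */
    int runnable;   /* threads made runnable so far */
} SchedSketch;

/* Backend stand-in: collect whatever is ready, never wait. */
static void poll_select_sketch(SchedSketch *s)
{
    s->runnable += s->ready;
    s->ready = 0;
}

/* Backend stand-in: wait until at least one thread is runnable. */
static void await_select_sketch(SchedSketch *s)
{
    poll_select_sketch(s);
    while (s->runnable == 0) {
        assert(s->in_flight > 0);   /* otherwise we would wait forever */
        s->in_flight--;             /* simulated blocking wait */
        s->ready++;
        poll_select_sketch(s);
    }
}

/* Non-blocking entry point: may return with nothing runnable. */
static void pollCompletedTimeoutsOrIO_sketch(IOManagerTypeSketch t, SchedSketch *s)
{
    switch (t) {
    case IOMGR_SKETCH_SELECT: poll_select_sketch(s); break;
    default: abort();               /* no such I/O manager in this sketch */
    }
}

/* Blocking entry point: post-condition is that something is runnable. */
static void awaitCompletedTimeoutsOrIO_sketch(IOManagerTypeSketch t, SchedSketch *s)
{
    switch (t) {
    case IOMGR_SKETCH_SELECT: await_select_sketch(s); break;
    default: abort();
    }
    assert(s->runnable > 0);
}
```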

4f9e9c4e

Have the throwTo impl go via (new) IOManager APIs · f0c1f862

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

rather than directly operating on the IO manager's data structures.

Specifically, when throwing an async exception to a thread that is
blocked waiting for I/O or waiting for a timer, then we want to cancel
that I/O waiting or cancel the timer. Currently this is done directly in
removeFromQueues() in RaiseAsync.c. We want it to go via proper APIs
both for modularity but also to let us support multiple I/O managers.

So add sync{IO,Delay}Cancel, which is the cancellation for the
corresponding sync{IO,Delay}. The implementations of these use the usual
"switch (iomgr_type)" style.

f0c1f862

Add a new trace class for the iomanager · b48805b9

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

It makes sense now for it to be separate from the scheduler class of
tracers.

Enabled with +RTS -Do. Document the -Do debug flag in the user guide.

b48805b9

Take a simpler approach to gcc warnings in IOManager.c · f70b8108

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

We have lots of functions with conditional implementations for
different I/O managers. Some functions, for some I/O managers,
naturally have implementations that do nothing or barf. When only one
such I/O manager is enabled then the whole function implementation will
have an implementation that does nothing or barfs. This then results in
warnings from gcc that parameters are unused, or that the function
should be marked with attribute noreturn (since barf does not return).
The USED_IF_THREADS trick for fine-grained warning suppression is fine
for just two cases, but an equivalent here would need
USED_IF_THE_ONLY_ENABLED_IOMGR_IS_X_OR_Y which would have combinatorial
blowup. So we take a coarse grained approach and simply disable these
two warnings for the whole file.

So we use a GCC pragma, with its handy push/pop support:

 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wsuggest-attribute=noreturn"
 #pragma GCC diagnostic ignored "-Wunused-parameter"

...

 #pragma GCC diagnostic pop

f70b8108

Move anyPendingTimeoutsOrIO impl from .h to .c · 60ce9910

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

The implementation is eventually going to need to use more private
things, which will drag in unwanted includes into IOManager.h, so it's
better to move the impl out of the header file and into the .c file, at
the slight cost of it no longer being inline.

At the same time, change to the "switch (iomgr_type)" style.

60ce9910

insertIntoSleepingQueue is no longer public · e93058e0

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

No longer defined in IOManager.h, just a private function in
IOManager.c. Since it is no longer called from cmm code, just from
syncDelay. It ought to get moved further into the select() I/O manager
impl, rather than living in IOManager.c.

On the other hand appendToIOBlockedQueue is still called from cmm code
in the win32-legacy I/O manager primops async{Read,Write}#, and it is
also used by the select() I/O manager. Update the CPP and comments to
reflect this.

e93058e0

Move most of the delay# impl from cmm to C · 457705a8

Duncan Coutts authored 2 years ago and

Marge Bot committed 11 months ago

Moves it into the IOManager.c where we can follow the new pattern of
switching on the selected I/O manager.

Uses a new IOManager API: syncDelay, following the naming convention of
sync* for thread-synchronous I/O & timer/delay operations.

As part of porting from cmm to C, we maintain the rule that the
why_blocked gets accessed using load acquire and store release atomic
memory operations. There was one exception to this rule: in the delay#
primop cmm code on posix (not win32), the why_blocked was being updated
using a store relaxed, not a store release. I've no idea why. In this
conversion I'm playing it safe here and using store release consistently.

457705a8