Commits · eb8c40e3c9e1df6f1f35d8f711cf58e13c629295 · gershomb / GHC

Jul 16, 2019
- rts: Rename the nondescript initProfiling2 to refreshProfilingCCSs · 52f755aa
  Daniel Gröber (dxld) authored 5 years ago and Marge Bot committed 5 years ago
  
  52f755aa
- rts: Divorce init of Heap profiler from CCS profiler · 373c9cb3
  Daniel Gröber (dxld) authored 5 years ago and Marge Bot committed 5 years ago
  
  Currently initProfiling gets defined by Profiling.c only if PROFILING is defined. Otherwise the ProfHeap.c defines it. This is just needlessly complicated so in this commit I make Profiling and ProfHeap into properly seperate modules and call their respective init functions from RtsStartup.c.
  373c9cb3
Jul 14, 2019

Expunge #ifdef and #ifndef from the codebase · d7c6c471

John Ericson authored 5 years ago and

Marge Bot committed 5 years ago

These are unexploded minds as far as the linter is concerned. I don't
want to hit in my MRs by mistake!

I did this with `sed`, and then rolled back some changes in the docs,
config.guess, and the linter itself.

d7c6c471

Jul 10, 2019

Remove most uses of TARGET platform macros · 0472f0f6

John Ericson authored 5 years ago and

Marge Bot committed 5 years ago

These prevent multi-target builds. They were gotten rid of in 3 ways:

1. In the compiler itself, replacing `#if` with runtime `if`. In these
cases, we care about the target platform still, but the target platform
is dynamic so we must delay the elimination to run time.

2. In the compiler itself, replacing `TARGET` with `HOST`. There was
just one bit of this, in some code splitting strings representing lists
of paths. These paths are used by GHC itself, and not by the compiled
binary. (They are compiler lookup paths, rather than RPATHS or something
that does matter to the compiled binary, and thus would legitamentally
be target-sensative.) As such, the path-splitting method only depends on
where GHC runs and not where code it produces runs. This should have
been `HOST` all along.

3. Changing the RTS. The RTS doesn't care about the target platform,
full stop.

4. `includes/stg/HaskellMachRegs.h` This file is also included in the
genapply executable. This is tricky because the RTS's host platform
really is that utility's target platform. so that utility really really
isn't multi-target either. But at least it isn't an installed part of
GHC, but just a one-off tool when building the RTS. Lying with the
`HOST` to a one-off program (genapply) that isn't installed doesn't seem so bad.
It's certainly better than the other way around of lying to the RTS
though not to genapply. The RTS is more important, and it is installed,
*and* this header is installed as part of the RTS.

0472f0f6

Jul 05, 2019

rts: Fix -hT option with profiling rts · ed662901

Daniel Gröber (dxld) authored 5 years ago and

Marge Bot committed 5 years ago

In dumpCensus we switch/case on doHeapProfile twice. The second switch
tries to barf on unknown doHeapProfile modes but HEAP_BY_CLOSURE_TYPE is
checked by the first switch and not included in the second.

So when trying to pass -hT to the profiling rts it barfs.

This commit simply merges the two switches into one which fixes this
problem.

ed662901

Make printer untag when chasing a pointer in a RET_FUN frame · d7f7e1ed
Siddharth authored 5 years ago and Marge Bot committed 5 years ago
```
This is to mimic what `Scav.c` does. This should fix a crash in
the printer.
```
d7f7e1ed

Jul 02, 2019
- Apply suggestion to rts/linker/Elf.c · 0bed9647
  Ben Gamari authored 5 years ago and Marge Bot committed 5 years ago
  
  0bed9647
- Apply suggestion to rts/linker/elf_got.c · 023a2bc7
  Ben Gamari authored 5 years ago and Marge Bot committed 5 years ago
  
  023a2bc7
- Lookup _GLOBAL_OFFSET_TABLE by symbol->addr when doing relocations · 348e3f8e
  Edward Amsden authored 5 years ago and Marge Bot committed 5 years ago
  
  348e3f8e
- Add _GLOBAL_OFFSET_TABLE_ support · 82693938
  Moritz Angermann authored 5 years ago and Marge Bot committed 5 years ago
  
  This adds lookup logic for _GLOBAL_OFFSET_TABLE_ as well as relocation logic for R_ARM_BASE_PREL and R_ARM_GOT_BREL which the gnu toolchain (gas, gcc, ...) prefers to produce. Apparently recent llvm toolchains will produce those as well.
  82693938
Jun 28, 2019

rts: Assert that LDV profiling isn't used with parallel GC · bd660ede
Ben Gamari authored 5 years ago
```
I'm not entirely sure we are careful about ensuring this; this is a
last-ditch check.
```
bd660ede

Correct closure observation, construction, and mutation on weak memory machines. · 11bac115

Travis Whitaker authored 5 years ago and

Ben Gamari committed 5 years ago


Here the following changes are introduced:
    - A read barrier machine op is added to Cmm.
    - The order in which a closure's fields are read and written is changed.
    - Memory barriers are added to RTS code to ensure correctness on
      out-or-order machines with weak memory ordering.

Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures memory reads that occur after said
instruction in program order are not performed before reads coming before said
instruction in program order. On machines with strong memory ordering properties
(e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
MO_ReadBarrier is simply erased. However, such an instruction is necessary on
weakly ordered machines, e.g. ARM and PowerPC.

Weam memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
closure has an indirectee pointer that is safe to follow.

When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:

- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.

Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues,
consider:

- An observer is about to case on a value in closure's info table.
- The observer speculatively reads one or more of closure's fields.
- An updater writes to closure's info table.
- The observer takes a branch based on the new info table value, but with the
  old closure fields!
- The updater writes to the closure's other fields, but its too late.

Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:

- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.

Observing a closure's fields must follow the following pattern:

- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.

This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.

Co-Authored-By: Ben Gamari <ben@well-typed.com>

11bac115

Fix GCC warnings with __clear_cache builtin (#16867) · 4ec233ec
Sylvain Henry authored 5 years ago and Marge Bot committed 5 years ago

4ec233ec

Jun 27, 2019

rts: Do not traverse nursery for dead closures in LDV profile · 07cffc49

Matthew Pickering authored 5 years ago and

Marge Bot committed 5 years ago

It is important that `heapCensus` and `LdvCensusForDead` traverse the
same areas.

`heapCensus` increases the `not_used` counter which tracks how many
closures are live but haven't been used yet.

`LdvCensusForDead` increases the `void_total` counter which tracks how
many dead closures there are.

The `LAG` is then calculated by substracting the `void_total` from
`not_used` and so it is essential that `not_used >= void_total`. This
fact is checked by quite a few assertions.

However, if a program has low maximum residency but allocates a lot in
the nursery then these assertions were failing (see #16753 and #15903)
because `LdvCensusForDead` was observing dead closures from the nursery
which totalled more than the `not_used`. The same closures were not
counted by `heapCensus`.

Therefore, it seems that the correct fix is to make `LdvCensusForDead`
agree with `heapCensus` and not traverse the nursery for dead closures.

Fixes #16100 #16753 #15903 #8982

07cffc49

rts: Correct assertion in LDV_recordDead · ed4cbd93
Matthew Pickering authored 5 years ago and Marge Bot committed 5 years ago
```
It is possible that void_total is exactly equal to not_used and the
other assertions for this check for <= rather than <.
```
ed4cbd93

rts: Correct handling of LARGE ARR_WORDS in LDV profiler · a586b33f

Matthew Pickering authored 5 years ago and

Marge Bot committed 5 years ago

This implements the correct fix for #11627 by skipping over the slop
(which is zeroed) rather than adding special case logic for LARGE
ARR_WORDS which runs the risk of not performing a correct census by
ignoring any subsequent blocks.

This approach implements similar logic to that in Sanity.c

a586b33f

Jun 26, 2019
- [skip ci] add a blurb about the purpose of Printer.c · e0899925
  Siddharth authored 5 years ago
  
  e0899925
Jun 22, 2019

rts: Reset STATIC_LINK field of reverted CAFs · b0d6bf2a

Ben Gamari authored 5 years ago and

Marge Bot committed 5 years ago

When we revert a CAF we must reset the STATIC_LINK field lest the GC
might ignore the CAF (e.g. as it carries the STATIC_FLAG_LIST flag) and
will consequently overlook references to object code that we are trying
to unload. This would result in the reachable object code being
unloaded. See Note [CAF lists] and Note [STATIC_LINK fields].

This fixes #16842.

Idea-due-to: Phuong Trinh <lolotp@fb.com>

b0d6bf2a

Jun 13, 2019

Fix #16525: ObjectCode freed wrongly because of lack of info header check · fc6b23be

Phuong Trinh authored 5 years ago and

Marge Bot committed 5 years ago

`checkUnload` currently doesn't check the info header of static objects.
Thus, it may free an `ObjectCode` struct wrongly even if there's still a
live static object whose info header lies in a mapped section of that
`ObjectCode`. This fixes the issue by adding an appropriate check.

fc6b23be

Jun 12, 2019
- rts/linker: Only mprotect GOT after it is filled · 217e6db4
  Ben Gamari authored 5 years ago and Marge Bot committed 5 years ago
  
  This fixes a regression, introduced by 67c422ca, where we mprotect'd the global offset table (GOT) region to PROT_READ before we had finished filling it, resulting in a linker crash. Fixes #16779.
  217e6db4
- rts/linker: Make elf_got.c a bit more legible · bbc752c5
  Ben Gamari authored 5 years ago and Marge Bot committed 5 years ago
  
  bbc752c5
Jun 11, 2019
- Fix an error message in CheckUnload.c:searchHeapBlocks · 1389b2cc
  Ömer Sinan Ağacan authored 5 years ago and Marge Bot committed 5 years ago
  
  1389b2cc
- rts/linker: Use mmapForLinker to map PLT · 0b7f81f5
  Ben Gamari authored 5 years ago and Marge Bot committed 5 years ago
  
  The PLT needs to be located within a close distance of the code calling it under the small memory model. Fixes #16784.
  0b7f81f5
- rts/linker: Mmap into low memory on AArch64 · cf7f36ae
  Ben Gamari authored 5 years ago and Marge Bot committed 5 years ago
  
  This extends mmapForLinker to use the same low-memory mapping strategy used on x86_64 on AArch64. See #16784.
  cf7f36ae
- rts/RtsFlags.c: mention that -prof too enables support for +RTS -l · 762098bf
  Alp Mestanogullari authored 5 years ago and Marge Bot committed 5 years ago
  
  762098bf
Jun 09, 2019

rts: Fix RetainerProfile early return with TREC_CHUNK · 8e60e3f0

Daniel Gröber (dxld) authored 5 years ago and

Marge Bot committed 5 years ago

When pop() returns with `*c == NULL` retainerProfile will immediately
return. All other code paths is pop() continue with the next stackElement
when this happens so it seems weird to me that TREC_CHUNK we would suddenly
abort everything even though the stack might still have elements left to
process.

8e60e3f0

rts: Separate population of eventTypes from initial event generation · 13572480

Ben Gamari authored 5 years ago and

Marge Bot committed 5 years ago

Previously these two orthogonal concerns were both implemented in
postHeaderEvents which made it difficult to send header events after RTS
initialization.

13572480

Jun 08, 2019
- Fix two lint failures in rts/linker/MachO.c · 310d0c4c
  Matthew Pickering authored 5 years ago and Ben Gamari committed 5 years ago
  
  310d0c4c
Jun 07, 2019

Add HEAP_PROF_SAMPLE_END event to mark end of samples · 0b7372f6

Matthew Pickering authored 5 years ago and

Marge Bot committed 5 years ago

This allows a user to observe how long a sampling period lasts so that
the time taken can be removed from the profiling output.

Fixes #16697

0b7372f6

Jun 01, 2019
- rts: Remove unused decls from CNF.h · 2e297b36
  Ömer Sinan Ağacan authored 5 years ago and Marge Bot committed 5 years ago
  
  2e297b36
May 31, 2019
- Remove unused RTS function 'unmark' · c70d039e
  Ömer Sinan Ağacan authored 5 years ago and Marge Bot committed 5 years ago
  
  c70d039e
- support small arrays and CONSTR_NOCAF in ghc-heap · 284cca51
  David Hewson authored 5 years ago and Marge Bot committed 5 years ago
  
  284cca51
May 30, 2019

Apply suggestion to rts/CheckUnload.c · 8e42e98e
Phuong Trinh authored 5 years ago and Marge Bot committed 5 years ago

8e42e98e
Apply suggestion to rts/CheckUnload.c · 42129180
Phuong Trinh authored 5 years ago and Marge Bot committed 5 years ago

42129180

Use binary search to speedup checkUnload · f81f3964

Phuong Trinh authored 5 years ago and

Marge Bot committed 5 years ago

We are iterating through all object code for each heap objects when
checking whether object code can be unloaded. For large projects in
GHCi, this can be very expensive due to the large number of object code
that needs to be loaded/unloaded. To speed it up, this arrangess all
mapped sections of unloaded object code in a sorted array and use binary
search to check if an address location fall on them.

f81f3964

rts: Handle zero-sized mappings in MachO linker · 4ad37a32

Ben Gamari authored 5 years ago and

Marge Bot committed 5 years ago

As noted in #16701, it is possible that we will find that an object has
no segments needing to be mapped. Previously this would result in mmap
being called for a zero-length mapping, which would fail. We now simply
skip the mmap call in this case; the rest of the logic just works.

4ad37a32

May 29, 2019
- CNF.c: Move debug functions behind ifdef · 4d51e0d8
  Ömer Sinan Ağacan authored 5 years ago and Marge Bot committed 5 years ago
  
  4d51e0d8
May 27, 2019

Fix padding of entries in .prof files · 95b79173

Jasper Van der Jeugt authored 5 years ago and

Marge Bot committed 5 years ago

When the number of entries of a cost centre reaches 11 digits, it takes
up the whole space reserved for it and the prof file ends up looking
like:

    ... no.        entries  %time %alloc   %time %alloc

        ...
    ... 120918     978250    0.0    0.0     0.0    0.0
    ... 118891          0    0.0    0.0    73.3   80.8
    ... 11890229702412351    8.9   13.5    73.3   80.8
    ... 118903  153799689    0.0    0.1     0.0    0.1
        ...

This results in tooling not being able to parse the .prof file.  I
realise we have the JSON output as well now, but still it'd be good to
fix this little weirdness.

Original bug report and full prof file can be seen here:
<https://github.com/jaspervdj/profiteur/issues/28>.

95b79173

May 25, 2019
- Add `keepCAFs` to RtsSymbols · 70c24471
  Moritz Angermann authored 5 years ago and Marge Bot committed 5 years ago
  
  70c24471
May 22, 2019

RTS: Fix restrictive cast · ecc9366a

Alec Theriault authored 5 years ago and

Marge Bot committed 5 years ago

Commit e75a9afd added an `unsigned` cast
to account for OSes that have signed `rlim_t` signed. Unfortunately,
the `unsigned` cast has the unintended effect of narrowing `rlim_t` to
only 4 bytes. This leads to some spurious out of memory crashes
(in particular: Haddock crashes with OOM whenn building docs of
`ghc`-the-library).

In this case, `W_` is a better type to cast to: we know it will be
unsigned too and it has the same type as `*len` (so we don't suffer from
accidental narrowing).

ecc9366a