  1. 30 May, 2020 2 commits
    • Ben Gamari's avatar
      rts: Drop compatibility shims for Windows Vista · 3d960169
      Ben Gamari authored
      We can now assume that the thread and processor group interfaces are
      available.
      3d960169
    • Ben Gamari's avatar
      rts: Teach getNumProcessors to return available processors · 4413828b
      Ben Gamari authored
      Previously we would report the number of physical processors, which
      can be quite wrong in a containerized setting. Now we rather return how
      many processors are in our affinity mask when possible.
      
      I also refactored the code to prefer platform-specific interfaces, since
      these report logical CPUs instead of physical ones (using
      `machdep.cpu.thread_count` on Darwin and `cpuset_getaffinity` on FreeBSD).
      
      Fixes #14781.
      4413828b
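The affinity-based count described in 4413828b can be sketched roughly as follows for Linux. This is only an illustration, not the RTS code: the name `available_processors` is hypothetical, and falling back to `sysconf(_SC_NPROCESSORS_ONLN)` when no affinity mask can be read is an assumption made for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes sched_getaffinity and CPU_COUNT on glibc */
#endif
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of processors this process may run on. */
int available_processors(void)
{
#if defined(__linux__) && defined(CPU_COUNT)
    cpu_set_t mask;
    /* A pid of 0 refers to the calling thread. */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        int n = CPU_COUNT(&mask);
        if (n > 0) {
            return n;
        }
    }
#endif
    /* Assumed fallback: processors currently online, ignoring affinity. */
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? (int)online : 1;
}
```

In a container restricted to two CPUs via a cpuset, a count based on the affinity mask would be expected to report two, whereas a count of physical processors would report the host's total.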
  2. 29 May, 2020 1 commit
  3. 27 May, 2020 1 commit
    • Ben Gamari's avatar
      eventlog: Fix racy flushing · 04750304
      Ben Gamari authored
      Previously no attempt was made to avoid multiple threads writing their
      capability-local eventlog buffers to the eventlog writer simultaneously.
      This could result in multiple eventlog streams being interleaved. Fix
      this by documenting that the EventLogWriter's write() and flush()
      functions may be called reentrantly and fix the default writer to
      protect its FILE* by a mutex.
      
      Fixes #18210.
      04750304
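The idea behind 04750304, guarding a shared `FILE*` with a mutex so that whole buffers are written without interleaving, might look like the sketch below. The `LockedWriter` type and its functions are hypothetical names for this example and are not the RTS's EventLogWriter interface.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical writer: a FILE* plus the mutex that protects it. */
typedef struct {
    FILE *file;
    pthread_mutex_t lock;
} LockedWriter;

bool locked_writer_init(LockedWriter *w, FILE *file)
{
    w->file = file;
    return pthread_mutex_init(&w->lock, NULL) == 0;
}

/* Write the whole buffer while holding the lock, so that buffers from
 * different threads cannot be interleaved in the output stream. */
bool locked_writer_write(LockedWriter *w, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    bool ok = true;

    pthread_mutex_lock(&w->lock);
    while (len > 0) {
        size_t n = fwrite(p, 1, len, w->file);
        if (n == 0) { /* write error: give up on this buffer */
            ok = false;
            break;
        }
        p += n;
        len -= n;
    }
    pthread_mutex_unlock(&w->lock);
    return ok;
}

void locked_writer_flush(LockedWriter *w)
{
    pthread_mutex_lock(&w->lock);
    fflush(w->file);
    pthread_mutex_unlock(&w->lock);
}
```

Holding the lock across the whole retry loop, rather than around each `fwrite` call, is what keeps one thread's buffer contiguous in the file.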
  4. 21 May, 2020 1 commit
  5. 13 May, 2020 1 commit
  6. 10 May, 2020 2 commits
  7. 08 May, 2020 2 commits
  8. 06 May, 2020 4 commits
    • Ömer Sinan Ağacan's avatar
      ELF linker: increment curSymbol after filling in fields of current entry · a95e7fe0
      Ömer Sinan Ağacan authored
      The bug was introduced in a8b7cef4 which added a field to the
      `symbols` array elements and then updated this code incorrectly:
      
          - oc->symbols[curSymbol++] = nm;
          + oc->symbols[curSymbol++].name = nm;
          + oc->symbols[curSymbol].addr = symbol->addr;
      a95e7fe0
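The corrected pattern from a95e7fe0, filling in every field of the current entry before advancing the index, can be illustrated with a simplified stand-in. The `Symbol` struct and `fill_symbols` function here are assumptions for the example and do not reproduce the linker's actual types.

```c
#include <stddef.h>

/* Simplified stand-in for an element of the `symbols` array. */
typedef struct {
    const char *name;
    void *addr;
} Symbol;

/* Copy `count` name/address pairs into `symbols`; returns entries written. */
size_t fill_symbols(Symbol *symbols, const char **names, void **addrs,
                    size_t count)
{
    size_t curSymbol = 0;
    for (size_t i = 0; i < count; i++) {
        /* Fill in both fields of the *current* entry first... */
        symbols[curSymbol].name = names[i];
        symbols[curSymbol].addr = addrs[i];
        /* ...and only then move on to the next entry. */
        curSymbol++;
    }
    return curSymbol;
}
```

With the increment placed on the first assignment instead, each address would land in the entry after the one holding its name.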
    • Ben Gamari's avatar
      nonmoving: Fix handling of dirty objects · b2d72c75
      Ben Gamari authored
      Previously we (incorrectly) relied on failed_to_evac to be "precise".
      That is, we expected it to only be true if *all* of an object's fields
      lived outside of the non-moving heap. However, this does not match the
      behavior of failed_to_evac, which is true if *any* of the object's
      fields weren't promoted (meaning that some others *may* live in the
      non-moving heap).
      
      This is problematic as we skip the non-moving write barrier for dirty
      objects (which we can only safely do if *all* fields point outside of
      the non-moving heap).
      
      Clearly this arises due to a fundamental difference in the behavior
      expected of failed_to_evac in the moving and non-moving collector.
      e.g., in the moving collector it is always safe to conservatively say
      failed_to_evac=true whereas in the non-moving collector the safe value
      is false.
      
      This issue went unnoticed as I never wrote down the dirtiness
      invariant enforced by the non-moving collector. We now define this
      invariant as
      
          An object being marked as dirty implies that all of its fields are
          on the mark queue (or, equivalently, update remembered set).
      
      To maintain this invariant we teach nonmovingScavengeOne to push the
      fields of objects which we fail to evacuate to the update remembered
      set. This is a simple and reasonably cheap solution and avoids the
      complexity and fragility that other, more strict alternative invariants
      would require.
      
      All of this is described in a new Note, Note [Dirty flags in the
      non-moving collector] in NonMoving.c.
      b2d72c75
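The dirtiness invariant from b2d72c75 can be modelled with a toy example. Everything below (`Object`, `UpdRemSet`, `scavenge_one`) is a hypothetical simplification and not the code in NonMoving.c; it only shows the ordering that keeps "dirty implies all fields are on the update remembered set" true.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FIELDS 4
#define QUEUE_CAP 64

/* Toy heap object with a dirty flag and a few pointer fields. */
typedef struct Object {
    bool dirty;
    size_t n_fields;
    struct Object *fields[MAX_FIELDS];
} Object;

/* Toy update remembered set: a bounded array of object pointers. */
typedef struct {
    Object *items[QUEUE_CAP];
    size_t len;
} UpdRemSet;

static bool upd_rem_set_push(UpdRemSet *rs, Object *o)
{
    if (rs->len == QUEUE_CAP) {
        return false;
    }
    rs->items[rs->len++] = o;
    return true;
}

/* Toy stand-in for scavenging one object. If evacuation failed for any
 * field, push *all* fields before marking the object dirty. */
bool scavenge_one(Object *obj, bool failed_to_evac, UpdRemSet *rs)
{
    if (!failed_to_evac) {
        obj->dirty = false;
        return true;
    }
    for (size_t i = 0; i < obj->n_fields; i++) {
        if (!upd_rem_set_push(rs, obj->fields[i])) {
            return false; /* queue full; not handled in this sketch */
        }
    }
    obj->dirty = true;
    return true;
}
```

Because every field is pushed whenever the object becomes dirty, it no longer matters whether `failed_to_evac` was set because of one field or all of them.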
    • Ben Gamari's avatar
      nonmoving: Fix incorrect failed_to_evac value during deadlock gc · 740b3b8d
      Ben Gamari authored
      Previously we would incorrectly set the failed_to_evac flag if we
      evacuated a value due to a deadlock GC. This would cause us to mark more
      things as dirty than strictly necessary. It also turned up a nasty bug
      which I will fix next.
      740b3b8d
    • Ben Gamari's avatar
      rts: Zero block flags with -DZ · 420b957d
      Ben Gamari authored
      Block flags are very useful for determining the state of a block.
      However, some block allocator users don't touch them, leading to
      misleading values. Ensure that we zero them when zero-on-gc is set. This
      is safe and makes the flags more useful during debugging.
      420b957d
  9. 03 May, 2020 1 commit
  10. 01 May, 2020 3 commits
  11. 26 Apr, 2020 1 commit
  12. 23 Apr, 2020 3 commits
  13. 18 Apr, 2020 1 commit
    • Sylvain Henry's avatar
      Modules (#13009) · 15312bbb
      Sylvain Henry authored
      * SysTools
      * Parser
      * GHC.Builtin
      * GHC.Iface.Recomp
      * Settings
      
      Update Haddock submodule
      
      Metric Decrease:
          Naperian
          parsing001
      15312bbb
  14. 15 Apr, 2020 10 commits
    • Daniel Gröber (dxld)'s avatar
      rts: ProfHeap: Fix wrong time in last heap profile sample · ec77b2f1
      Daniel Gröber (dxld) authored
      We've had this longstanding issue in the heap profiler, where the time of
      the last sample in the profile is sometimes way off causing the rendered
      graph to be quite useless for long runs.
      
      It seems to me the problem is that we use mut_user_time() for the last
      sample as opposed to getRTSStats(), which we use when calling heapProfile()
      in GC.c.
      
      The former is equivalent to getProcessCPUTime() but the latter does
      some additional stuff:
      
          getProcessCPUTime() - end_init_cpu - stats.gc_cpu_ns -
          stats.nonmoving_gc_cpu_ns
      
      So to fix this just use getRTSStats() in both places.
      ec77b2f1
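The subtraction quoted in ec77b2f1 can be written out as a small function to make the difference between the two time sources concrete. The `GcTimes` struct and `mutator_cpu_ns` name are hypothetical; only the field names `gc_cpu_ns` and `nonmoving_gc_cpu_ns` and the formula itself come from the commit message.

```c
#include <stdint.h>

typedef int64_t Time; /* nanoseconds; a stand-in for the RTS time type */

/* Hypothetical subset of the GC statistics used by the formula. */
typedef struct {
    Time gc_cpu_ns;
    Time nonmoving_gc_cpu_ns;
} GcTimes;

/* Process CPU time minus initialisation and GC time. */
Time mutator_cpu_ns(Time process_cpu_ns, Time end_init_cpu_ns,
                    const GcTimes *gc)
{
    return process_cpu_ns - end_init_cpu_ns
         - gc->gc_cpu_ns - gc->nonmoving_gc_cpu_ns;
}
```

For a process with 1000 ns of CPU time, 100 ns of initialisation, 250 ns of GC and 50 ns of non-moving GC, this gives 600 ns, whereas the raw process CPU time would give 1000 ns. Mixing the two within one profile is what throws the last sample's timestamp off.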
    • Daniel Gröber (dxld)'s avatar
      rts: Assert LDV_recordDead is not called for inherently used closures · 19de2fb0
      Daniel Gröber (dxld) authored
      The comments make it clear LDV_recordDead should not be called for
      inherently used closures, so add an assertion to codify this fact.
      19de2fb0
    • Daniel Gröber (dxld)'s avatar
    • Daniel Gröber (dxld)'s avatar
      rts: Fix nomenclature in OVERWRITING_CLOSURE macros · e149dea9
      Daniel Gröber (dxld) authored
      The additional commentary introduced by commit 8916e64e ("Implement
      shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately got
      this wrong. We set 'prim' to true in overwritingClosureOfs because we
      _don't_ want to call LDV_recordDead().
      
      The reason is the "inherently used" distinction made in the LDV
      profiler, so I rename the variable to be more appropriate.
      e149dea9
    • Daniel Gröber (dxld)'s avatar
      c3c0f662
    • Daniel Gröber (dxld)'s avatar
    • Daniel Gröber (dxld)'s avatar
      15fa9bd6
    • Daniel Gröber (dxld)'s avatar
      Zero out pinned block alignment slop when profiling · 41230e26
      Daniel Gröber (dxld) authored
      The heap profiler currently cannot traverse pinned blocks because of
      alignment slop. This used to just be a minor annoyance as the whole block
      is accounted into a special cost center rather than the respective object's
      CCS, cf. #7275. However for the new root profiler we would like to be able
      to visit _every_ closure on the heap. We need to do this so we can get rid
      of the current 'flip' bit hack in the heap traversal code.
      
      Since info pointers are always non-zero we can in principle skip all the
      slop in the profiler if we can rely on it being zeroed. This assumption
      caused problems in the past though: commit a586b33f ("rts: Correct
      handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use
      the same trick for BF_LARGE objects but neglected to take into account that
      shrink*Array# functions don't ensure that slop is zeroed when not
      compiling with profiling.
      
      Later, commit 0c114c65 ("Handle large ARR_WORDS in heap census (fix
      as we will only be assuming slop is zeroed when profiling is on.
      
      This commit also reduces the amount of slop we introduce in the first
      place by calculating the needed alignment before doing the allocation for
      small objects where we know the next available address. For large objects
      we don't know how much alignment we'll have to do yet since those details
      are hidden behind the allocateMightFail function so there we continue to
      allocate the maximum additional words we'll need to do the alignment.
      
      So that we don't have to duplicate all this logic in the Cmm code, we pull
      it into the RTS allocatePinned function instead.
      
      Metric Decrease:
          T7257
          haddock.Cabal
          haddock.base
      41230e26
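Computing the alignment padding before allocating, and zeroing the resulting slop, as described in 41230e26, could look roughly like this. The helpers `align_padding` and `place_aligned` are hypothetical and work in bytes; they are not the RTS's `allocatePinned`, and they assume the alignment is a power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes of padding needed so that (addr + pad + off) is a multiple of
 * `align`. Assumes `align` is a power of two. */
size_t align_padding(uintptr_t addr, size_t align, size_t off)
{
    return (size_t)((0 - (addr + off)) & (uintptr_t)(align - 1));
}

/* Given the next free address, zero the slop in front of the object and
 * return where the object itself should start. */
unsigned char *place_aligned(unsigned char *next, size_t align, size_t off)
{
    size_t pad = align_padding((uintptr_t)next, align, off);
    memset(next, 0, pad); /* zeroed slop can be skipped by a heap walker */
    return next + pad;
}
```

Since the next free address is known, only the padding actually required is reserved, rather than the worst case of `align - 1` bytes.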
    • Ben Gamari's avatar
      rts: Don't mark evacuate_large as inline · 27cc2e7b
      Ben Gamari authored
      This function has two callsites and is quite large. GCC consequently
      decides not to inline and warns instead. Given the situation, I can't
      blame it. Let's just remove the inline specifier.
      27cc2e7b
    • Ben Gamari's avatar
      StgCRun: Enable unwinding only on Linux · 5b08e0c0
      Ben Gamari authored
      It's broken on macOS and SmartOS due to assembler differences
      (#15207) so let's be conservative in enabling it. Also, refactor things
      to make the intent clearer.
      5b08e0c0
  15. 14 Apr, 2020 1 commit
  16. 09 Apr, 2020 2 commits
    • Sylvain Henry's avatar
      Rts: show errno on failure (#18033) · dce50062
      Sylvain Henry authored
      dce50062
    • Ömer Sinan Ağacan's avatar
      Fix CNF handling in compacting GC · 39075176
      Ömer Sinan Ağacan authored
      Fixes #17937
      
      Previously compacting GC simply ignored CNFs. This is mostly fine as
      most (see "What about small compacts?" below) CNF objects don't have
      outgoing pointers, and are "large" (allocated in large blocks) and large
      objects are not moved or compacted.
      
      However if we do GC *during* sharing-preserving compaction then the CNF
      will have a hash table mapping objects that have been moved to the CNF
      to their location in the CNF, to be able to preserve sharing.
      
      This case is handled in the copying collector, in `scavenge_compact`,
      where we evacuate hash table entries and then rehash the table.
      
      Compacting GC ignored this case.
      
      We now visit CNFs in all generations when threading pointers to the
      compacted heap and thread hash table keys. A visited CNF is added to the
      list `nfdata_chain`. After compaction is done, we re-visit the CNFs in
      that list and rehash the tables.
      
      The overhead is minimal: the list is static in `Compact.c`, and a link
      field is added to the `StgCompactNFData` closure. Programs that don't use
      CNFs should not be affected.
      
      To test this CNF tests are now also run in a new way 'compacting_gc',
      which just passes `-c` to the RTS, enabling compacting GC for the oldest
      generation. Before this patch the result would be:
      
          Unexpected failures:
             compact_gc.run          compact_gc [bad exit code (139)] (compacting_gc)
             compact_huge_array.run  compact_huge_array [bad exit code (1)] (compacting_gc)
      
      With this patch all tests pass. I can also pass `-c -DS` without any
      failures.
      
      What about small compacts? Small CNFs are still not handled by the
      compacting GC. However so far I'm unable to write a test that triggers a
      runtime panic ("update_fwd: unknown/strange object") by allocating a
      small CNF in a compacted heap. It's possible that I'm missing something
      and it's not possible to have a small CNF.
      
      NoFib Results:
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
                   CS          +0.1%      0.0%      0.0%     +0.0%     +0.0%
                  CSD          +0.1%      0.0%      0.0%      0.0%      0.0%
                   FS          +0.1%      0.0%      0.0%      0.0%      0.0%
                    S          +0.1%      0.0%      0.0%      0.0%      0.0%
                   VS          +0.1%      0.0%      0.0%      0.0%      0.0%
                  VSD          +0.1%      0.0%     +0.0%     +0.0%     -0.0%
                  VSM          +0.1%      0.0%     +0.0%     -0.0%      0.0%
                 anna          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 ansi          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 atom          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               awards          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               banner          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
           bernouilli          +0.1%      0.0%      0.0%     -0.0%     +0.0%
         binary-trees          +0.1%      0.0%     -0.0%     -0.0%      0.0%
                boyer          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               boyer2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 bspt          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
            cacheprof          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
             calendar          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             cichelli          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              circsim          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             clausify          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
        comp_lab_zift          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             compress          +0.1%      0.0%     +0.0%     +0.0%      0.0%
            compress2          +0.1%      0.0%     -0.0%      0.0%      0.0%
          constraints          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         cryptarithm1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         cryptarithm2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  cse          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         digits-of-e1          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
         digits-of-e2          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               dom-lt          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                eliza          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                event          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
          exact-reals          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               exp3_8          +0.1%      0.0%     +0.0%     -0.0%      0.0%
               expert          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
       fannkuch-redux          +0.1%      0.0%     -0.0%      0.0%      0.0%
                fasta          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
                  fem          +0.1%      0.0%     -0.0%     +0.0%      0.0%
                  fft          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
                 fft2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             fibheaps          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 fish          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                fluid          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               fulsom          +0.1%      0.0%     -0.0%     +0.0%      0.0%
               gamteb          +0.1%      0.0%     +0.0%     +0.0%      0.0%
                  gcd          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
          gen_regexps          +0.1%      0.0%     -0.0%     +0.0%      0.0%
               genfft          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                   gg          +0.1%      0.0%      0.0%     +0.0%     +0.0%
                 grep          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
               hidden          +0.1%      0.0%     +0.0%     -0.0%      0.0%
                  hpg          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  ida          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                infer          +0.1%      0.0%     +0.0%      0.0%     -0.0%
              integer          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
            integrate          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
         k-nucleotide          +0.1%      0.0%     +0.0%     +0.0%      0.0%
                kahan          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              knights          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               lambda          +0.1%      0.0%     +0.0%     +0.0%     -0.0%
           last-piece          +0.1%      0.0%     +0.0%      0.0%      0.0%
                 lcss          +0.1%      0.0%     +0.0%     +0.0%      0.0%
                 life          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
                 lift          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               linear          +0.1%      0.0%     -0.0%     +0.0%      0.0%
            listcompr          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             listcopy          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             maillist          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
               mandel          +0.1%      0.0%     +0.0%     +0.0%      0.0%
              mandel2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 mate          +0.1%      0.0%     +0.0%      0.0%     +0.0%
              minimax          +0.1%      0.0%     -0.0%      0.0%     -0.0%
              mkhprog          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
           multiplier          +0.1%      0.0%     +0.0%      0.0%      0.0%
               n-body          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             nucleic2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 para          +0.1%      0.0%      0.0%     +0.0%     +0.0%
            paraffins          +0.1%      0.0%     +0.0%     -0.0%      0.0%
               parser          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
              parstof          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  pic          +0.1%      0.0%     -0.0%     -0.0%      0.0%
             pidigits          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
                power          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               pretty          +0.1%      0.0%     -0.0%     -0.0%     -0.1%
               primes          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
            primetest          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               prolog          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               puzzle          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               queens          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              reptile          +0.1%      0.0%     -0.0%     -0.0%     +0.0%
      reverse-complem          +0.1%      0.0%     +0.0%      0.0%     -0.0%
              rewrite          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                 rfib          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  rsa          +0.1%      0.0%     -0.0%     +0.0%     -0.0%
                  scc          +0.1%      0.0%     -0.0%     -0.0%     -0.1%
                sched          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  scs          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               simple          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                solid          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              sorting          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
        spectral-norm          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               sphere          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               symalg          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  tak          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
            transform          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             treejoin          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
            typecheck          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              veritas          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                 wang          +0.1%      0.0%      0.0%     +0.0%     +0.0%
            wave4main          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         wheel-sieve1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         wheel-sieve2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 x2n1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
      --------------------------------------------------------------------------------
                  Min          +0.0%      0.0%     -0.0%     -0.0%     -0.1%
                  Max          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
       Geometric Mean          +0.1%     -0.0%     -0.0%     -0.0%     -0.0%
      
      Bumping numbers of nonsensical perf tests:
      
      Metric Increase:
          T12150
          T12234
          T12425
          T13035
          T5837
          T6048
      
      It's simply not possible for this patch to increase allocations, and
      I've wasted enough time on these tests in the past (see #17686). I think
      these tests should not be perf tests, but for now I'll bump the numbers.
      39075176
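The `nfdata_chain` bookkeeping described in 39075176 (a static list threaded through a link field, built while visiting CNFs and walked again after compaction) follows a common intrusive-list pattern. The sketch below uses a hypothetical `Compact` struct and a counter in place of real rehashing; it is not the code in `Compact.c`.

```c
#include <stddef.h>

/* Toy stand-in for a compact region with an intrusive link field. */
typedef struct Compact {
    struct Compact *link;
    int rehash_count; /* stands in for "hash table was rehashed" */
} Compact;

/* Static chain of regions visited during the current collection. */
static Compact *chain = NULL;

/* Called when a region is visited while threading pointers. */
void chain_push(Compact *c)
{
    c->link = chain;
    chain = c;
}

/* Called once compaction is done: revisit every chained region,
 * "rehash" it, and leave the chain empty for the next collection. */
size_t rehash_chain(void)
{
    size_t visited = 0;
    while (chain != NULL) {
        Compact *c = chain;
        chain = c->link;
        c->link = NULL;
        c->rehash_count++;
        visited++;
    }
    return visited;
}
```

The cost is one pointer per region plus one static pointer, and a program that never creates a compact region never touches the chain.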
  17. 07 Apr, 2020 1 commit
    • Daniel Gröber (dxld)'s avatar
      rts: ProfHeap: Fix memory leak when not compiled with profiling · f38e8d61
      Daniel Gröber (dxld) authored
      If we're doing heap profiling on an unprofiled executable we keep
      allocating new space in initEra via nextEra on each profiler run but we
      don't have a corresponding freeEra call.
      
      We do free the last era in endHeapProfiling but previous eras will have
      been overwritten by initEra and will never get free()ed.
      
      Metric Decrease:
          space_leak_001
      f38e8d61
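The leak pattern described in f38e8d61, re-initialising a slot without freeing what it previously held, and the general shape of a fix can be shown with a toy `Era` type. All names below are hypothetical, and the sketch does not claim to mirror how the RTS patch is structured; the `live_eras` counter exists only to make the leak observable.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for per-era census data. */
typedef struct {
    size_t *counts;
    size_t n;
} Era;

static int live_eras = 0; /* allocations not yet freed */

int init_era(Era *e, size_t n)
{
    e->counts = calloc(n, sizeof *e->counts);
    if (e->counts == NULL) {
        return -1;
    }
    e->n = n;
    live_eras++;
    return 0;
}

void free_era(Era *e)
{
    if (e->counts != NULL) {
        free(e->counts);
        e->counts = NULL;
        e->n = 0;
        live_eras--;
    }
}

/* Reuse the same slot for the next era: free the old data first, so
 * that init_era does not overwrite the only pointer to it. */
int next_era(Era *e, size_t n)
{
    free_era(e);
    return init_era(e, n);
}
```

Without the `free_era` call in `next_era`, each profiler run would leave one more allocation behind, and only the last one would be released at shutdown.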
  18. 03 Apr, 2020 1 commit
    • Andreas Klebinger's avatar
      Improve and refactor StgToCmm codegen for DataCons. · 9462452a
      Andreas Klebinger authored
      We now differentiate three cases of constructor bindings:
      
      1) Bindings which we can "replace" with a reference to
         an existing closure. Reference the replacement closure
         when accessing the binding.
      2) Bindings which we can "replace" as above. But we still
         generate a closure which will be referenced by modules
         importing this binding.
      3) For any other binding generate a closure. Then reference
         it.
      
      Before this patch 1) applied only to local bindings and we
      didn't do 2) at all.
      9462452a
  19. 02 Apr, 2020 2 commits