1. 09 Aug, 2021 1 commit
  2. 10 Jun, 2016 1 commit
    • Simon Marlow's avatar
      NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep a separate free lists in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so that means programs using a large -A value (as here).
      
      According to perf, on this program the number of remote memory accesses
      were reduced by more than 50% by using `--numa`.
      
      Test Plan:
      * validate
      * There's a new flag --debug-numa=<n> that pretends to do NUMA without
        actually making the OS calls, which is useful for testing the code
        on non-NUMA systems.
      * TODO: I need to add some unit tests
      
      Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2199
      9e5ea67e
  3. 29 Sep, 2014 1 commit
  4. 28 Jul, 2014 1 commit
  5. 20 Jul, 2010 1 commit
  6. 01 Apr, 2010 1 commit
    • Simon Marlow's avatar
      Change the representation of the MVar blocked queue · f4692220
      Simon Marlow authored
      The list of threads blocked on an MVar is now represented as a list of
      separately allocated objects rather than being linked through the TSOs
      themselves.  This lets us remove a TSO from the list in O(1) time
      rather than O(n) time, by marking the list object.  Removing this
      linear component fixes some pathalogical performance cases where many
      threads were blocked on an MVar and became unreachable simultaneously
      (nofib/smp/threads007), or when sending an asynchronous exception to a
      TSO in a long list of thread blocked on an MVar.
      
      MVar performance has actually improved by a few percent as a result of
      this change, slightly to my surprise.
      
      This is the final cleanup in the sequence, which let me remove the old
      way of waking up threads (unblockOne(), MSG_WAKEUP) in favour of the
      new way (tryWakeupThread and MSG_TRY_WAKEUP, which is idempotent).  It
      is now the case that only the Capability that owns a TSO may modify
      its state (well, almost), and this simplifies various things.  More of
      the RTS is based on message-passing between Capabilities now.
      f4692220
  7. 08 Oct, 2009 1 commit
  8. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5
  9. 19 Jun, 2008 1 commit
    • Simon Marlow's avatar
      Fix up inlines for gcc 4.3 · 24ad9cf0
      Simon Marlow authored
      gcc 4.3 emits warnings for static inline functions that its heuristics
      decided not to inline.  The workaround is to either mark appropriate
      functions as "hot" (a new attribute in gcc 4.3), or sometimes to use
      "extern inline" instead.
      
      With this fix I can validate with gcc 4.3 on Fedora 9.
      24ad9cf0