rts/Messages.h · cf989ffe490c146be4ed0fd7e0c00d3ff8fe1453 · Glasgow Haskell Compiler / GHC

Simon Marlow authored Apr 23, 2016

Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.

Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node

When given the +RTS --numa flag, the runtime will:
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free list in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
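
The capability-to-node rule above (cap C is on node C%N) can be sketched
in a few lines of C. This is only an illustration of the round-robin
mapping, not the RTS's actual code: the names cap_to_node and
caps_on_node are hypothetical, and the sketch assumes N is at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the RTS: map capability number
 * `cap` to a NUMA node using the round-robin rule C % N. */
static uint32_t cap_to_node(uint32_t cap, uint32_t n_nodes)
{
    assert(n_nodes > 0);   /* assumption: at least one node is reported */
    return cap % n_nodes;
}

/* Hypothetical helper: count how many of the first `n_caps`
 * capabilities are assigned to `node` under the rule above. */
static uint32_t caps_on_node(uint32_t n_caps, uint32_t n_nodes, uint32_t node)
{
    uint32_t count = 0;
    for (uint32_t c = 0; c < n_caps; c++) {
        if (cap_to_node(c, n_nodes) == node) {
            count++;
        }
    }
    return count;
}
```

Under this rule, running with -N24 on a machine that reports two nodes
would place the even-numbered capabilities on node 0 and the
odd-numbered ones on node 1, twelve each. When the number of
capabilities is not a multiple of N, the lower-numbered nodes receive
one extra capability.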

For example, using nofib/parallel/queens on a 24-core 2-socket machine:

```
$ ./Main 15 +RTS -N24 -s -A64m
  Total   time  173.960s  (  7.467s elapsed)

$ ./Main 15 +RTS -N24 -s -A64m --numa
  Total   time  150.836s  (  6.423s elapsed)
```

The biggest win here is expected to be allocating from node-local
memory, so th...

9e5ea67e