Commit 9e5ea67e authored by Simon Marlow

NUMA support

Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.

Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
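The following is a minimal Linux sketch of the second item: it pins the calling thread to an explicit set of CPUs. It is not the RTS code. The configure check in this commit probes libnuma, which offers node-level calls such as `numa_run_on_node()` and `numa_alloc_onnode()`. This sketch instead uses the lower-level glibc `sched_setaffinity()`, so it needs no extra library. The helper names `pin_current_thread` and `first_allowed_cpu` are hypothetical, and the caller is assumed to already know which CPUs belong to the target node.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t and the CPU_* macros in glibc */
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to the CPUs listed in cpus[0..n_cpus-1].
 * On Linux, pid 0 means "the calling thread".
 * Returns 0 on success, -1 on failure (errno set by sched_setaffinity). */
static int pin_current_thread(const int *cpus, int n_cpus)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < n_cpus; i++) {
        assert(cpus[i] >= 0 && cpus[i] < CPU_SETSIZE);
        CPU_SET(cpus[i], &set);
    }
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return the lowest-numbered CPU the calling thread is currently allowed
 * to run on, or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}
```

Binding a thread "to a node" then amounts to passing every CPU that belongs to that node. Memory the thread later touches is typically served from that node under Linux's default first-touch policy, but explicit node-local allocation is what the libnuma calls are for.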

When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
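The C % N assignment, and the way a task looks for a free capability on its own node, can be modelled in a few lines of C. This is a toy sketch, not the RTS's data structures. `busy[]` stands in for checking `capabilities[i]->running_task`, and `cap_to_node` and `find_free_cap_on_node` are hypothetical names. The stride loop mirrors the search this commit adds to `waitForCapability()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not the RTS's own data structures.
 * Capability C lives on node C % n_nodes, as in the capNoToNumaNode()
 * macro added by this commit. */
static uint32_t cap_to_node(uint32_t cap_no, uint32_t n_nodes)
{
    assert(n_nodes > 0);
    return cap_no % n_nodes;
}

/* Look for a free capability on `node`. Starting at `node` and stepping by
 * n_nodes visits exactly the capabilities whose number is congruent to
 * `node` modulo n_nodes, i.e. the ones on that node. busy[i] stands in for
 * "capability i already has a running task".
 * Returns the capability number, or -1 if every capability on the node
 * is busy (the caller then needs a fallback). */
static int find_free_cap_on_node(const bool *busy, uint32_t n_caps,
                                 uint32_t node, uint32_t n_nodes)
{
    for (uint32_t i = node; i < n_caps; i += n_nodes) {
        assert(cap_to_node(i, n_nodes) == node);
        if (!busy[i]) {
            return (int)i;
        }
    }
    return -1;
}
```

Consider 6 capabilities on 2 nodes. Node 0 owns capabilities 0, 2 and 4, and node 1 owns 1, 3 and 5. If 0, 1 and 3 are busy, a task on node 0 gets capability 2 and a task on node 1 gets capability 5. When nothing on the node is free, the diff falls back to `last_free_capability[task->node]`. Interleaving capabilities across nodes, rather than giving each node a contiguous range, keeps the per-node counts balanced when `setNumCapabilities()` adds or removes capabilities.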

For example, using nofib/parallel/queens on a 24-core 2-socket machine:

```
$ ./Main 15 +RTS -N24 -s -A64m
  Total   time  173.960s  (  7.467s elapsed)

$ ./Main 15 +RTS -N24 -s -A64m --numa
  Total   time  150.836s  (  6.423s elapsed)
```
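Here are the two runs side by side. This is plain arithmetic on the figures above, not a new measurement:

```latex
\begin{aligned}
\text{elapsed: } & \frac{7.467}{6.423} \approx 1.163\times, &\quad \frac{7.467 - 6.423}{7.467} \approx 14.0\%\ \text{less time} \\
\text{CPU: } & \frac{173.960}{150.836} \approx 1.153\times, &\quad \frac{173.960 - 150.836}{173.960} \approx 13.3\%\ \text{less time}
\end{aligned}
```

A reduction of roughly 13 to 14% is consistent with the "on the order of 10%" speedup that the user's guide text in this commit describes as typical.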

The biggest win is expected to come from allocating the nursery from
node-local memory, so programs using a large -A value (as here) should
benefit most.

According to perf, using `--numa` reduced the number of remote memory
accesses on this program by more than 50%.

Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
  actually making the OS calls, which is useful for testing the code
  on non-NUMA systems.
* TODO: I need to add some unit tests
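The `--debug-numa=<n>` idea can be sketched as follows. This is an illustration under stated assumptions, not the RTS implementation. `choose_numa_nodes` and `count_sysfs_numa_nodes` are hypothetical names. The real OS query presumably goes through libnuma (given the configure check), not the sysfs directory listing used here. Clamping to 16 simply reuses the `MAX_NUMA_NODES` limit introduced in this commit; how the real RTS treats a larger count is not shown in this diff.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdint.h>
#include <string.h>

#define MAX_NUMA_NODES 16   /* same limit as this commit's RTS constant */

/* Count NUMA nodes by listing the /sys/devices/system/node/nodeN
 * directories. This stands in for the libnuma query the RTS makes
 * (an assumption of this sketch). Returns 0 if sysfs is unavailable. */
static uint32_t count_sysfs_numa_nodes(void)
{
    DIR *dir = opendir("/sys/devices/system/node");
    if (dir == NULL) {
        return 0;
    }
    uint32_t n = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        const char *name = ent->d_name;
        /* Only count entries named "node" followed by one or more digits;
         * skip files such as "online", "possible" and "uevent". */
        if (strncmp(name, "node", 4) != 0 || name[4] == '\0') {
            continue;
        }
        const char *p = name + 4;
        while (isdigit((unsigned char)*p)) {
            p++;
        }
        if (*p == '\0') {
            n++;
        }
    }
    closedir(dir);
    return n;
}

/* Hypothetical helper: decide how many NUMA nodes to use.
 * debug_nodes > 0 plays the role of --debug-numa=<n>: pretend to have that
 * many nodes without querying the OS. Otherwise query the OS and fall back
 * to a single node. The result is clamped to MAX_NUMA_NODES. */
static uint32_t choose_numa_nodes(uint32_t debug_nodes)
{
    uint32_t n = debug_nodes > 0 ? debug_nodes : count_sysfs_numa_nodes();
    if (n == 0) {
        n = 1;
    }
    if (n > MAX_NUMA_NODES) {
        n = MAX_NUMA_NODES;
    }
    return n;
}
```

Because the debug path never touches the OS, the node-assignment and per-node free-list logic can be exercised on a single-node laptop, which is the point of the flag.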

Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2199
parent b9fa72a2
@@ -1103,6 +1103,13 @@ if test $UseLibdw = "YES" ; then
fi
AC_DEFINE_UNQUOTED([USE_LIBDW], [$USE_LIBDW], [Set to 1 to use libdw])
dnl ** Have libnuma?
dnl --------------------------------------------------------------
AC_CHECK_HEADERS([numa.h numaif.h])
AC_CHECK_LIB(numa, numa_available,
[AC_DEFINE([HAVE_LIBNUMA], [1], [Define to 1 if you have libnuma.])]
[])
dnl ** Documentation
dnl --------------------------------------------------------------
if test -n "$SPHINXBUILD"; then
@@ -643,6 +643,56 @@ performance.
``-F`` parameter will be reduced in order to avoid exceeding the
maximum heap size.
.. rts-flag:: --numa
--numa=<mask>
.. index::
single: NUMA, enabling in the runtime
Enable NUMA-aware memory allocation in the runtime (only available
with ``-threaded``, and only on Linux currently).
Background: some systems have a Non-Uniform Memory Architecture,
whereby main memory is split into banks which are "local" to
specific CPU cores. Accessing local memory is faster than
accessing remote memory. The OS provides APIs for allocating
local memory and binding threads to particular CPU cores, so that
we can ensure certain memory accesses are using local memory.
The ``--numa`` option tells the RTS to tune its memory usage to
maximize local memory accesses. In particular, the RTS will:
- Determine the number of NUMA nodes (N) by querying the OS.
- Manage separate memory pools for each node.
- Map capabilities to NUMA nodes. Capability C is mapped to
NUMA node C mod N.
- Bind worker threads on a capability to the appropriate node.
- Allocate the nursery from node-local memory.
- Perform other memory allocation, including in the GC, from
node-local memory.
- Prefer to migrate threads to another Capability on the same node
when load-balancing.
The ``--numa`` flag is typically beneficial when a program is
using all cores of a large multi-core NUMA system, with a large
allocation area (``-A``). All memory accesses to the allocation
area will go to local memory, which can save a significant amount
of remote memory access. A runtime speedup on the order of 10%
is typical, but can vary a lot depending on the hardware and the
memory behaviour of the program.
Note that the RTS will not set CPU affinity for bound threads and
threads entering Haskell from C/C++, so if your program uses bound
threads you should ensure that each bound thread calls the RTS API
``rts_setInCallCapability(c,1)`` from C/C++ before calling into
Haskell. Otherwise there could be a mismatch between the CPU that
the thread is running on and the memory it is using while running
Haskell code, which will negate any benefits of ``--numa``.
If given an explicit <mask>, the <mask> is interpreted as a bitmap
that indicates the NUMA nodes on which to run the program. For
example, ``--numa=3`` would run the program on NUMA nodes 0 and 1.
.. _rts-options-statistics:
RTS options to produce runtime statistics
@@ -325,7 +325,6 @@
#include "DerivedConstants.h"
#include "rts/storage/ClosureTypes.h"
#include "rts/storage/FunTypes.h"
#include "rts/storage/SMPClosureOps.h"
#include "rts/OSThreads.h"
/*
@@ -203,7 +203,6 @@ INLINE_HEADER Time fsecondsToTime (double t)
#include "rts/storage/ClosureTypes.h"
#include "rts/storage/TSO.h"
#include "stg/MiscClosures.h" /* InfoTables, closures etc. defined in the RTS */
#include "rts/storage/SMPClosureOps.h"
#include "rts/storage/Block.h"
#include "rts/storage/ClosureMacros.h"
#include "rts/storage/MBlock.h"
@@ -179,7 +179,11 @@ Capability *rts_unsafeGetMyCapability (void);
// Note that the thread may still be migrated by the RTS scheduler, but that
// will only happen if there are multiple threads running on one Capability and
// another Capability is free.
void setInCallCapability (int preferred_capability); //
// If affinity is non-zero, the current thread will be bound to
// specific CPUs according to the prevailing affinity policy for the
// specified capability, set by either +RTS -qa or +RTS --numa.
void rts_setInCallCapability (int preferred_capability, int affinity);
/* ----------------------------------------------------------------------------
Building Haskell objects from C datatypes.
@@ -295,4 +295,10 @@
#define MAX_SPARE_WORKERS 6
/*
* The maximum number of NUMA nodes we support. This is a fixed limit so that
* we can have static arrays of this size in the RTS for speed.
*/
#define MAX_NUMA_NODES 16
#endif /* RTS_CONSTANTS_H */
@@ -73,6 +73,11 @@ typedef struct _GC_FLAGS {
* to handle the exception before we
* raise it again.
*/
rtsBool numa; /* Use NUMA */
uint32_t nNumaNodes; /* Number of nodes */
uint32_t numaMap[MAX_NUMA_NODES]; /* Map our internal node numbers to OS
* node numbers */
} GC_FLAGS;
/* See Note [Synchronization of flags and base APIs] */
@@ -93,6 +98,7 @@ typedef struct _DEBUG_FLAGS {
rtsBool squeeze; /* 'z' stack squeezing & lazy blackholing */
rtsBool hpc; /* 'c' coverage */
rtsBool sparks; /* 'r' */
rtsBool numa; /* '--debug-numa' */
} DEBUG_FLAGS;
/* See Note [Synchronization of flags and base APIs] */
@@ -184,7 +190,7 @@ typedef struct _MISC_FLAGS {
#ifdef THREADED_RTS
/* See Note [Synchronization of flags and base APIs] */
typedef struct _PAR_FLAGS {
uint32_t nNodes; /* number of threads to run simultaneously */ uint32_t nCapabilities; /* number of threads to run simultaneously */
rtsBool migrate; /* migrate threads between capabilities */
uint32_t maxLocalSparks;
rtsBool parGcEnabled; /* enable parallel GC */
@@ -200,7 +200,9 @@ void setThreadLocalVar (ThreadLocalKey *key, void *value);
void freeThreadLocalKey (ThreadLocalKey *key);
// Processors and affinity
void setThreadAffinity (uint32_t n, uint32_t m);
void setThreadNode (uint32_t node);
void releaseThreadNode (void);
#endif // !CMINUSMINUS
#else
@@ -58,7 +58,9 @@ pid_t forkProcess (HsStablePtr *entry)
HsBool rtsSupportsBoundThreads (void);
// The number of Capabilities // The number of Capabilities.
// ToDo: I would like this to be private to the RTS and instead expose a
// function getNumCapabilities(), but it is used in compiler/cbits/genSym.c
extern unsigned int n_capabilities;
// The number of Capabilities that are not disabled
@@ -111,7 +111,7 @@ typedef struct bdescr_ {
StgWord16 gen_no; // gen->no, cached
StgWord16 dest_no; // number of destination generation
StgWord16 _pad1; StgWord16 node; // which memory node does this block live on?
StgWord16 flags; // block flags, see below
@@ -280,12 +280,28 @@ extern void initBlockAllocator(void);
/* Allocation -------------------------------------------------------------- */
bdescr *allocGroup(W_ n);
bdescr *allocBlock(void);
EXTERN_INLINE bdescr* allocBlock(void);
EXTERN_INLINE bdescr* allocBlock(void)
{
return allocGroup(1);
}
bdescr *allocGroupOnNode(uint32_t node, W_ n);
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node);
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node)
{
return allocGroupOnNode(node,1);
}
// versions that take the storage manager lock for you:
bdescr *allocGroup_lock(W_ n);
bdescr *allocBlock_lock(void);
bdescr *allocGroupOnNode_lock(uint32_t node, W_ n);
bdescr *allocBlockOnNode_lock(uint32_t node);
/* De-Allocation ----------------------------------------------------------- */
void freeGroup(bdescr *p);
@@ -18,6 +18,8 @@ extern W_ mblocks_allocated;
extern void initMBlocks(void);
extern void * getMBlock(void);
extern void * getMBlocks(uint32_t n);
extern void * getMBlockOnNode(uint32_t node);
extern void * getMBlocksOnNode(uint32_t node, uint32_t n);
extern void freeMBlocks(void *addr, uint32_t n);
extern void releaseFreeMemory(void);
extern void freeAllMBlocks(void);
@@ -51,7 +51,7 @@ Capability **capabilities = NULL;
// an in-call has a chance of quickly finding a free Capability.
// Maintaining a global free list of Capabilities would require global
// locking, so we don't do that.
static Capability *last_free_capability = NULL; static Capability *last_free_capability[MAX_NUMA_NODES];
/*
* Indicates that the RTS wants to synchronise all the Capabilities
@@ -230,11 +230,12 @@ popReturningTask (Capability *cap)
* ------------------------------------------------------------------------- */
static void
initCapability( Capability *cap, uint32_t i ) initCapability (Capability *cap, uint32_t i)
{
uint32_t g;
cap->no = i;
cap->node = capNoToNumaNode(i);
cap->in_haskell = rtsFalse;
cap->idle = 0;
cap->disabled = rtsFalse;
@@ -316,9 +317,10 @@ initCapability( Capability *cap, uint32_t i )
* controlled by the user via the RTS flag -N.
*
* ------------------------------------------------------------------------- */
void void initCapabilities (void)
initCapabilities( void )
{
uint32_t i;
/* Declare a couple capability sets representing the process and
clock domain. Each capability will get added to these capsets. */
traceCapsetCreate(CAPSET_OSPROCESS_DEFAULT, CapsetTypeOsProcess);
@@ -328,21 +330,22 @@ initCapabilities( void )
#ifndef REG_Base
// We can't support multiple CPUs if BaseReg is not a register
if (RtsFlags.ParFlags.nNodes > 1) { if (RtsFlags.ParFlags.nCapabilities > 1) {
errorBelch("warning: multiple CPUs not supported in this build, reverting to 1");
RtsFlags.ParFlags.nNodes = 1; RtsFlags.ParFlags.nCapabilities = 1;
}
#endif
n_capabilities = 0;
moreCapabilities(0, RtsFlags.ParFlags.nNodes); moreCapabilities(0, RtsFlags.ParFlags.nCapabilities);
n_capabilities = RtsFlags.ParFlags.nNodes; n_capabilities = RtsFlags.ParFlags.nCapabilities;
#else /* !THREADED_RTS */
n_capabilities = 1;
capabilities = stgMallocBytes(sizeof(Capability*), "initCapabilities");
capabilities[0] = &MainCapability;
initCapability(&MainCapability, 0);
#endif
@@ -352,7 +355,9 @@ initCapabilities( void )
// There are no free capabilities to begin with. We will start
// a worker Task to each Capability, which will quickly put the
// Capability on the free list when it finds nothing to do.
last_free_capability = capabilities[0]; for (i = 0; i < RtsFlags.GcFlags.nNumaNodes; i++) {
last_free_capability[i] = capabilities[0];
}
}
void
@@ -532,7 +537,7 @@ releaseCapability_ (Capability* cap,
#ifdef PROFILING
cap->r.rCCCS = CCS_IDLE;
#endif
last_free_capability = cap; last_free_capability[cap->node] = cap;
debugTrace(DEBUG_sched, "freeing capability %d", cap->no);
}
@@ -711,6 +716,7 @@ void waitForCapability (Capability **pCap, Task *task)
*pCap = &MainCapability;
#else
uint32_t i;
Capability *cap = *pCap;
if (cap == NULL) {
@@ -719,12 +725,14 @@ void waitForCapability (Capability **pCap, Task *task)
enabled_capabilities];
} else {
// Try last_free_capability first
cap = last_free_capability; cap = last_free_capability[task->node];
if (cap->running_task) { if (cap->running_task) {
uint32_t i; // Otherwise, search for a free capability on this node.
// otherwise, search for a free capability
cap = NULL;
for (i = 0; i < n_capabilities; i++) { for (i = task->node; i < enabled_capabilities;
i += RtsFlags.GcFlags.nNumaNodes) {
// visits all the capabilities on this node, because
// cap[i]->node == i % RtsFlags.GcFlags.nNumaNodes
if (!capabilities[i]->running_task) {
cap = capabilities[i];
break;
@@ -732,7 +740,7 @@ void waitForCapability (Capability **pCap, Task *task)
}
if (cap == NULL) {
// Can't find a free one, use last_free_capability.
cap = last_free_capability; cap = last_free_capability[task->node];
}
}
}
@@ -36,6 +36,15 @@ struct Capability_ {
uint32_t no; // capability number.
// The NUMA node on which this capability resides. This is used to allocate
// node-local memory in allocate().
//
// Note: this is always equal to cap->no % RtsFlags.GcFlags.nNumaNodes.
// The reason we slice it this way is that if we add or remove capabilities
// via setNumCapabilities(), then we keep the number of capabilities on each
// NUMA node balanced.
uint32_t node;
// The Task currently holding this Capability. This task has
// exclusive access to the contents of this Capability (apart from
// returning_tasks_hd/returning_tasks_tl).
@@ -151,6 +160,8 @@ struct Capability_ {
;
#define capNoToNumaNode(n) ((n) % RtsFlags.GcFlags.nNumaNodes)
#if defined(THREADED_RTS)
#define ASSERT_TASK_ID(task) ASSERT(task->id == osThreadId())
#else
@@ -221,7 +232,6 @@ INLINE_HEADER void releaseCapability_ (Capability* cap STG_UNUSED,
// extern uint32_t enabled_capabilities;
// Array of all the capabilities
//
extern Capability **capabilities;
//
@@ -364,7 +374,7 @@ recordMutableCap (const StgClosure *p, Capability *cap, uint32_t gen)
bd = cap->mut_lists[gen];
if (bd->free >= bd->start + BLOCK_SIZE_W) {
bdescr *new_bd;
new_bd = allocBlock_lock(); new_bd = allocBlockOnNode_lock(cap->node);
new_bd->link = bd;
bd = new_bd;
cap->mut_lists[gen] = bd;
@@ -12,6 +12,7 @@
#include "Cmm.h"
#include "Updates.h"
#include "SMPClosureOps.h"
#ifdef __PIC__
import pthread_mutex_unlock;
@@ -7,3 +7,4 @@
#include "Schedule.h"
#include "Capability.h"
#include "WSDeque.h"
#include "SMPClosureOps.h"
@@ -18,6 +18,7 @@ void sendMessage (Capability *from_cap, Capability *to_cap, Message *msg);
#include "Capability.h"
#include "Updates.h" // for DEBUG_FILL_SLOP
#include "SMPClosureOps.h"
INLINE_HEADER void
doneWithMsgThrowTo (MessageThrowTo *m)
@@ -23,6 +23,7 @@
#include "Cmm.h"
#include "MachDeps.h"
#include "SMPClosureOps.h"
#ifdef __PIC__
import pthread_mutex_lock;
@@ -9,6 +9,7 @@
#include "PosixSource.h"
#include "Rts.h"
#include "Capability.h"
#include "RtsFlags.h"
#include "RtsUtils.h"
#include "Profiling.h"
@@ -15,6 +15,7 @@
#include "RtsFlags.h"
#include "sm/OSMem.h"
#include "hooks/Hooks.h"
#include "Capability.h"
#ifdef HAVE_CTYPE_H
#include <ctype.h>
@@ -122,6 +123,7 @@ static void errorRtsOptsDisabled (const char *s);
void initRtsFlagsDefaults(void)
{
uint32_t i;
StgWord64 maxStkSize = 8 * getPhysicalMemorySize() / 10;
// if getPhysicalMemorySize fails just move along with an 8MB limit
if (maxStkSize == 0)
@@ -157,8 +159,12 @@ void initRtsFlagsDefaults(void)
#endif
RtsFlags.GcFlags.heapBase = 0; /* means don't care */
RtsFlags.GcFlags.allocLimitGrace = (100*1024) / BLOCK_SIZE;
RtsFlags.GcFlags.numa = rtsFalse;
RtsFlags.GcFlags.nNumaNodes = 1;
for (i = 0; i < MAX_NUMA_NODES; i++) {
RtsFlags.GcFlags.numaMap[i] = 0;
}
#ifdef DEBUG
RtsFlags.DebugFlags.scheduler = rtsFalse;
RtsFlags.DebugFlags.interpreter = rtsFalse;
RtsFlags.DebugFlags.weak = rtsFalse;
@@ -174,7 +180,7 @@ void initRtsFlagsDefaults(void)
RtsFlags.DebugFlags.squeeze = rtsFalse;
RtsFlags.DebugFlags.hpc = rtsFalse;
RtsFlags.DebugFlags.sparks = rtsFalse;
#endif RtsFlags.DebugFlags.numa = rtsFalse;
#if defined(PROFILING)
RtsFlags.CcFlags.doCostCentres = 0;
@@ -220,7 +226,7 @@ void initRtsFlagsDefaults(void)
RtsFlags.MiscFlags.linkerMemBase = 0;
#ifdef THREADED_RTS
RtsFlags.ParFlags.nNodes = 1; RtsFlags.ParFlags.nCapabilities = 1;
RtsFlags.ParFlags.migrate = rtsTrue;
RtsFlags.ParFlags.parGcEnabled = 1;