Commit 9980fb58 authored by Niklas Hambüchen, committed by Marge Bot

Add +RTS --disable-delayed-os-memory-return. Fixes #17411.

Sets `MiscFlags.disableDelayedOsMemoryReturn`.

See the added `Note [MADV_FREE and MADV_DONTNEED]` for details.
parent 01006bc7
Pipeline #12074 failed with stages
in 683 minutes and 2 seconds
......@@ -241,6 +241,28 @@ Miscellaneous RTS options
crashes if exception handling are enabled. In order to get more information
in compiled executables, C code or DLLs symbols need to be available.
.. rts-flag:: --disable-delayed-os-memory-return
If given, uses ``MADV_DONTNEED`` instead of ``MADV_FREE`` on platforms where
this results in more accurate resident memory usage of the program as shown
in memory usage reporting tools (e.g. the ``RSS`` column in ``top`` and ``htop``).
Using this is expected to make the program slightly slower.
On Linux, ``MADV_FREE`` is newer and faster because it can avoid zeroing
pages if the process re-uses them later (see ``man 2 madvise``).
The trade-off is that memory inspection tools like ``top`` will
not immediately reflect the freeing in their display of resident memory
(RSS column): only under memory pressure will Linux actually remove
the freed pages from the process and update its RSS statistics.
Until then, the pages show up as ``LazyFree`` in ``/proc/PID/smaps``
(see ``man 5 proc``).
The delayed RSS update can confuse programmers debugging memory issues,
production memory monitoring tools, and end users who may complain about
undue memory usage shown in reporting tools, so with this flag it can
be turned off.
.. rts-flag:: -xp
......
......@@ -213,6 +213,12 @@ typedef struct _MISC_FLAGS {
bool generate_dump_file;
bool generate_stack_trace;
bool machineReadable;
bool disableDelayedOsMemoryReturn; /* See Note [MADV_FREE and MADV_DONTNEED].
It's in `MiscFlags` instead of
`GcFlags` because if GHC used madvise()
memory management for non-GC related
tasks in the future, we'd respect it
there as well. */
bool internalCounters; /* See Note [Internal Counter Stats] */
bool linkerAlwaysPic; /* Assume the object code is always PIC */
StgWord linkerMemBase; /* address to ask the OS for memory
......
......@@ -138,6 +138,7 @@ data MiscFlags = MiscFlags
, generateCrashDumpFile :: Bool
, generateStackTrace :: Bool
, machineReadable :: Bool
, disableDelayedOsMemoryReturn :: Bool
, internalCounters :: Bool
, linkerAlwaysPic :: Bool
, linkerMemBase :: Word
......@@ -446,6 +447,8 @@ getMiscFlags = do
(#{peek MISC_FLAGS, generate_stack_trace} ptr :: IO CBool))
<*> (toBool <$>
(#{peek MISC_FLAGS, machineReadable} ptr :: IO CBool))
<*> (toBool <$>
(#{peek MISC_FLAGS, disableDelayedOsMemoryReturn} ptr :: IO CBool))
<*> (toBool <$>
(#{peek MISC_FLAGS, internalCounters} ptr :: IO CBool))
<*> (toBool <$>
......
......@@ -243,6 +243,7 @@ void initRtsFlagsDefaults(void)
RtsFlags.MiscFlags.generate_stack_trace = true;
RtsFlags.MiscFlags.generate_dump_file = false;
RtsFlags.MiscFlags.machineReadable = false;
RtsFlags.MiscFlags.disableDelayedOsMemoryReturn = false;
RtsFlags.MiscFlags.internalCounters = false;
RtsFlags.MiscFlags.linkerAlwaysPic = DEFAULT_LINKER_ALWAYS_PIC;
RtsFlags.MiscFlags.linkerMemBase = 0;
......@@ -914,6 +915,11 @@ error = true;
OPTION_UNSAFE;
RtsFlags.MiscFlags.machineReadable = true;
}
else if (strequal("disable-delayed-os-memory-return",
&rts_argv[arg][2])) {
OPTION_UNSAFE;
RtsFlags.MiscFlags.disableDelayedOsMemoryReturn = true;
}
else if (strequal("internal-counters",
&rts_argv[arg][2])) {
OPTION_SAFE;
......
......@@ -602,6 +602,26 @@ void osCommitMemory(void *at, W_ size)
}
}
/* Note [MADV_FREE and MADV_DONTNEED]
*
* madvise() provides flags with which one can release no longer needed pages
* back to the kernel without having to munmap() (which is expensive).
*
* On Linux, MADV_FREE is newer and faster because it can avoid zeroing
* pages if the process re-uses them later (see `man 2 madvise`).
* The trade-off is that memory inspection tools like `top` will
* not immediately reflect the freeing in their display of resident memory
* (RSS column): only under memory pressure will Linux actually remove
* the freed pages from the process and update its RSS statistics.
* Until then, the pages show up as `LazyFree` in `/proc/PID/smaps`
* (see `man 5 proc`).
* The delayed RSS update can confuse programmers debugging memory issues,
* production memory monitoring tools, and end users who may complain about
* undue memory usage shown in reporting tools, so with
* `disableDelayedOsMemoryReturn` we provide an RTS flag that allows forcing
* usage of MADV_DONTNEED instead of MADV_FREE.
*/
void osDecommitMemory(void *at, W_ size)
{
int r;
......@@ -618,21 +638,25 @@ void osDecommitMemory(void *at, W_ size)
 #endif
 #if defined(MADV_FREE)
-    // Try MADV_FREE first, FreeBSD has both and MADV_DONTNEED
-    // just swaps memory out. Linux >= 4.5 has both DONTNEED and FREE; either
-    // will work as they both allow the system to free anonymous pages.
-    // It is important that we try both methods as the kernel which we were
-    // built on may differ from the kernel we are now running on.
-    r = madvise(at, size, MADV_FREE);
-    if(r < 0) {
-        if (errno == EINVAL) {
-            // Perhaps the system doesn't support MADV_FREE; fall-through and
-            // try MADV_DONTNEED.
+    // See Note [MADV_FREE and MADV_DONTNEED].
+    // If MADV_FREE is disabled, fall-through to MADV_DONTNEED.
+    if (!RtsFlags.MiscFlags.disableDelayedOsMemoryReturn) {
+        // Try MADV_FREE first, FreeBSD has both and MADV_DONTNEED
+        // just swaps memory out. Linux >= 4.5 has both DONTNEED and FREE; either
+        // will work as they both allow the system to free anonymous pages.
+        // It is important that we try both methods as the kernel which we were
+        // built on may differ from the kernel we are now running on.
+        r = madvise(at, size, MADV_FREE);
+        if(r < 0) {
+            if (errno == EINVAL) {
+                // Perhaps the system doesn't support MADV_FREE; fall-through and
+                // try MADV_DONTNEED.
+            } else {
+                sysErrorBelch("unable to decommit memory");
+            }
         } else {
-            sysErrorBelch("unable to decommit memory");
+            return;
         }
-    } else {
-        return;
     }
 #endif
......
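The control flow introduced in `osDecommitMemory` can be hard to follow from the hunk alone, so here is a minimal standalone sketch of the same idea. It is not the RTS code: `disable_delayed_os_memory_return`, `decommit_sketch`, and `map_pages` are hypothetical names chosen for illustration, the global boolean stands in for `RtsFlags.MiscFlags.disableDelayedOsMemoryReturn`, and `perror` stands in for `sysErrorBelch`. The function returns the advice value that succeeded so a caller can see which path was taken.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* ask glibc to expose MADV_FREE and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical stand-in for RtsFlags.MiscFlags.disableDelayedOsMemoryReturn. */
static bool disable_delayed_os_memory_return = false;

/* Hypothetical helper: map `size` bytes of private anonymous memory.
 * Returns NULL on failure. */
static void *map_pages(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand the pages in [at, at + size) back to the kernel without unmapping.
 * Returns the advice that succeeded (MADV_FREE or MADV_DONTNEED),
 * or -1 if no advice could be applied. */
static int decommit_sketch(void *at, size_t size)
{
    assert(size > 0);

#if defined(MADV_FREE)
    /* Only try the lazy variant when the user has not disabled it. */
    if (!disable_delayed_os_memory_return) {
        if (madvise(at, size, MADV_FREE) == 0) {
            return MADV_FREE;
        }
        if (errno != EINVAL) {
            /* Unexpected failure: report it, then still try the fallback. */
            perror("unable to decommit memory");
        }
        /* EINVAL: the running kernel may be older than the build kernel
         * and lack MADV_FREE, so fall through to MADV_DONTNEED. */
    }
#endif

    if (madvise(at, size, MADV_DONTNEED) == 0) {
        return MADV_DONTNEED;
    }
    return -1;
}
```

With the flag set to `true`, the sketch skips `MADV_FREE` entirely and goes straight to `MADV_DONTNEED`, which is the behaviour a user requests with `+RTS --disable-delayed-os-memory-return`. With the flag left at its default, either advice may end up being used, depending on what the running kernel supports.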