Provide the user more control over how much memory the RTS retains
Currently, after a spike in memory usage, the RTS is very reluctant to return previously allocated blocks to the OS. This is to prevent expensive thrashing due to deallocating and then reallocating blocks.
The code which controls this lives in `GC.c`:

```c
  /* If the amount of data remains constant, next major GC we'll
   * require (F+1)*live + prealloc. We leave (F+2)*live + prealloc
   * in order to reduce repeated deallocation and reallocation. #14702
   */
  need = need_prealloc + (RtsFlags.GcFlags.oldGenFactor + 2) * need_live;

  /* Also, if user set heap size, do not drop below it. */
  need = stg_max(RtsFlags.GcFlags.heapSizeSuggestion, need);

  /* But with a large nursery, the above estimate might exceed
   * maxHeapSize. A large resident set size might make the OS
   * kill this process, or swap unnecessarily. Therefore we
   * ensure that our estimate does not exceed maxHeapSize.
   */
  if (RtsFlags.GcFlags.maxHeapSize != 0) {
      need = stg_min(RtsFlags.GcFlags.maxHeapSize, need);
  }

  need = BLOCKS_TO_MBLOCKS(need);
  got = mblocks_allocated;

  if (got > need) {
      returnMemoryToOS(got - need);
  }
```
By default `oldGenFactor` is 2, so we end up retaining up to 4 × the current amount of live data, which will never be returned while the amount of live data remains high. With the copying GC you will always need 2 × the amount of live data, but with the compacting GC you can get away with a small overhead of around 1.x × the amount of live data.
Another factor in the calculation is the `-F` flag, which indicates by how much to scale the size retained for the oldest generation before doing another collection. (docs)
Combining these factors, the amount of memory we actually need to preserve is:

- Copying: `((F - 1) + 2) * live + prealloc`
- Compacting: `((F - 1) + 1.x) * live + prealloc`

What is currently implemented is `(F + 2) * live + prealloc`, which adds some extra overhead to prevent thrashing.
Secondly, I think it would be better if the `oldGenFactor` would decay over time, so that processes with a large amount of live data but limited other allocation don't end up retaining a huge amount of memory after a spike. One possibility would be to be less keen to retain memory on a major GC triggered by an idle period. Without an idle GC, these programs will rarely perform GC at all, because the size of the allocation area for the oldest generation will be very large compared to how much work the program does.
https://gitlab.haskell.org/ghc/ghc/-/blob/master/rts/sm/GC.c#L984
Ways that we could improve this are:

- When using the compacting GC for the oldest generation (`-c`), only retain a smaller multiple of the maximum live bytes, for example 1.2 × live data.
- When using the copying GC, provide a flag to more precisely control how much memory is retained beyond the necessary 2 × live bytes.
- When using the copying GC, provide a flag to gradually return memory to the OS with some decay factor, possibly related to idle GCs.