Compact Region allocator creates unexpectedly small blocks
Summary
When a large object of size X is added to a compact region whose default block size Y (as specified in the initial `compactSized` call) is even larger, the compact region allocator allocates a compact region block just large enough to accommodate X, even though it could instead allocate a block of the larger size Y, on the assumption that more objects will most likely be added to the compact region.
There are some issues with having a lot of small compact region blocks:
- If the small blocks are spread out across more megablocks than strictly needed in terms of size, fragmentation can prevent the RTS from returning memory to the OS (similar to the situation in https://www.well-typed.com/blog/2020/08/memory-fragmentation/, as compact region blocks are pinned)
- A lot of small blocks in the linked list of blocks can slow down allocation when the nursery is not the last block (see #17640)
- Smaller blocks mean that the nursery will be exhausted more often, resulting in more calls to `allocateForCompact` from the compaction itself.
Steps to reproduce
The following Haskell program creates a new compact region with a default block size of one megablock and then fills it with a few hundred large objects.
```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import Control.Monad
import GHC.Compact
import GHC.Compact.Serialized
import GHC.IO
import GHC.Prim

main :: IO ()
main = do
  let
    blocksPerMBlock, blockSize, dataBytesInMegablock :: Integral a => a
    blocksPerMBlock = 252
    blockSize = 4096
    dataBytesInMegablock = blocksPerMBlock * blockSize
  -- Create a new compact region with the maximum possible block size
  -- (1 megablock per compact region block)
  region <- compactSized dataBytesInMegablock False ()
  largeObject <- newLargeObject
  -- Add the large object a few times to our compact region:
  replicateM_ 510 $ void $ compactAdd region largeObject
  -- Now check how many blocks were allocated, and how much data they each contain
  blockSizes <- withSerializedCompact region $ \serialized ->
    pure $ map snd $ serializedCompactBlockList serialized
  -- We'll find that the first block is full with the first half of the large objects,
  -- but then there's one new block for each of the remaining large objects.
  putStrLn $ "Number of blocks: " <> show (length blockSizes)
  print blockSizes

-- UTIL:

data LargeObject = LargeObject ByteArray#

-- | Create an object larger than the large object threshold
newLargeObject :: IO LargeObject
newLargeObject = IO $ \s ->
  case newByteArray# 4000# s of
    (# s', arr #) -> case unsafeFreezeByteArray# arr s' of
      (# s'', frozenArr #) -> (# s'', LargeObject frozenArr #)
```
Running it prints:

```
Number of blocks: 256
[1032192,4088,4088,4088,4040,4040, ... 250 more ...]
```
indicating that the compact region consists of 256 blocks, and all but the first are a lot smaller than the desired default block size.
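
As a sanity check, the total size of the region can also be queried with `compactSize` from GHC.Compact. Something like the following, appended to the end of `main` above, prints the region's own size next to the per-block data sizes (the exact output format is just a suggestion):

```haskell
  -- Print the total size reported by the region itself alongside the
  -- per-block data sizes computed above (blockSizes from earlier in main):
  totalBytes <- compactSize region
  putStrLn $ "Total region size in bytes: " <> show totalBytes
  putStrLn $ "Data bytes across blocks:   " <> show (sum blockSizes)
```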
Expected behavior
Ideally, the reproducer would just allocate two compact region blocks, each one megablock large. That way, there would be less memory fragmentation, and fewer blocks in the linked list of compact region blocks.
Proposed Fix
I think the fix for this could just be to delete the special case for objects larger than the large object threshold from the `allocateForCompact` function, i.e. to delete https://gitlab.haskell.org/ghc/ghc/-/blob/a9ce159ba58ca7e8946b46e19b1361588b677a26/rts/sm/CNF.c#L492-501

No further change would be necessary; the function would still be correct due to the fallback allocation (which incidentally does take the default compact region block size into account): https://gitlab.haskell.org/ghc/ghc/-/blob/a9ce159ba58ca7e8946b46e19b1361588b677a26/rts/sm/CNF.c#L529-538

And as far as I can tell, there is no obvious reason why large objects need to be treated specially in the compact region allocator: as long as there is space in the nursery, a large object would just be allocated there anyway.
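
To make the expected effect of that change concrete, here is a small, self-contained Haskell model of the two allocation policies. This is only my paraphrase of the behaviour described above, not the actual CNF.c logic, and the per-copy size of 4040 bytes is an estimate derived from the reproducer's output rather than a constant taken from the RTS:

```haskell
-- Toy model of the two block allocation policies; NOT the actual CNF.c code.
-- Each block is represented by the number of bytes still free in it, with the
-- most recently used block at the head of the list.

defaultBlockBytes, bytesPerCopy :: Int
defaultBlockBytes = 252 * 4096  -- one megablock worth of data, as in the reproducer
bytesPerCopy      = 4040        -- estimated size of one compacted copy (assumption)

-- Current behaviour: when the object does not fit, open a block that is only
-- just large enough for it (so it has no room left afterwards).
addCurrent :: [Int] -> [Int]
addCurrent (free : rest)
  | bytesPerCopy <= free = (free - bytesPerCopy) : rest
addCurrent blocks        = 0 : blocks

-- Proposed behaviour: when the object does not fit, fall back to a block of
-- the default size, leaving room for subsequent objects.
addProposed :: [Int] -> [Int]
addProposed (free : rest)
  | bytesPerCopy <= free = (free - bytesPerCopy) : rest
addProposed blocks       = (defaultBlockBytes - bytesPerCopy) : blocks

main :: IO ()
main = do
  let blocksAfter step = length (iterate step [defaultBlockBytes] !! 510)
  putStrLn $ "Blocks with current policy:  " <> show (blocksAfter addCurrent)   -- 256
  putStrLn $ "Blocks with proposed policy: " <> show (blocksAfter addProposed)  -- 2
```

Under these assumptions the just-big-enough policy ends up with 256 blocks, matching the reproducer, while the fallback-only policy ends up with 2.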
Environment
- GHC version used: 8.8.4