## compactAdd run time grows linearly with compact region size for certain inputs

## Summary

For objects of certain sizes `compactAdd`

is linear in the number of already added objects.
When adding multiple such objects, the total time for adding them all grows quadratically.

## Steps to reproduce

The following reproducer adds the value of type `X`

into a compact region `n`

times, and then again `2n`

times into a different compact region. The total time quadruples, even though the number of `compactAdd`

s was only doubled.

```
{-# OPTIONS_GHC -O2 #-}
import Control.Monad (replicateM_)
import GHC.Compact (compact, compactAdd, compactSize)
import System.CPUTime (getCPUTime)
import Data.Word (Word64)
import Text.Printf (printf)
import System.Environment (getArgs)
data X = X !Word64 !Word64 !Word64 !Word64 !Word64 !Word64 !Word64 !Word64
main :: IO ()
main = do
-- Use this in the object we put in the compact region,
-- in order to prevent GHC from making it a CAF
-- (which are not added to compact regions).
blackboxInput <- length <$> getArgs
let
-- Build a value that is larger than the allowed remaining free space
-- before a block is considered full.
obj = X (fromIntegral blackboxInput) 0 0 0 0 0 0 0
numAdd = 200000
c1 <- compact ()
c2 <- compact ()
-- Copy the above object multiple times in each compact region
-- and measure how long it takes.
printf "Calling compactAdd %d times\n" numAdd
start1 <- getCPUTime
replicateM_ numAdd (compactAdd c1 obj)
end1 <- getCPUTime
printf "Calling compactAdd %d times\n" (2 * numAdd)
start2 <- getCPUTime
replicateM_ (2 * numAdd) (compactAdd c2 obj)
end2 <- getCPUTime
-- Print the sizes of the compact regions as a sanity check
-- The 2nd should be twice as large as the first
putStrLn "Compact region sizes (sanity check)"
compactSize c1 >>= print
compactSize c2 >>= print
let
ps1 = end1 - start1
ps2 = end2 - start2
printf "1st: %d ps total, %f ps/obj\n" ps1 (realToFrac ps1 / realToFrac numAdd :: Double)
printf "2nd: %d ps total, %f ps/obj\n" ps2 (realToFrac ps2 / realToFrac (2 * numAdd) :: Double)
-- Note that the time per compactAdd for the second run is roughly
-- twice as large as for the first run.
-- The total time has quadrupled, even though the number of
-- compactAdd calls only doubled.
```

To me it looks like the cause is the linear search for a free block in `allocateForCompact`

that is triggered when the object does not fit into the nursery block, but the nursery block isn't full either:

https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.5-release/rts/sm/CNF.c#L516-524

Note that `-O2`

is required so that the value of type `X`

is actually a single heap
object. Otherwise, GHC doesn't seem to unpack the fields, and the small
boxed `Word64`

fill up the nursery block eventually which prevents this
bug from triggering.

## Expected behavior

The time it takes for adding the objects to the compact region should grow linearly with the number of objects, not quadratically.

## Environment

- GHC version used: 8.8.1, 8.6.5

Optional:

- Operating System: Ubuntu 18.04.3 LTS
- System Architecture: x86_64