Remove large byte arrays from the large_objects list
Motivation
Often, it is useful to have a big slab of memory to scribble intermediate results on. In a number of cases, the user knows that this slab will simply be discarded at the end of a computation. Sadly, there's nothing the user can do with this knowledge to help the runtime out. It would be great if there were a way to return these blocks so that they didn't contribute to more frequent GC.
Proposal
Introduce a primitive for removing large byte arrays from the large objects list, called forgetLargeMutableByteArray# (I'll present a lifted variant too, for convenience):
-- Invariant: the caller must not use the argument after calling this function on it
forgetLargeMutableByteArray# :: MutableByteArray# s -> State# s -> State# s
forgetLargeMutableByteArray :: MutableByteArray s -> ST s ()
Such a primitive could be used in contexts where it is paired with newByteArray#, like this:
example :: Int -> Int
example i = runST $ do
  arr <- newByteArray (8192 - (2 * sizeOf @Int undefined))
  ...
  r <- ...
  forgetLargeMutableByteArray arr
  return r
There's no need for any kind of bracketing: if an exception gets thrown before the byte array is removed from the large objects list, GC will handle it later. Only byte arrays corresponding to objects whose size is a multiple of the block size would be eligible for removal, and calling this primop on an ineligible byte array would just be a no-op (it wouldn't crash the program). The implementation would basically do the opposite of what allocateMightFail does in the large object case. Here's the relevant part of allocateMightFail:
accountAllocation(cap, n);
ACQUIRE_SM_LOCK
bd = allocGroupOnNode(cap->node,req_blocks);
dbl_link_onto(bd, &g0->large_objects);
g0->n_large_blocks += bd->blocks; // might be larger than req_blocks
g0->n_new_large_words += n;
RELEASE_SM_LOCK;
initBdescr(bd, g0, g0);
bd->flags = BF_LARGE;
bd->free = bd->start + n;
cap->total_allocated += n;
return bd->start;
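Inverting that sequence might look something like the following. This is only a sketch over minimal stand-in structs, not the real RTS types: forgetLargeObject and the field names here are hypothetical, the SM lock and the actual freeing of the block group are elided, and dbl_link_remove just undoes what dbl_link_onto does to the doubly-linked list (the real RTS keeps the back pointer in bd->u.back).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BF_LARGE 1

/* Stand-in for the RTS block descriptor (bdescr). */
typedef struct bdescr_ {
    struct bdescr_ *link;   /* next block group in the list */
    struct bdescr_ *back;   /* previous block group (u.back in the RTS) */
    uint32_t blocks;        /* size of this block group, in blocks */
    uint16_t flags;
} bdescr;

/* Stand-in for the parts of a generation that the excerpt touches. */
typedef struct {
    bdescr   *large_objects;
    uint64_t  n_large_blocks;
    uint64_t  n_new_large_words;
} generation;

/* Mirrors dbl_link_onto from the excerpt: push bd onto the front of *list. */
static void dbl_link_onto(bdescr *bd, bdescr **list) {
    bd->link = *list;
    bd->back = NULL;
    if (*list != NULL) { (*list)->back = bd; }
    *list = bd;
}

/* The inverse: unlink bd from *list in O(1) using the back pointer. */
static void dbl_link_remove(bdescr *bd, bdescr **list) {
    if (bd->back != NULL) { bd->back->link = bd->link; }
    else                  { *list = bd->link; }
    if (bd->link != NULL) { bd->link->back = bd->back; }
}

/* Hypothetical inverse of the large-object branch of allocateMightFail:
   unlink the group and undo the generation's accounting.  n_words is the
   object's size in words, matching the n added to n_new_large_words. */
static void forgetLargeObject(generation *g0, bdescr *bd, size_t n_words) {
    assert(bd->flags & BF_LARGE);
    dbl_link_remove(bd, &g0->large_objects);
    g0->n_large_blocks  -= bd->blocks;
    g0->n_new_large_words -= n_words;
}
```

Allocating two toy groups, forgetting one, and checking that the list head and both counters end up where they started exercises the unlink in both the "head of list" and "middle/tail of list" positions.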
We still have to acquire the lock, just like allocateMightFail does. The allocation-counting stuff is kind of tricky. I suppose that decrementing cap->total_allocated, g0->n_large_blocks, and g0->n_new_large_words would be the right thing to do. The hardest part, though, is removing the block from the doubly-linked list. From an StgArrBytes, I don't think there is any way to recover the corresponding bdescr* needed to unlink the object from the list. One option would be to just scan the list, but that seems terrible.