Simon Marlow · 540ac864
--- a/commentary/rts/storage/heap-alloced.md
+++ b/commentary/rts/storage/heap-alloced.md
+# HEAP_ALLOCED
+
+
+This page is about the `HEAP_ALLOCED()` macro/function in the runtime system.  
+
+```wiki
+StgBool HEAP_ALLOCED(void *p);
+```
+
+
+It is defined in `rts/sm/MBlock.h`.  The purpose of `HEAP_ALLOCED()` is to return true if the given address is part of the dynamically-allocated heap, and false otherwise.  Its primary use is in the Garbage Collector: when examining a pointer, we need to get to the block descriptor for that object.  Static objects don't have block descriptors, because they live in static code space, so we need to establish whether the pointer is into the dynamic heap first, hence `HEAP_ALLOCED()`.
+
+
+On a 32-bit machine, `HEAP_ALLOCED()` is implemented with a 4096-entry byte-map, one byte per megabyte of the address space (the dynamic heap is allocated in units of aligned megabytes).  
+
+
+On a 64-bit machine, it's a bit more difficult.  The current method (GHC 6.10.1 and earlier) uses a cache, with a 4096-entry map and a 32-bit tag.  If the upper 32 bits of the pointer match the tag, we look up in the map, otherwise we back off to a slow method that searches a list of mappings (bug [\#2934](https://gitlab.haskell.org//ghc/ghc/issues/2934) is about the lack of thread-safety in the slow path here).  This arrangement works fine for small heaps, but is pessimal for large (multi-GB) heaps, or heaps that are scattered around the address space.
+
+## Speeding up `HEAP_ALLOCED()`
+
+
+We should consider how to speed up `HEAP_ALLOCED()` for large heaps on 64-bit machines.  This involves some kind of cache arrangement - the memory map is like a page table, and we want a cache that gives us quick access to commonly accessed parts of that map.
+
+faster-heap-alloced.patch.gz implements one such scheme.  Measurements show that it slows down GC by about 20% for small heaps (hence it wasn't committed), though it would probably speed up GC on large heaps.
+
+## Eliminating `HEAP_ALLOCED` completely
+
+
+Can we eliminate `HEAP_ALLOCED` altogether?  We must arrange that all closure pointers have a valid block descriptor.
+
+### Method 1: put static closures in an aligned section
+
+
+ELF sections can be arbitrarily aligned.  So we could put all our static closures in a special section, align the section to 1MB, and arrange that there is space at the beginning of the section for the block descriptors.
+
+
+This almost works (see eliminate-heap-alloced.patch.gz), but sadly fails for shared libraries: the system dynamic linker doesn't honour section-alignment requests larger than a page, it seems.
+
+### Method 2: copy static closures into a special area at startup
+
+
+We could arrange that we access all static closures via indirections, and then at startup time we copy all the static closures into a special area with block descriptors.
+
+
+Disadvantages:
+  
+
+- references to static objects go through another indirection.  
+
+  - when doing dynamic linking, references to static objects in another package
+    already go through an indirection and we could arrange that only one indirection is required.
+  - References to static closures from the the fields of a static constructor would not incur the extra indirection,
+    only direct references to static closures from code.
+  - we currently reference the static closure of a function from the heap-check-fail code, but in fact
+    we only really need to pass the info pointer.
+
+
+Advantages
+
+- we get to fix up all the tag bits in static closure pointers
+- we get to eliminate HEAP_ALLOCED, speeding up GC and removing complexity
+- CAFs might get a bit simpler, since they are already indirections into the heap