Breaking down costs to different tasks of the program
We have a program that performs multiple tasks. Some are background tasks where performance, and latency in particular, doesn't matter much; others are very latency-sensitive. For our purposes it matters a lot whether an expensive thunk is evaluated in a latency-sensitive part of the program. However, lexical cost centres hide this information: when a thunk is entered, the CCS saved in the thunk becomes the current CCS, so the cost is attributed to the place where the thunk was created rather than to the task that forced it.
I've been working on extending cost centre stacks to support multiple counters per stack, plus a simple API to set the current counter. In the design I've tried to avoid extra indirections so that the program isn't slowed down more than necessary. I think the multiple counters could also be used for other purposes, such as attributing costs to a different counter while a thread is blocked on an MVar, in an FFI call, or during GC.
In the next message I'll outline my current implementation, which changes the size of each CostCentreStack to incorporate multiple counters; the number of counters is set at program startup.
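To make the idea concrete before the detailed write-up, here is a minimal sketch of counters stored inline in each stack. It is not the actual RTS code: the struct is a simplified stand-in for CostCentreStack, and every name in it (`CCS`, `n_counters`, `current_counter`, `init_counters`, `set_current_counter`, `tick`) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the RTS's CostCentreStack.
   Only the fields needed for the illustration are shown. */
typedef struct CCS {
    struct CCS *prev_stack;  /* parent stack */
    uint64_t    counters[];  /* one counter per task, stored inline */
} CCS;

/* Assumed globals: the counter count is fixed once at startup,
   the current index is what the "set current counter" API changes. */
static size_t n_counters = 1;
static size_t current_counter = 0;

/* Must run before any stack is allocated, since it determines their size. */
static void init_counters(size_t n) {
    assert(n > 0);
    n_counters = n;
    current_counter = 0;
}

/* The allocation size depends on n_counters, so a stack can no longer
   have a size that is known at compile time. */
static CCS *new_ccs(CCS *parent) {
    CCS *ccs = calloc(1, sizeof(CCS) + n_counters * sizeof(uint64_t));
    if (ccs != NULL) {
        ccs->prev_stack = parent;
    }
    return ccs;
}

/* The API a task would call when it starts running. */
static void set_current_counter(size_t i) {
    assert(i < n_counters);
    current_counter = i;
}

/* Counter bump: load the current index, then an indexed increment
   inside the struct itself, with no extra pointer to follow. */
static void tick(CCS *ccs) {
    ccs->counters[current_counter]++;
}
```

With this layout a latency-sensitive task would call `set_current_counter` on entry, and any thunk it forces would bump that task's slot in whichever stack the thunk carries.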
Possible alternatives:
- Store the counters outside the CCS struct (the CCS can remain statically allocated, but at the cost of more indirections on each counter bump?)
- Perhaps there are entirely different ways to break down performance?
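
For comparison, the first alternative (counters kept outside the CCS struct) might look roughly like the sketch below. Again this is only an illustration under assumptions: the side table, the `ccs_id` row index, and all function names are hypothetical and not taken from the RTS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size stack record. It can stay statically allocated
   because it holds only a row index, not the counters themselves. */
typedef struct {
    uint32_t ccs_id;  /* row of this stack in the side table */
} StaticCCS;

/* Assumed side table: max_stacks rows of table_width counters each. */
static uint64_t *side_table = NULL;
static size_t table_width = 0;
static size_t active_counter = 0;

/* Allocate the table once at startup; returns 0 on allocation failure. */
static int init_side_table(size_t max_stacks, size_t n) {
    side_table = calloc(max_stacks * n, sizeof(uint64_t));
    table_width = n;
    active_counter = 0;
    return side_table != NULL;
}

/* Counter bump: load the table base, the row id, and the active index
   before the increment, i.e. more memory accesses than the inline layout. */
static void tick_external(const StaticCCS *ccs) {
    side_table[(size_t)ccs->ccs_id * table_width + active_counter]++;
}

/* Read back counter i of a given stack. */
static uint64_t read_counter(const StaticCCS *ccs, size_t i) {
    return side_table[(size_t)ccs->ccs_id * table_width + i];
}
```

The trade-off is the one noted in the list: the struct size no longer depends on a startup parameter, but each bump goes through the table pointer.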