shake: cachegrind does not seem to be run in parallel
Compilation does seem to happen in parallel, but the benchmarks themselves don't seem to run in parallel, at least not on top of !44 (merged).
While this isn't the end of the world, running things in parallel is the main benefit of using shake, so I really should fix this.