The smp/chan benchmark is highly non-deterministic and we should probably not run it by default

I've seen its executed-instruction count swing by ~30% from run to run.

To a lesser degree, it's also debatable whether we want to run the smp and parallel benchmark groups by default at all when using shake.

I initially included them to keep them from breaking over time (many of them were already broken when I implemented the shake runner!). But at this point I think benchmark stability matters more than the risk of these benchmarks bitrotting again.