Add e.g. "-N<=4" in addition to the fixed "-N4" and variable "-N" RTS options
As discussed in this issue:
stack is one example of a program that optimistically turned on "-N". Indeed, it feels like a reasonable and even safe option for a multithreaded program. Unfortunately, -N currently guarantees bad performance on large machines and especially on large machines with hyperthreading. No one should ship an executable with -N by default as of GHC 7.8 and 7.10 IMHO.
Unfortunately, even if stack did get a speedup at, say, 4 or 8 cores, it would not be good to ship it with "-N8" either. That would be an unreasonable choice on small machines with only one or two cores.
What we need is a way to say that the program can productively use parallelism up to a certain upper bound, but that fewer threads should be used if there are not enough cores available. I propose "-N<=8" as a potential syntax.
Currently, this behavior can be achieved with getNumProcessors, but I think it's worth a command line RTS option.
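
For reference, here is a minimal sketch of what that workaround might look like inside a program today. It assumes the program is built with -threaded, and the helper name cappedCapabilities and the bound of 8 are hypothetical, chosen only for illustration. getNumProcessors, setNumCapabilities and getNumCapabilities are the existing functions from GHC.Conc.

```haskell
import GHC.Conc (getNumCapabilities, getNumProcessors, setNumCapabilities)

-- | Hypothetical helper: pick the number of capabilities given an upper
-- bound and the number of processors detected. Never returns less than 1.
cappedCapabilities :: Int -> Int -> Int
cappedCapabilities upperBound processors = max 1 (min upperBound processors)

main :: IO ()
main = do
  procs <- getNumProcessors
  -- Assumed upper bound of 8, mirroring the proposed "-N<=8".
  setNumCapabilities (cappedCapabilities 8 procs)
  caps <- getNumCapabilities
  putStrLn ("processors: " ++ show procs ++ ", capabilities: " ++ show caps)
```

This has to be repeated in every program's main and cannot be adjusted by the user without recompiling, which is why an RTS flag seems preferable.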