Default to infinite stack size?
It looks like any code of the form
... = do x1 <- m1 x2 <- m2 [do something with x1 and m2]
will stack overflow when used with a strict monad (like IO) and sufficiently large data to deal with (a 1m element list is enough to exceed the default 8MB stack size limit).
Other examples include:
sequence (m:ms) = do x <- m xs <- sequence ms return (x:xs)
liftM2, liftM3, ...
By slightly modifying your program to do the same thing via heap allocation (e.g. using a difference list in sequence), a stack-overflow rejected program turns into a valid program.
This means that valid and idiomatic code like
xs <- replicateM n someIOAction xs <- mapM someIOAction [1..n]
will eventually crash, which has an impact on practically all Haskell software and libraries out there.
For examples, see https://github.com/ndmitchell/shake/blob/e0e0a43/Development/Shake/Database.hs\#L394 https://github.com/haskell-suite/haskell-src-exts/issues/27 as well as the mapM usages in cabal, haddock, any database binding that allows querying a lists of results, etc.
While some argue that ''"you shouldn't use lists of that size"*, "allocating a 150 MB list is wrong" or that they *"carefully avoid any use of sequence/mapM due to the problem"'' pointed out above, I think that collecting a million results of a strict monadic action into a linked list using standard functions like mapM is a common pattern and valid use case. The stack based evaluation that GHC performs is an optimization and should be transparent to the user; it should not re-define some data to live in a fixed-size memory region, and it should work just as well as it does in GHCi.
It seems that a default infinite stack size would mitigate this.
A drawback might be that the stack space overflows we get so far to quickly spot a "wrong" program would not indicate certain errors any more. I currently doubt the usefulness of this debugging method, since it does not work reliably in the presence of optimization, doing the same thing via heap allocation makes the programs "correct", and it rejects valid programs if the input data is large enough.