Cross-module specialisation of recursive functions
It is common for library authors to write overloaded functions, but doing so imposes a performance penalty on their consumers because GHC refuses to specialise such functions across modules when they are recursive.
For example,
{-# language FlexibleContexts #-}
module M where

import Control.Monad.IO.Class
import Control.Monad.Reader

hello :: (MonadIO m, MonadReader Int m) => Int -> m ()
hello n = do
  m <- ask
  case m `mod` n == 0 of
    True -> liftIO $ print "helloo"
    False -> hello (n-1)
Using hello in a client module, we would like to optimise away the explicit dictionary passing once we specialise hello to a specific monad stack.
import Control.Monad.Reader
import M (hello)

main :: IO ()
main = runReaderT (hello 128) 42
However, as hello is recursive, its unfolding is not included in the interface file. As a result, the specialisation can't take place, which leaves us with less efficient code.
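To make the cost concrete, here is a hand-written sketch of the two shapes involved. It is an illustration only: the Dict record, the R newtype and the names helloDict, helloSpec and rDict are hypothetical stand-ins invented for this example, and the Core that GHC actually produces differs in detail. It needs only base.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main (main) where

-- Hypothetical record standing in for the class dictionaries that the
-- overloaded hello receives at run time.
data Dict m = Dict
  { dBind   :: forall a b. m a -> (a -> m b) -> m b
  , dAsk    :: m Int
  , dLiftIO :: forall a. IO a -> m a
  }

-- Unspecialised shape: every call, including the recursive one, threads
-- the dictionary and selects methods from it.
helloDict :: Dict m -> Int -> m ()
helloDict d n =
  dBind d (dAsk d) $ \m ->
    if m `mod` n == 0
      then dLiftIO d (print "helloo")
      else helloDict d (n - 1)

-- Specialised shape for a reader over IO: the environment is an ordinary
-- argument and no dictionary remains.
helloSpec :: Int -> Int -> IO ()
helloSpec n env
  | env `mod` n == 0 = print "helloo"
  | otherwise        = helloSpec (n - 1) env

-- A small reader-over-IO type and its dictionary, so that the
-- unspecialised version can be run.
newtype R a = R { runR :: Int -> IO a }

rDict :: Dict R
rDict = Dict
  { dBind   = \(R f) k -> R (\e -> f e >>= \a -> runR (k a) e)
  , dAsk    = R return
  , dLiftIO = \io -> R (\_ -> io)
  }

main :: IO ()
main = do
  runR (helloDict rDict 128) 42  -- dictionary-passing version
  helloSpec 128 42               -- specialised version
```

Both calls in main do the same work; the difference is that the first pays for a method lookup through the record on every iteration, which is the overhead specialisation is meant to remove.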
The solution is to mark hello as INLINABLE. Once we do this, the unfolding of hello is included in the interface file, even though hello will never be inlined: it is self-recursive and hence a loop-breaker. With the unfolding in the interface file, GHC can properly specialise hello and produce optimal code.
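As a minimal sketch of that fix, module M with the pragma added might look as follows. Only the INLINABLE line is new relative to the module above; placing it directly after the definition is a stylistic choice rather than a requirement.

```haskell
{-# LANGUAGE FlexibleContexts #-}
module M where

import Control.Monad.IO.Class
import Control.Monad.Reader

hello :: (MonadIO m, MonadReader Int m) => Int -> m ()
hello n = do
  m <- ask
  case m `mod` n == 0 of
    True  -> liftIO $ print "helloo"
    False -> hello (n - 1)
-- Asks GHC to keep hello's unfolding in M's interface file so that
-- client modules can specialise it; hello itself is still not inlined.
{-# INLINABLE hello #-}
```

The client module needs no changes. Recompiling it with the same flags as before and comparing the -ddump-simpl output is one way to check whether a specialised copy of hello now appears there.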
As an aside, it is quite strange to mark such a recursive definition as INLINABLE to get this behaviour, as you know it will never be inlined. It would perhaps be better to have a more aptly named pragma which ensured unfoldings were placed in interface files.
The two attached files contain the core for these two programs. They were compiled with
ghc-8.0.1 -fforce-recomp -ddump-simpl -O2 mtl-stack.hs
This ticket is to track the behaviour of these types of definitions, which are very common in the wild.
A proposed solution on #5928 was to add a flag that always marks overloaded functions as INLINABLE, so that these specialisations can take place. I am planning to implement this in order to see what the consequences are in terms of performance and interface file sizes.