Cross-module specialisation of recursive functions
It is common for library authors to write overloaded functions but in doing so they cause a performance penalty for their consumers as GHC refuses the specialise such functions across modules.
For example,
{-# language FlexibleContexts #-}
module M where
import Control.Monad.IO.Class
import Control.Monad.Reader
hello :: (MonadIO m, MonadReader Int m) => Int -> m ()
hello n = do
m <- ask
case m `mod` n == 0 of
True -> liftIO $ print "helloo"
False -> hello (n-1)
Using hello
in a client module, we would like to optimise away the explicit dictionary passing once we specialise hello
to a specific monad stack.
import M
import Control.Monad.Reader
import M (hello)
main :: IO ()
main = runReaderT (hello 128) 42
However, as hello
is recursive its unfolding was not included in the interface file. As a result, the specialisation can't take place which leaves us with less efficient code.
The solution to this is mark hello
as INLINABLE
. Once we do this the unfolding of hello
is included in the interface file even though hello
will never be inlined as it is self-recursive and hence the loop-breaker. Once included in the interface file, GHC can properly specialise hello
and produce optimal code.
An aside, it is quite strange to mark such a recursive definition as INLINABLE
to get this behaviour
as you know it will never be inlined. It would perhaps be better to have a better named pragma which ensured unfoldings were placed in interface files.
The two attached files contain the core for these two programs. They were compiled with
ghc-8.0.1 -fforce-recomp -ddump-simpl -O2 mtl-stack.hs
This ticket is to track the behaviour of these types of definitions which are very common in the wild.
A proposed solution on #5928 was to add a flag to always mark overloaded functions as inlinable to make sure these specialisations can take place. This is something which I am planning to implement in order to see what the consequences are in terms of performance and interface file sizes.