Skip to content

Suboptimal code

This is with ghc version 8.6.5.

I am new to Haskell and ghc so this may or may not be a real issue and I am perhaps misunderstanding something. While trying to understand the CORE intermediate representation I noticed perhaps some sub-optimal code being generated. Consider the following simple program:

{-# LANGUAGE BangPatterns #-}

module B (
  lowers
  , freqs) where

{-# NOINLINE percent #-}
percent :: Int -> Int -> Float
percent n m = (fromIntegral n / fromIntegral m) * 100.0

{-# NOINLINE count #-}
count :: Char -> String -> Int
count c s = length [x | x <- s, x == c]

{-# NOINLINE lowers #-}
lowers :: String -> Int
lowers s = length [x | x <- s, 'a' <= x && x <= 'z']

freqs :: String -> [Float]
freqs s = [ percent (count x s) n | x <- ['a'..'z']]
  where
    n :: Int
    n = lowers s

For which we get the following core (at -O2) for function 'freqs':

-- RHS size: {terms: 41, types: 14, coercions: 0, joins: 0/2}
freqs :: String -> [Float]
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=<L,U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 252 0}]
freqs
  = \ (s_a4Ha :: String) ->
      let {
        n_s54b [Dmd=<L,U(U)>] :: Int
        [LclId]
        n_s54b
          = case B.$wlowers s_a4Ha of ww_s56Z { __DEFAULT ->
            ghc-prim-0.5.3:GHC.Types.I# ww_s56Z                                   <---(1)---- Boxing
            } } in
      letrec {
        go_a53W [Occ=LoopBreaker]
          :: ghc-prim-0.5.3:GHC.Prim.Int# -> [Float]
        [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
        go_a53W
          = \ (x_a53X :: ghc-prim-0.5.3:GHC.Prim.Int#) ->
              case ghc-prim-0.5.3:GHC.Prim.># x_a53X 122# of {
                __DEFAULT ->
                  ghc-prim-0.5.3:GHC.Types.:
                    @ Float
                    (case B.$wcount
                            (ghc-prim-0.5.3:GHC.Types.C# (ghc-prim-0.5.3:GHC.Prim.chr# x_a53X))
                            s_a4Ha
                     of ww_s57c
                     { __DEFAULT ->
                     case n_s54b of { ghc-prim-0.5.3:GHC.Types.I# ww2_s56J ->     <--(2)--- Unboxing in the loop
                     case B.$wpercent ww_s57c ww2_s56J of ww3_s56N {
		     __DEFAULT ->
                     ghc-prim-0.5.3:GHC.Types.F# ww3_s56N
                     }
                     }
                     })
                    (go_a53W (ghc-prim-0.5.3:GHC.Prim.+# x_a53X 1#));
                1# -> ghc-prim-0.5.3:GHC.Types.[] @ Float
              }; } in
      go_a53W 97#

At issue is the creation of variable 'n_s54b'. To create it ghc calls the worker version of 'lowers' which returns an Int# and (at (1)) we box it. Then later on inside the loop we unbox (at (2)) to get the primitive int into 'ww2_s56J' which is what we actually need for the call to '$wpercent' (which takes Int# and not Int). So why did we ever build the boxed version of this value? Since ghc generates these worker functions all over the place this must be happening a lot and it is perhaps an optimization worth doing.

Surprisingly if the last line of 'freqs' is changed so that 'n' is strict (via a Bang Pattern):

  !n = lowers s

then we generate exactly the code one would expect (i.e. we remember the Int# and not the Int) -- I am sure this has an interesting explanation too.

freqs
  = \ (s_a4Ha :: String) ->
      case B.$wlowers s_a4Ha of ww_s57c { __DEFAULT ->             <------ No Boxing!
      letrec {
        go_a53Z [Occ=LoopBreaker]
          :: ghc-prim-0.5.3:GHC.Prim.Int# -> [Float]
        [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
        go_a53Z
          = \ (x_a540 :: ghc-prim-0.5.3:GHC.Prim.Int#) ->
              case ghc-prim-0.5.3:GHC.Prim.># x_a540 122# of {
                __DEFAULT ->
                  ghc-prim-0.5.3:GHC.Types.:
                    @ Float
                    (case B.$wcount
                            (ghc-prim-0.5.3:GHC.Types.C# (ghc-prim-0.5.3:GHC.Prim.chr# x_a540))
                            s_a4Ha
                     of ww1_s57p
                     { __DEFAULT ->
                     case B.$wpercent ww1_s57p ww_s57c of ww2_s570 { __DEFAULT ->
                     ghc-prim-0.5.3:GHC.Types.F# ww2_s570
                     }
                     })
                    (go_a53Z (ghc-prim-0.5.3:GHC.Prim.+# x_a540 1#));
                1# -> ghc-prim-0.5.3:GHC.Types.[] @ Float
              }; } in
      go_a53Z 97#
      }
Edited by Neo
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information