Skip to content

GitLab

  • Menu
Projects Groups Snippets
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
  • GHC GHC
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,868
    • Issues 4,868
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 456
    • Merge requests 456
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Releases
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #10371
Closed
Open
Created May 01, 2015 by MikeIzbicki@trac-MikeIzbicki

GHC fails to inline and specialize a function

I have an alternative Prelude library called [subhask](https://github.com/mikeizbicki/subhask) that redefines the numeric type class hierarchy. I'm trying to update it to work with GHC 7.10, but there is a major inlining bug that is killing performance.

The code below demonstrates the issue. It first defines a distance function over 2 vectors, then measures the performance using criterion. (It requires the subhask to compile.)

{-# LANGUAGE BangPatterns #-}

import Control.DeepSeq
import Criterion.Main
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Generic as VG

import qualified Prelude
import SubHask

-- distance_standalone :: VU.Vector Float -> VU.Vector Float -> Float
distance_standalone v1 v2 = sqrt $ go 0 0
    where
        go !tot !i =  if i>VG.length v1-4
            then goEach tot i
            else go tot' (i+4)
            where
                tot' = tot
                    +(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)
                    *(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)
                    +(v1 `VG.unsafeIndex` (i+1)-v2 `VG.unsafeIndex` (i+1))
                    *(v1 `VG.unsafeIndex` (i+1)-v2 `VG.unsafeIndex` (i+1))
                    +(v1 `VG.unsafeIndex` (i+2)-v2 `VG.unsafeIndex` (i+2))
                    *(v1 `VG.unsafeIndex` (i+2)-v2 `VG.unsafeIndex` (i+2))
                    +(v1 `VG.unsafeIndex` (i+3)-v2 `VG.unsafeIndex` (i+3))
                    *(v1 `VG.unsafeIndex` (i+3)-v2 `VG.unsafeIndex` (i+3))

        goEach !tot !i = if i>= VG.length v1
            then tot
            else goEach tot' (i+1)
            where
                tot' = tot+(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)
                          *(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)


main = do
    let v1 = VU.fromList [1..200] :: VU.Vector Float
        v2 = VU.fromList [1..200] :: VU.Vector Float

    deepseq v1 $ deepseq v2 $ return ()

    defaultMain
        [ bench "distance_standalone" $ nf (distance_standalone v1) v2
        ]

Here are the results of compiling and running using GHC 7.10 and 7.8:

$ ghc-7.10.1 Main.hs -O2 -fforce-recomp -ddump-to-file -ddump-simpl && ./Main 
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
benchmarking distance_standalone
time                 8.135 μs   (8.121 μs .. 8.154 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 8.188 μs   (8.158 μs .. 8.250 μs)
std dev              139.3 ns   (66.05 ns .. 250.4 ns)
variance introduced by outliers: 15% (moderately inflated)
$ ghc-7.8.2 Main.hs -O2 -fforce-recomp -ddump-to-file -ddump-simpl && ./Main 
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
benchmarking distance_standalone
time                 733.2 ns   (732.9 ns .. 733.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 734.1 ns   (733.7 ns .. 734.5 ns)
std dev              1.458 ns   (1.262 ns .. 1.754 ns)

As you can see, GHC 7.10 is 10x slower. Looking through the core output shows that the cause of this is that GHC 7.8 is properly specializing the code whereas GHC 7.10 is not. If you uncomment the type signature before the distance_standalone function then both compilers perform at the faster speed.

I believe the cause of this may be related to the complicated class numeric class hierarchy in SubHask. If you comment out the lines:

import qualified Prelude
import SubHask

then GHC uses the Prelude hierarchy instead of SubHask's hierarchy, and both compilers generate the faster program.

There's one last wrinkle. If you define the distance_standalone function in a different file. Then in GHC 7.10, the INLINE and INLINABLE pragmas do absolutely nothing. Not only does the resulting code not get inlined, but if I add the specialization:

{-# SPECIALIZE distance_standalone :: VU.Vector Float -> VU.Vector Float -> Float #-}

to the Main file, I get an error message saying something like:

bench/Vector.hs:18:1: Warning:
    You cannot SPECIALISE ‘distance_standalone{v ru1}’
      because its definition has no INLINE/INLINABLE pragma
      (or its defining module ‘subhask-0.1.0.0@subha_LNZiQvSbo8Z0VdLTwuvkrN:SubHask.Algebra’
       was compiled without -O)

I get this message despite the fact that the defining module was compiled with -O2 and the function had an INLINABLE pragma. If I add the specialization pragma to the defining module, then I don't get the warning, but the code still doesn't specialize properly.

Trac metadata
Trac field Value
Version 7.10.1
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking