Skip to content

GHCI is slow for precompiled code

Hello! We found that loading compiled and optimised accelerate code into ghci works much much slower than code run directly.

(I'm marking it as a ghc-api bug, because I noticed this behavior while using it)

Here is the example:

module Main where

import Data.Array.Accelerate as A
import Data.Array.Accelerate.CUDA as C
import Data.Time.Clock       (diffUTCTime, getCurrentTime)

main :: IO ()
main = do
    start <- getCurrentTime
    print $ C.run $ A.maximum $ A.map (+1) $ A.use (fromList (Z:.1000000) [1..1000000] :: Vector Double)
    end   <- getCurrentTime
    print $ diffUTCTime end start

When I compile it and run

    $ ghc -O2 Main.hs -threaded
    [1 of 1] Compiling Main             ( Main.hs, Main.o )
    Linking Main ...
    $ ./Main
    Array (Z) [1000001.0]
    0.162204s

It takes 0.16 s to finish When I compile it and load into ghci:

    $ ghc -O2 -dynamic -c Main.hs
    $ ghci 
    GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :l Main
    Ok, modules loaded: Main.
    Prelude Main> main
    Loading package transformers-0.3.0.0 ... linking ... done.
    Loading package array-0.5.0.0 ... linking ... done.
    Loading package deepseq-1.3.0.2 ... linking ... done.
    Loading package bytestring-0.10.4.0 ... linking ... done.
    Loading package containers-0.5.5.1 ... linking ... done.
    Loading package binary-0.7.1.0 ... linking ... done.
    Loading package pretty-1.1.1.1 ... linking ... done.
    Loading package filepath-1.3.0.2 ... linking ... done.
    Loading package old-locale-1.0.0.6 ... linking ... done.
    Loading package time-1.4.2 ... linking ... done.
    Loading package unix-2.7.0.1 ... linking ... done.
    Loading package directory-1.2.1.0 ... linking ... done.
    Loading package process-1.2.0.0 ... linking ... done.
    Loading package stm-2.4.4 ... linking ... done.
    Loading package SafeSemaphore-0.10.1 ... linking ... done.
    Loading package mtl-2.1.3.1 ... linking ... done.
    Loading package template-haskell ... linking ... done.
    Loading package fclabels-2.0.2.2 ... linking ... done.
    Loading package text-1.1.1.3 ... linking ... done.
    Loading package hashable-1.2.3.1 ... linking ... done.
    Loading package primitive-0.5.4.0 ... linking ... done.
    Loading package vector-0.10.12.2 ... linking ... done.
    Loading package hashtables-1.2.0.1 ... linking ... done.
    Loading package unordered-containers-0.2.5.1 ... linking ... done.
    Loading package accelerate-0.15.0.0 ... linking ... done.
    Loading package byteable-0.1.1 ... linking ... done.
    Loading package cryptohash-0.11.6 ... linking ... done.
    Loading package cuda-0.6.5.1 ... linking ... done.
    Loading package exception-transformers-0.3.0.4 ... linking ... done.
    Loading package exception-mtl-0.3.0.5 ... linking ... done.
    Loading package old-time-1.1.0.2 ... linking ... done.
    Loading package polyparse-1.11 ... linking ... done.
    Loading package cpphs-1.18.6 ... linking ... done.
    Loading package haskell-src-exts-1.16.0.1 ... linking ... done.
    Loading package syb-0.4.4 ... linking ... done.
    Loading package th-lift-0.7 ... linking ... done.
    Loading package safe-0.3.8 ... linking ... done.
    Loading package th-expand-syns-0.3.0.4 ... linking ... done.
    Loading package th-reify-many-0.1.2 ... linking ... done.
    Loading package th-orphans-0.8.3 ... linking ... done.
    Loading package haskell-src-meta-0.6.0.8 ... linking ... done.
    Loading package srcloc-0.4.1 ... linking ... done.
    Loading package mainland-pretty-0.2.7 ... linking ... done.
    Loading package symbol-0.2.4 ... linking ... done.
    Loading package language-c-quote-0.8.0 ... linking ... done.
    Loading package accelerate-cuda-0.15.0.0 ... linking ... done.
    Array (Z) [1000001.0]
    0.256128s

It takes 0.26 s to finish. On other computer, using criterion, we observed even a 50x difference.

Why is it working so slow, isn't ghci just to load function and simply call it?

Here is how execution time changes for different matrix sizes. For comparison we have also measured time of interpreted code (without precompiling).

    size     | compiled  | precompiled  | interpreted        
    ---------+-----------+--------------+------------
    100      | 0.054076s | 0.082686s    | 0.151857s   
    1000     | 0.054509s | 0.08305s     | 0.135452s  
    10000    | 0.055405s | 0.08469s     | 0.12632s   
    100000   | 0.057768s | 0.093011s    | 0.155359s   
    1000000  | 0.089811s | 0.251359s    | 0.202022s   
    10000000 | 0.397642s | 1.400603s    | 0.898547s   

We believe that problem lies on the side of ghci rather than accelerate, how could we confirm this?

Moreover and even more important is there any workaround for it, if we need this precompiled code running fast in production environment?

Any guidance will be appreciated.

Edited by Simon Peyton Jones
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information