Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
GHC
GHC
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,268
    • Issues 4,268
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
    • Iterations
  • Merge Requests 408
    • Merge Requests 408
  • Requirements
    • Requirements
    • List
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
  • Security & Compliance
    • Security & Compliance
    • Dependency List
    • License Compliance
  • Operations
    • Operations
    • Incidents
    • Environments
  • Analytics
    • Analytics
    • CI / CD
    • Code Review
    • Insights
    • Issue
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #2387

Closed
Open
Opened Jun 19, 2008 by dolio@trac-dolio

Optimizer misses unboxing opportunity

In my studying of the fannkuch benchmark, I've discovered (I think) another missed optimization. A scaled down illustration goes as follows:

{-# LANGUAGE TypeOperators, BangPatterns #-}

module Main (main) where

import Control.Monad.ST

import System.Environment

data (:*:) a b = !a :*: !b

whileLoop :: Int -> ST s Int
whileLoop = go 0
 where
 go !n k
   | k == 0    = return n
   | otherwise = go (n+1) (k-1)
{-# INLINE whileLoop #-}

iter :: Int -> Int -> ST s (Bool :*: Int)
iter k n = do
  k' <- whileLoop 40 >>= \k' -> return $! max k k'
  b <- return (n == 0)
  
  return $! b :*: k'
{-# INLINE iter #-}

mainLoop :: Int -> Int -> ST s Int
mainLoop k n = do
  done :*: k' <- iter k n
  
  if done
    then return k'
    else mainLoop k' (n - 1)

main = print =<< stToIO . mainLoop 0 . read . head =<< getArgs

If we look at the core for whileLoop's worker, we see:

$wpoly_go_r1aE :: forall s_aem.
                  Int# -> Int# -> STRep s_aem Int

$wpoly_go_r1aE =
  \ (@ s_aem)
    (ww_s18Y :: Int#)
    (ww1_s192 :: Int#)
    (eta_s19w :: State# s_aem) ->
    case ww1_s192 of wild_XF {
      __DEFAULT ->
        $wpoly_go_r1aE @ s_aem (+# ww_s18Y 1) (-# wild_XF 1) eta_s19w;
      0 -> (# eta_s19w, I# ww_s18Y #)
    }

Note, the return type is a boxed Int. The function is only used once, like so:

    ...
    case $wpoly_go_r1aE @ s_aem 0 40 w_s19f of wild_aFw { (# new_s_aFB, r_aFC #) ->
    case r_aFC of wild1_aG9 { I# y_aGb ->
    case <=# ww_s199 y_aGb of wild2_aGd {
      False ->
    ...

In other words, go boxes its results at the end of the loop, and the function that uses it immediately looks inside the box for the value. In this particular micro-benchmark, the boxed value (wild1_aG9 above) is actually used in the case where mainLoop returns the boxed value. However, in the larger benchmark I pulled this from, that is not the case in several areas (only the unboxed value is used, but it still goes through a box). Either way, the boxed value is only used at the end of the loop (and not every time there), and on every other iteration, this results in superfluous allocation.

I'll attach a manually unboxed version I wrote (it also has iter and max manually inlined to mainLoop, since that makes it easier to write; the core for the above shows that they are getting inlined properly, so I assume that isn't the issue). Using +RTS -sstderr shows that (running 100 million iterations here), the manually unboxed version allocates 50 kilobytes on the heap, and runs in around 15 seconds, whereas the version above that doesn't get unboxed does 1.6 gigabytes of heap allocation, and takes 18 seconds (in the larger benchmark, such extra boxing would happen perhaps 40 million times over the course of the program).

Thanks for your help.

Trac metadata
Trac field Value
Version 6.8.2
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
Edited Apr 01, 2019 by Ben Gamari
Assignee
Assign to
8.0.1
Milestone
8.0.1 (Past due)
Assign milestone
Time tracking
None
Due date
None
Reference: ghc/ghc#2387