## Poor performance of higher-order functions with unboxing

I was testing out some code to see how GHC handled some unboxed higher-order functions, and was suprised by the results I found. Here is some sample code:

```
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MagicHash #-}
import GHC.Exts
import System.Environment
rel# :: Int# -> Int# -> Int# -> Bool
rel# i# j# k# = (i# +# j# +# k#) ># 100000000#
rel :: Int -> Int -> Int -> Bool
rel (I# i#) (I# j#) (I# k#) = rel# i# j# k#
manual :: (Int# -> Int# -> Int# -> Bool) -> (Int, Int, Int)
manual r# = go 0# 0# 0#
where
go i# j# k# | r# i# j# k# = (I# i#, I# j#, I# k#)
| otherwise = go j# k# (i# +# 1#)
{-# NOINLINE manual #-}
auto :: (Int -> Int -> Int -> Bool) -> (Int, Int, Int)
auto r = go 0 0 0
where
go !i !j !k | r i j k = (i, j, k)
| otherwise = go j k (i+1)
{-# NOINLINE auto #-}
main = getArgs >>= \case
"manual" : _ -> print $ manual rel# -- This case is significantly slower.
"auto" : _ -> print $ auto rel -- Why?
```

A loop that has to box its loop parameters to call a predicate turns out to be significantly faster than one that uses a predicate that takes unboxed values directly.

The answer turns out to be (I believe) in ghc/utils/genapply/GenApply.hs. applyTypes has an entry [P,P,P], but only [N]. This means that the manual loop has to use a slower calling convention than the boxed loop.

I'm not sure whether this should be 'fixed,' as my encounter with it was experimental in nature, and I don't have a real use case. The comment on applyTypes says its cases cover 99% of uses, and mine is not a real use. This ticket may serve as documentation at least, though.

## Trac metadata

Trac field | Value |
---|---|

Version | 7.6.3 |

Type | Task |

TypeOfFailure | OtherFailure |

Priority | low |

Resolution | Unresolved |

Component | Compiler |

Test case | |

Differential revisions | |

BlockedBy | |

Related | |

Blocking | |

CC | |

Operating system | |

Architecture |