... | ... | @@ -22,41 +22,89 @@ There is a tension wrt to the stage in the compilation pipeline that should hand |
|
|
For the purpose of unfolding/inlining definitions, look at groups of mutually recursive definitions as a whole, rather than trying to think about individual definitions. Compare the existing documentation for [\`INLINE/NOINLINE\`](http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#inline-noinline-pragma) pragmas.
|
|
|
|
|
|
|
|
|
|
|
|
In the following, let `REC({f g ..})` denote the set of all identifiers belonging to the recursion involving `f`, `g`, .. (`f`, `g`, .. in `REC({f g ..})` or in `{-# INLINE f g .. #-}` are required to belong to the same recursion).
|
|
|
|
|
|
|
|
|
|
|
|
`{-# NOINLINE f #-}`
|
|
|
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> as now: no unfolding of `f`
|
|
|
>
|
|
|
>
|
|
|
|
|
|
|
|
|
`{-# INLINE f #-}`
|
|
|
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> as now: for non-recursive `f` only, unfold definition of `f` at call sites of `f` (might in future be taken as go-ahead for analysis-based recursion unfolding)
|
|
|
>
|
|
|
>
|
|
|
|
|
|
|
|
|
`{-# INLINE f g .. PEEL n #-}`
|
|
|
|
|
|
> new: unfold definitions of the named identifiers at their call sites **outside** their recursion group `REC({f g ..})`. In other words, **entries into**`REC({f g ..})` via `f`, `g`, .. are unfolded.
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> new: unfold definitions of the named identifiers at their call sites **outside** their recursion group `REC({f g ..})`. In other words, **entries into** `REC({f g ..})` via `f`, `g`, .. are unfolded.
|
|
|
>
|
|
|
>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> (for the special case of loops this corresponds to loop peeling)
|
|
|
>
|
|
|
>
|
|
|
|
|
|
|
|
|
`{-# INLINE f g .. UNROLL m #-}`
|
|
|
|
|
|
> new: unfold definitions of the named identifiers at their call sites **inside** their recursion group `REC({f g ..})`. In other words, **cross-references inside**`REC({f g ..})` via `f`, `g`, .. are unfolded.
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> new: unfold definitions of the named identifiers at their call sites **inside** their recursion group `REC({f g ..})`. In other words, **cross-references inside** `REC({f g ..})` via `f`, `g`, .. are unfolded.
|
|
|
>
|
|
|
>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> (for the special case of loops this corresponds to loop unrolling)
|
|
|
>
|
|
|
>
|
|
|
|
|
|
|
|
|
`{-# INLINE f g .. PEEL n UNROLL m #-}`
|
|
|
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> combine the previous two
|
|
|
>
|
|
|
>
|
|
|
|
|
|
>
|
|
|
> The numeric parameters are to be interpreted as if each call to `f`, `g`, .. was annotated with both `PEEL` and `UNROLL` limits for the whole recursion group `REC({f g ..})`, starting with the limits from the pragmas (write `f_n_m` for a call to `f` with `PEEL` limit `n` and `UNROLL` limit `m`), to be decreased for every `PEEL` or `UNROLL` action, as follows (`REC({f g})` = {`f``g``h`}, in these examples):
|
|
|
>
|
|
|
> The numeric parameters are to be interpreted as if each call to `f`, `g`, .. was annotated with both `PEEL` and `UNROLL` limits for the whole recursion group `REC({f g ..})`, starting with the limits from the pragmas (write `f_n_m` for a call to `f` with `PEEL` limit `n` and `UNROLL` limit `m`), to be decreased for every `PEEL` or `UNROLL` action, as follows (`REC({f g})` = {`f` `g` `h`}, in these examples):
|
|
|
>
|
|
|
>
|
|
|
|
|
|
1.
|
|
|
|
|
|
1. ```wiki
|
|
|
```wiki
|
|
|
let {-# INLINE f g PEEL n UNROLL m #-}
|
|
|
f .. = .. f_?_? .. g_?_? .. h_0_0 ..
|
|
|
g .. = .. f_?_? .. g_?_? .. h_0_0 ..
|
... | ... | @@ -72,15 +120,20 @@ In the following, let `REC({f g ..})` denote the set of all identifiers belongin |
|
|
in ..|.. f_(n-1)_0 .. g_(n-1)_0 .. h_0_0 ..|..
|
|
|
```
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> Notes:
|
|
|
>
|
|
|
>
|
|
|
|
|
|
- unfolding produces copies of definition bodies
|
|
|
- the `PEEL` limit at the call site decides the `PEEL` limit for all calls to `REC({f g})` in the inlined copy; this limit decreases with each `PEEL` step
|
|
|
- since peeling unfolds code into call sites from outside the recursion, the `UNROLL` limits of calls to `REC({f g})` are effectively `0` in the inlined copy
|
|
|
- only calls to identifiers named in the `INLINE` pragma can be peeled (`f` and `g` here), calls to other members of the same recursion remain unaffected (`h` here), having effective limits of `0`
|
|
|
|
|
|
1. ```wiki
|
|
|
1.
|
|
|
|
|
|
```wiki
|
|
|
let {-# INLINE f g PEEL n UNROLL m #-}
|
|
|
f .. = .. f_0_m .. g_?_? .. h_0_0 ..
|
|
|
g .. = .. f_?_? .. g_?_? .. h_0_0 ..
|
... | ... | @@ -104,35 +157,77 @@ In the following, let `REC({f g ..})` denote the set of all identifiers belongin |
|
|
- unrolling unfolds copies of the original definitions, not the already unrolled ones, again corresponding to the existing inlining restriction (TODO how to specify this avoidance of unrolling unrolled defs in this form of local rule spec?)
|
|
|
- only calls to identifiers named in the `INLINE` pragma can be unrolled (`f` and `g` here), calls to other members of the same recursion remain unaffected (`h` here), having effective limits of `0`
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> Peeling and unrolling stop when the respective count annotation has reached `0`. Peeling precedes unrolling, to avoid ambiguities in the size of the peeled definitions. Note that calls into mutual recursion groups is the domain of `PEEL`, while `UNROLL` only applies to calls within mutual recursion groups.
|
|
|
>
|
|
|
>
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> `{-# INLINE f PEEL n #-}`, for `n>0`, corresponds to worker/ wrapper transforms (previously done manually) + inline wrapper, and should therefore also be taken as a hint for the compiler to try the static argument transformation for `f` (the "worker").
|
|
|
>
|
|
|
>
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> Non-supporting implementations should treat these as `INLINE` pragmas (same warning/ignore or automatic unfold behaviour). This might be easier to accomplish if `INLINE PEEL/UNROLL` were implemented as separate pragmas, even though they are refinements of `INLINE` conceptually.
|
|
|
>
|
|
|
>
|
|
|
|
|
|
>
|
|
|
>
|
|
|
> About the current side-conditions for `INLINE` pragmas:
|
|
|
>
|
|
|
>
|
|
|
|
|
|
- no functions inlined into `f`:
|
|
|
|
|
|
>
|
|
|
> >
|
|
|
> > >
|
|
|
> > >
|
|
|
> > > still makes sense for `PEEL`, needs to be adapted with an exception for `UNROLL`, in that we want to be able to unroll into the function being unrolled, but we want to use the original body for the unrolling, not an already unrolled one (else unrolling would be exponential rather than linear); this appears to be in line with existing work on `INLINE`
|
|
|
> > >
|
|
|
> > >
|
|
|
> >
|
|
|
>
|
|
|
|
|
|
- no float-in/float-out/cse:
|
|
|
|
|
|
>
|
|
|
> >
|
|
|
> > >
|
|
|
> > >
|
|
|
> > > similar to existing `INLINE`
|
|
|
> > >
|
|
|
> > >
|
|
|
> >
|
|
|
>
|
|
|
|
|
|
- no worker/wrapper transform in strictness analyser:
|
|
|
|
|
|
>
|
|
|
> >
|
|
|
> > >
|
|
|
> > >
|
|
|
> > > similar to existing `INLINE`
|
|
|
> > >
|
|
|
> > >
|
|
|
> >
|
|
|
>
|
|
|
|
|
|
- loop breakers:
|
|
|
|
|
|
>
|
|
|
> >
|
|
|
> > >
|
|
|
> > >
|
|
|
> > > `PEEL/UNROLL` have their own limits, applicable to the whole recursion group, creating intrinsic loop breakers when the counters run out. Every `PEEL` or `UNROLL` action creates calls with smaller counters in the inlined copies, if the calls go into the same recursion.
|
|
|
> > >
|
|
|
> > >
|
|
|
> >
|
|
|
>
|
|
|
|
|
|
## Examples
|
|
|
|
... | ... | @@ -151,17 +246,32 @@ Examples of hand-unrolled/-peeled loops abound in optimized code. `unroll2.hs` d |
|
|
|
|
|
## Further References
|
|
|
|
|
|
|
|
|
|
|
|
\[0\] GHC mailing list threads, with examples and discussion
|
|
|
|
|
|
|
|
|
- [ optimization and rewrite rules questions](http://www.haskell.org/pipermail/glasgow-haskell-users/2009-February/016695.html)
|
|
|
- [ Loop unrolling + fusion ?](http://www.haskell.org/pipermail/glasgow-haskell-users/2009-February/016729.html)[ continued in March](http://www.haskell.org/pipermail/glasgow-haskell-users/2009-March/016732.html)
|
|
|
- [ Loop unrolling + fusion ?](http://www.haskell.org/pipermail/glasgow-haskell-users/2009-February/016729.html)
|
|
|
[ continued in March](http://www.haskell.org/pipermail/glasgow-haskell-users/2009-March/016732.html)
|
|
|
|
|
|
|
|
|
\[1\] [ An Aggressive Approach to Loop Unrolling](http://citeseer.ist.psu.edu/old/620489.html), 1995
|
|
|
|
|
|
|
|
|
|
|
|
\[2\] [ Compiler Transformations for High-Performance Computing](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.4885), ACM Computing Surveys, 1994
|
|
|
|
|
|
|
|
|
|
|
|
\[3\] [ http://en.wikipedia.org/wiki/Loop_transformation](http://en.wikipedia.org/wiki/Loop_transformation)
|
|
|
|
|
|
|
|
|
|
|
|
\[4\] [ loop unrolling vs hardware](http://www.intel.com/software/products/compilers/flin/docs/main_for/mergedprojects/optaps_for/common/optaps_hlo_unrl.htm)
|
|
|
|
|
|
\[1\][ An Aggressive Approach to Loop Unrolling](http://citeseer.ist.psu.edu/old/620489.html), 1995
|
|
|
|
|
|
\[2\][ Compiler Transformations for High-Performance Computing](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.4885), ACM Computing Surveys, 1994
|
|
|
|
|
|
\[3\][ http://en.wikipedia.org/wiki/Loop_transformation](http://en.wikipedia.org/wiki/Loop_transformation)
|
|
|
\[5\] [ Unrolling and simplifying expressions with Template Haskell](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.9813), Ian Lynagh, 2003
|
|
|
|
|
|
\[4\][ loop unrolling vs hardware](http://www.intel.com/software/products/compilers/flin/docs/main_for/mergedprojects/optaps_for/common/optaps_hlo_unrl.htm)
|
|
|
|
|
|
\[5\][ Unrolling and simplifying expressions with Template Haskell](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.9813), Ian Lynagh, 2003 |