... | ... | @@ -17,21 +17,34 @@ The other implementation currently available is `integer-simple`, which uses a s |
|
|
## The Integer interface
|
|
|
|
|
|
|
|
|
All Integer implementations should export the same set of types and functions from `GHC.Integer`. These exports are used by the `base` package However, all of these types and functions must actually be defined in `GHC.Integer.Type`, so that GHC knows where to find them.
|
|
|
All Integer implementations should export the same set of types and functions from `GHC.Integer` (within whatever `integer` package you are using). These exports are used by the `base` package However, all of these types and functions must actually be defined in `GHC.Integer.Type`, so that GHC knows where to find them.
|
|
|
|
|
|
## How Integer is handled inside GHC
|
|
|
|
|
|
- **Front end**. Integers are represented using the `HsInteger` constructor of `HsLit` for the early phases of compilation (e.g. type checking)
|
|
|
|
|
|
Integers are represented using the `HsInteger` constructor of `HsLit` for the early phases of compilation (e.g. type checking), but for later stages, once we use the `Core` representation, they are converted to the `LitInteger` constructor of the `Literal` type by `mkIntegerExpr`. While `Integer`s aren't "machine literals" like the other `Literal` constructors, it is more convenient when writing rules to pretend that they are literals rather than having to understand their real core representation. We also carry around a `Type`, representing the `Integer` type, in the constructor, as we need access to it in a few functions (e.g. `literalType`).
|
|
|
- **Core**. In `Core` representation, an integer literal is represented by the LitInteger` constructor of the `Literal\` type.
|
|
|
|
|
|
```wiki
|
|
|
data Literal = ... | LitInteger Integer Type
|
|
|
```
|
|
|
|
|
|
All of the types and functions in the `Integer` interface have built-in names, e.g. `plusIntegerName`, defined in [compiler/prelude/PrelNames.lhs](/trac/ghc/browser/ghc/compiler/prelude/PrelNames.lhs) and included in `basicKnownKeyNames`. This allows us to match on all of the functions in `builtinIntegerRules` in [compiler/prelude/PrelRules.lhs](/trac/ghc/browser/ghc/compiler/prelude/PrelRules.lhs), so we can constant-fold Integer expressions.
|
|
|
While `Integer`s aren't "machine literals" like the other `Core``Literal` constructors, it is more convenient when writing constant folding RULES to pretend that they are literals rather than having to understand their concrete representation. (Especially as the concrete representation varies from package to package.) We also carry around a `Type`, representing the `Integer` type, in the constructor, as we need access to it in a few functions (e.g. `literalType`).
|
|
|
|
|
|
- **Constant folding**. There are many constant-folding optimisations for `Integer` expressed as built-in rules in [compiler/prelude/PrelRules.lhs](/trac/ghc/browser/ghc/compiler/prelude/PrelRules.lhs); look at `builtinIntegerRules`. All of the types and functions in the `Integer` interface have built-in names, e.g. `plusIntegerName`, defined in [compiler/prelude/PrelNames.lhs](/trac/ghc/browser/ghc/compiler/prelude/PrelNames.lhs) and included in `basicKnownKeyNames`. This allows us to match on all of the functions in `builtinIntegerRules` in [compiler/prelude/PrelRules.lhs](/trac/ghc/browser/ghc/compiler/prelude/PrelRules.lhs), so we can constant-fold Integer expressions.
|
|
|
|
|
|
We keep the `LitInteger` representation as late as possible; in particular, it's important that this representation is used in unfoldings in interface files, so that constant folding can happen on expressions that get inlined. We only convert it to a proper core representation of Integer in [compiler/coreSyn/CorePrep.lhs](/trac/ghc/browser/ghc/compiler/coreSyn/CorePrep.lhs), which looks up the Id for `mkInteger` and uses it to build an expression like `mkInteger True [123, 456]` (where the `Bool` represents the sign, and the list of `Int`s are 31 bit chunks of the absolute value from lowest to highest).
|
|
|
- **Representing integers**. We stick to the `LitInteger` representation (which hides the concrete representation) as late as possible in the compiler. In particular, it's important that this representation is used in unfoldings in interface files, so that constant folding can happen on expressions that get inlined. We finally convert `LitInteger` to a proper core representation of Integer in [compiler/coreSyn/CorePrep.lhs](/trac/ghc/browser/ghc/compiler/coreSyn/CorePrep.lhs), which looks up the Id for `mkInteger` and uses it to build an expression like `mkInteger True [123, 456]` (where the `Bool` represents the sign, and the list of `Int`s are 31 bit chunks of the absolute value from lowest to highest).
|
|
|
|
|
|
>
|
|
|
> However, there is a special case for `Integer`s that are within the range of `Int` when the `integer-gmp` implementation is being used; in that case, we use the `S#` constructor (via `integerGmpSDataCon` in [compiler/prelude/TysWiredIn.lhs](/trac/ghc/browser/ghc/compiler/prelude/TysWiredIn.lhs)) to break the abstraction and directly create the datastructure.
|
|
|
|
|
|
However, there is a special case for `Integer`s that are within the range of `Int` when the `integer-gmp` implementation is being used; in that case, we use the `S#` constructor (via `integerGmpSDataCon` in [compiler/prelude/TysWiredIn.lhs](/trac/ghc/browser/ghc/compiler/prelude/TysWiredIn.lhs)) to break the abstraction and directly create the datastructure.
|
|
|
- **Don't inline integer functions**. Most of the functions in the Integer implementation in the `integer` package are marked `NOINLINE`. For example in `integer-gmp` we have
|
|
|
|
|
|
```wiki
|
|
|
plusInteger :: Integer -> Integer -> Integer
|
|
|
plusInteger (S# i1) (S# i2) = ...
|
|
|
plusInteger (S# i1) (J# j1 j2) = ...
|
|
|
...two more cases...
|
|
|
```
|
|
|
|
|
|
Most of the functions in the Integer implementation are marked `NOINLINE`. This is because inlining them is generally not beneficial (any constant folding is already handled by the built-in rules), and in fact can be harmful: In the GMP representation, each argument can be one of two constructors (`S#` and `J#`), which leads to 2 branches. When you have a number of `Integer` arithmetic operations, you can get an exponential code explosion if they all get inlined. |
|
|
Not only is this a big function to inline, but inlining it typically does no good because the representation of literals is abstact, so no pattern-matching cancellation happens. And even if you have `(a+b+c)`, the conditionals mean that no cancellation happens, or you get an exponential code explosion! |