Libraries/Integer
GHC Commentary:GHC is set up to allow different implementations of the Integer
type to be chosen at build time.
Selecting an Integer implementation
You can select which implementation of Integer is used by defining INTEGER_LIBRARY
in mk/build.mk
. This tells the build system to build the library in libraries/$(INTEGER_LIBRARY)
, and the cIntegerLibrary
and cIntegerLibraryType
values in Config.hs
are defined accordingly.
The default value is integer-gmp
, which uses the GNU Multiple Precision Arithmetic Library (GMP) to define the Integer type and its operations.
The other implementation currently available is integer-simple
, which uses a simple (but slow, for larger Integers) pure Haskell implementation.
The Integer interface
All Integer implementations should export the same set of types and functions from GHC.Integer
(within whatever integer
package you are using). These exports are used by the base
package. However, all of these types and functions must actually be defined in GHC.Integer.Type
, so that GHC knows where to find them.
Specifically, the interface is this:
data Integer
mkInteger :: Bool -- True <=> non-negative
-> [Int] -- Absolute value in 31 bit chunks, least significant first
-- ideally these would be Words rather than Ints, but
-- we don't have Word available at the moment. (why?)
-> Integer
smallInteger :: Int# -> Integer
integerToInt :: Integer -> Int#
wordToInteger :: Word# -> Integer
integerToWord :: Integer -> Word#
-- And similarly for Int64#, Word64# on 64-bit
floatFromInteger :: Integer -> Float#
decodeFloatInteger :: Float# -> (# Integer, Int# #)
encodeFloatInteger :: Integer -> Int# -> Float#
-- And similarly Double
plusInteger :: Integer -> Integer -> Integer
-- And similarly: minusInteger, timesInteger, negateInteger,
-- eqInteger, neqInteger, absInteger, signumInteger,
-- leInteger, gtInteger, ltInteger, geInteger, compareInteger,
-- divModInteger, quotRemInteger, quotInteger, remInteger,
-- andInteger, orInteger, xorInteger, complementInteger,
-- shiftLInteger, shiftRInteger,
-- hashInteger,
How Integer is handled inside GHC
-
Front end. Integers are represented using the
HsInteger
constructor ofHsLit
for the early phases of compilation (e.g. type checking) -
Core. In
Core
representation, an integer literal is represented by theLitInteger
constructor of theLiteral
type.data Literal = ... | LitInteger Integer Type
While
Integer
s aren't "machine literals" like the otherCore
Literal
constructors, it is more convenient when writing constant folding RULES to pretend that they are literals rather than having to understand their concrete representation. (Especially as the concrete representation varies from package to package.) We also carry around aType
, representing theInteger
type, in the constructor, as we need access to it in a few functions (e.g.literalType
). -
Constant folding. There are many constant-folding optimisations for
Integer
expressed as built-in rules in compiler/GHC/Core/Opt/ConstantFold.hs; look atbuiltinIntegerRules
. All of the types and functions in theInteger
interface have built-in names, e.g.plusIntegerName
, defined in compiler/GHC/Core/Opt/ConstantFold.hs and included inbasicKnownKeyNames
. This allows us to match on all of the functions inbuiltinIntegerRules
in compiler/GHC/Core/Opt/ConstantFold.hs, so we can constant-fold Integer expressions. An important thing about constant folding of Integer divisions is that they depend on inlining. Here's a fragment ofIntegral Integer
instance definition fromlibraries/base/GHC/Real.lhs
:instance Integral Integer where toInteger n = n {-# INLINE quot #-} _ `quot` 0 = divZeroError n `quot` d = n `quotInteger` d
Constant folding rules for divisions are defined for
quotInteger
and other division functions frominteger-gmp
library. Ifquot
was not inlined constant folding rules would not fire. The rules would also not fire if call toquotInteger
was inlined, but this does not happen because it is marked with NOINLINE pragma - see below. -
Converting between Int and Integer. It's quite commonly the case that, after some inlining, we get something like
integerToInt (intToInteger i)
, which converts anInt
to anInteger
and back. This must optimise away (see #5767 (closed)). We do this by requiring that theinteger
package exposessmallInteger :: Int# -> Integer
Now we can define
intToInteger
(or, more precisely, thetoInteger
method of theIntegral Int
instance inGHC.Real
) thustoInteger (I# i) = smallInteger i
And we have a RULE for
integerToInt (smallInteger i)
. -
Representing integers. We stick to the
LitInteger
representation (which hides the concrete representation) as late as possible in the compiler. In particular, it's important that theLitInteger
representation is used in unfoldings in interface files, so that constant folding can happen on expressions that get inlined.We finally convert
LitInteger
to a proper core representation of Integer in compiler/GHC/CoreToStg/Prep.hs, which looks up the Id formkInteger
and uses it to build an expression likemkInteger True [123, 456]
(where theBool
represents the sign, and the list ofInt
s are 31 bit chunks of the absolute value from lowest to highest).However, there is a special case for
Integer
s that are within the range ofInt
when theinteger-gmp
implementation is being used; in that case, we use theS#
constructor (viaintegerGmpSDataCon
in compiler/GHC/Builtin/Types.hs) to break the abstraction and directly create the datastructure. -
Don't inline integer functions. Most of the functions in the Integer implementation in the
integer
package are markedNOINLINE
. For example ininteger-gmp
we haveplusInteger :: Integer -> Integer -> Integer plusInteger (S# i1) (S# i2) = ... plusInteger (S# i1) (J# j1 j2) = ... -- ...two more cases...
Not only is this a big function to inline, but inlining it typically does no good because the representation of literals is abstact, so no pattern-matching cancellation happens. And even if you have
(a+b+c)
, the conditionals mean that no cancellation happens, or you get an exponential code explosion!