PrimOps are functions that cannot be implemented in Haskell, and are provided natively by GHC. For example, adding two Int# values is provided as the PrimOp +#, and allocating a new mutable array is the PrimOp newArray#.
PrimOps are made available to Haskell code through the virtual module GHC.Prim. This module has no implementation, and its interface never resides on disk: if GHC.Prim is imported, we use a built-in ModIface value - see ghcPrimIface in compiler/iface/LoadIface.hs.
libraries/base/GHC/PrimopWrappers.hs, a file that contains (curried) wrapper
functions for each of the PrimOps, so that they are accessible from byte-code, and
so that the byte-code interpreter doesn't need to implement any PrimOps at all: it
just invokes the compiled ones from GHC.PrimopWrappers.
libraries/ghc-prim/dist-install/build/autogen/GHC/Prim.hs, a generated source file containing dummy declarations for
all the PrimOps, solely so that Haddock can include documentation for GHC.Prim
in its documentation for the base package. The file GHC/Prim.hs is never
actually compiled, only processed by Haddock.
Note that if you want to create a polymorphic primop, you need to return (# a #), not a.
Implementation of PrimOps
PrimOps are divided into two categories for the purposes of implementation: inline and out-of-line.
Inline PrimOps are operations that can be compiled into a short sequence of code that never needs to allocate, block, or return to the scheduler for any reason. An inline PrimOp is compiled directly into Cmm by the code generator. The code for doing this is in compiler/codeGen/StgCmmPrim.hs.
All other PrimOps are classified as out-of-line, and are implemented by hand-written C-- code in the file rts/PrimOps.cmm. An out-of-line PrimOp is like a Haskell function, except that
PrimOps cannot be partially applied. Calls to all PrimOps are made at the correct arity; this is ensured by
the CorePrep pass.
Out-of-line PrimOps have a special, fixed, calling convention:
are in the registers R1-R8. This is to make it easy to write the
C-- code for these PrimOps: we don't have to write code for multiple calling conventions.
It's possible to provide inline versions of out-of-line PrimOps. This is useful when we have enough static information to generated a short, more efficient inline version. For example, a call to newArray# 8# init can be implemented more efficiently as an inline PrimOp as the heap check for the array allocation can be combined with the heap check for the surrounding code. See shouldInlinePrimOp in compiler/codeGen/StgCmmPrim.hs.
Foreign out-of-line PrimOps and foreign import prim
A new and somewhat more flexible form of out-of-line PrimOp is the foreign out-of-line PrimOp. These are essentially the same but instead of their Cmm code being included in the RTS, they can be defined in Cmm code in any package and instead of knowledge of the PrimOp being baked into the compiler, they can be imported using special FFI syntax:
The string (e.g. "int2Integerzh") is the linker name of the Cmm function. Using this syntax requires the extensions ForeignFunctionInterface, GHCForeignImportPrim, MagicHash, UnboxedTuples and UnliftedFFITypes. The current type restriction is that all arguments and results must be unlifted types, with two additional possibilities: An argument may (since GHC 7.5) be of type Any (in which case the called function will receive a pointer to the heap), and the result type is allowed to be an unboxed tuple. The calling convention is exactly the same as for ordinary out-of-line primops. Currently it is not possible to specify any of the PrimOp attributes.
The integer-gmp package now uses this method for all the primops that deal with GMP big integer values. The advantage of using this technique is that it is a bit more modular. The RTS does not need to include all the primops. For example in the integer case the RTS no longer needs to link against the GMP C library.
The first step is to extend this syntax to allow PrimOp attributes to be specified. Then external-only (some PrimOps have external fallbacks and internal special-cases!) primops can be converted to use foreign import prim, bypassing the Compiler for interactions strictly between base and the rts.
Another direction is the calling convention for primops and ordinary compiled Haskell functions may be unified in future and at that time it the restriction on using only unlifted types may be lifted. [This sounds weird to @Ericson2314. Aren't primops definitionally inlined? Both they and foreign calls have wrappers that deal with lifted types. I'd want be sure when writing a primop whether I was getting an implicit wrapper, otherwise it's hard to reason about the cost and ABI.]
It has been suggested that we extend this PrimOp definition and import method to cover all PrimOps, even inline ones. This would replace the current primops.txt.pp system of builtin PrimOps. The inline PrimOps would still be defined in the compiler but they would be imported in any module via foreign import prim rather than appearing magically to be exported from the GHC.Prim module. Hugs has used a similar system for years (with the syntax primitive seq :: a -> b -> b).
Adding a new PrimOp
To add a new primop, you currently need to update the following files:
for a foreign out-of-line primop You do not need to modify the rts or compiler at all.
yourpackage/cbits/primops.cmm: implement your primops here. You have to arrange for the .cmm file to be compiled and linked into the package. The GHC build system has support for this. Cabal does not yet.
yourpackage/TheCode.hs: use foreign import prim to import the primops.
Note: In case of compile errors like
Not in scope: ‘GHC.Prim.<newPrimOp>’
run make clean so the build system will pick up changes to the generated GHC.Prim module.
In addition, if new primtypes are being added, the following files need to be updated:
compiler/prelude/TysPrim.hs -- there are a raft of changes here; you need to create *PrimTy, *PrimTyCon and *PrimTyConName variables. The most important thing to make sure you get right is when you make a PrimTyCon, you pick the correct PrimRep for your type. For example, if you’ve introduced a new GC'able object, you should use PtrRep; however, if it's a pointer that shouldn't be GC'd, you should use AddrRep instead. The full list is in compiler/types/TyCon.hs
See also AddingNewPrimitiveOperations, a blow-by-blow description of the process for adding a new out-of-line primop from someone who went through the process.
Explanation of attributes
TBV (To be verified)
default = False
default = False
Set to True if there is a function in PrimOps.cmm. This also changes to code generator to push the continuation
of any follow on code onto the stack.
default = False
default = False
default = [lazyDmd, ... ] TopRes
This is the strictness of the PrimOp. Unboxed things should be marked as lazyDmd. (Someone explain why?).