Matthew Pickering requested to merge wip/ghc-fat-interface into master Feb 07, 2022

Motivation for Fat Interface Files

The goal of a fat interface file is to be able to restart the compiler pipeline at the point just after simplification and before code generation. Once compilation is restarted, code can be generated for any backend. In particular, we want to be able to generate bytecode on demand for any module, as this can significantly speed up startup times for projects in GHCi. HLS already implements its own version of fat interface files for this reason.

In the future, we want to use fat interface files for a more robust implementation strategy for Typed Template Haskell. In this implementation, the backend will have to preserve type information at runtime.

Fat interface files can also be used to defer the choice of backend until more is known about the necessary targets for a project. In particular, Cabal pessimises build times by building both static and dynamic objects under the assumption that you will eventually need dynamic objects to run TH splices. With fat interface files, we can delay this choice until we know for sure we need to do the work.

Finally, fat interface files can also be useful for program analysis tasks which need to operate on the whole program. The external STG interpreter could read fat interface files and convert the result into its own STG format for running on the STG interpreter.

In short, fat interface files give us much more flexibility when targeting different backends than we have had before.

Fat Interface File Basics

A fat interface file is an extension of a normal interface (.hi) file. The compiler writes .hi files to communicate the interface of a module to the modules that import it. All the information needed to compile against a module is contained within its interface file, so in order to resolve an import the compiler looks for the corresponding interface file for that module.

A fat interface file extends an interface file with a new field which contains the complete Core bindings for the module.

-fwrite-fat-interface

: Include the whole program definition in the interface file.

If you compile a module with -fwrite-fat-interface then you will see a new section called "extra-decls" when you dump the contents of an interface file with --show-iface. This section of the interface contains all the Core bindings of the program.

> _build/stage1/bin/ghc --show-iface FAT.hi
....
extra-decls
f = GHC.Types.C# 'f'#
a = GHC.Types.C# 'a'#
t = GHC.Types.C# 't'#
....
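For illustration, a source module along the following lines would plausibly give rise to bindings like those in the dump. This is only a sketch: the module name FAT and the file name FAT.hs are assumptions inferred from FAT.hi, and the actual source behind the dump is not shown in this description.

```haskell
-- FAT.hs (assumed file name; the real source behind the dump is not shown)
module FAT (f, a, t) where

-- Each top-level Char constant is a boxed character. In Core, a boxed
-- Char is the C# constructor applied to a primitive Char# literal,
-- which matches the shape of the extra-decls entries.
f :: Char
f = 'f'

a :: Char
a = 'a'

t :: Char
t = 't'
```

Compiling such a module with -fwrite-fat-interface and then dumping the resulting interface with --show-iface should show one extra-decls entry per binding.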

The serialised program is a Core program. Using the Core representation is convenient for a number of reasons:

We already have the ability to serialise Core.
Constructing bytecode from Core is not a very expensive operation.
Other backends can translate the Core into their own representations.

We serialise the program after simplification. This means that the fat interface file for a module compiled without optimisations will contain unoptimised bindings, whereas the fat interface file for an optimised module will contain optimised bindings.

Using bytecode for Template Haskell evaluation

GHC always uses the bytecode interpreter to interpret a Template Haskell splice for the current module. On the other hand, dependent home-package modules can be handled in two different ways:

Object files: link the object files together using the system linker, and pass the resulting library to the interpreter,
Bytecode: directly load the already-compiled bytecode into the interpreter.

GHC in --make mode uses the former method, whereas GHCi uses the latter.

When passed the new -fprefer-byte-code flag, GHC will use the bytecode interpreter whenever bytecode is available (including in --make mode).

-fprefer-byte-code

: Use bytecode rather than object files for module dependencies when evaluating Template Haskell splices. This flag affects the decision we make about which linkable to use at the splice site only. It doesn't have any effect on which linkables are generated from a module.

In addition, when -fprefer-byte-code is used together with -fno-code, the compiler will automatically turn on bytecode generation for any module that needs code generation.

There are a couple of reasons why you might want to use this flag:

Producing object code is much slower than producing bytecode. In addition, with a dynamically linked compiler you normally need to compile with -dynamic-too to produce code in both the static and the dynamic way, where the dynamic way is needed only for Template Haskell execution.
Linking many large object files, which happens once per splice, can be quite expensive compared to linking bytecode.

In order to generate both the byte code and object file linkables, there is a separate flag -fbyte-code-and-object-code.

-fbyte-code-and-object-code

: Produce both byte code and object code for a module. This flag implies -fwrite-fat-interface.

You probably want to use these three flags (-fwrite-fat-interface, -fprefer-byte-code and -fbyte-code-and-object-code) together. If you are using -fbyte-code-and-object-code without -fwrite-fat-interface then you will recompile your project from scratch each time (due to lacking the fat interface section). Likewise, if you are using -fbyte-code-and-object-code without -fprefer-byte-code, then the bytecode which you generate will never be used. This may not be an issue (as the bytecode is generated lazily), but it's something to keep in mind.

Compare -fbyte-code-and-object-code with the existing -fobject-code and -fbyte-code flags, which don't allow a combination:

-fobject-code

: Produce object code for a module. This flag turns off -fbyte-code-and-object-code, so using -fobject-code in an OPTIONS_GHC pragma ensures that byte code is never produced or used for that module.

-fbyte-code

: Produce byte code for a module. This flag turns off -fbyte-code-and-object-code, so with -fbyte-code only byte code is produced for a module.
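The per-module opt-out mentioned for -fobject-code could look like the following sketch. Only the pragma comes from the flag description; the module name NativeOnly and its contents are hypothetical placeholders.

```haskell
{-# OPTIONS_GHC -fobject-code #-}
-- Hypothetical module: only the pragma above is taken from the flag
-- description. Per that description, the pragma turns off
-- -fbyte-code-and-object-code for this module, so byte code is never
-- produced or used for it, even if the rest of the project enables it.
module NativeOnly (greeting) where

greeting :: String
greeting = "this module is compiled to object code only"
```

Putting the pragma in the source file keeps the choice local to one module, so the project-wide flags do not need to change.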

Recompilation and Fat Interface Files

When using -fbyte-code-and-object-code, the recompilation checker checks for the presence of a fat interface file, recompiling the module if one doesn't exist.

If -fbyte-code-and-object-code is not enabled, the byte code for a module isn't loaded even if a fat interface file exists. This covers the situation where you first compile module A with a fat interface and later recompile it with -fobject-code: in that case you don't want the byte code to be available to later modules, even if they use -fprefer-byte-code.
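The rules above can be summarised as a small toy model. This is not GHC's implementation or API: every type and function name below is hypothetical, and the model covers only the cases spelled out in this description.

```haskell
-- Toy model only: all names here are hypothetical, not GHC's API.

-- | The two flags that matter for the rules described above.
data Flags = Flags
  { byteCodeAndObjectCode :: Bool  -- -fbyte-code-and-object-code
  , preferByteCode        :: Bool  -- -fprefer-byte-code
  } deriving (Eq, Show)

-- | What a previous compilation of the module left on disk.
data OnDisk = OnDisk
  { hasFatInterface :: Bool
  , hasObjectFile   :: Bool
  } deriving (Eq, Show)

data Linkable = ByteCode | ObjectCode
  deriving (Eq, Show)

-- | With -fbyte-code-and-object-code, a missing fat interface forces
-- recompilation. Other reasons to recompile are not modelled.
mustRecompile :: Flags -> OnDisk -> Bool
mustRecompile flags disk =
  byteCodeAndObjectCode flags && not (hasFatInterface disk)

-- | Byte code is loaded only when the flag is on, even if a fat
-- interface from an earlier build is still present.
byteCodeLoaded :: Flags -> OnDisk -> Bool
byteCodeLoaded flags disk =
  byteCodeAndObjectCode flags && hasFatInterface disk

-- | Which linkable a splice site would use for this dependency.
-- Nothing stands for cases this description does not cover.
spliceLinkable :: Flags -> OnDisk -> Maybe Linkable
spliceLinkable flags disk
  | preferByteCode flags && byteCodeLoaded flags disk = Just ByteCode
  | hasObjectFile disk                                = Just ObjectCode
  | otherwise                                         = Nothing
```

In the module A scenario, A has a stale fat interface but is now compiled with -fobject-code, so byteCodeAndObjectCode is False and spliceLinkable picks ObjectCode even when preferByteCode is True.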

Conclusion

Fat interface files are a simple but powerful feature. To be maximally effective, more work is needed in the ecosystem to use them, where appropriate, to restart compilation, but this contribution takes the important first steps.

Edited Oct 01, 2022 by Matthew Pickering

Fat Interface Files (#21067)
