Make Backpack order-independent (again)
When we moved to the new
bkp file format, we also went back to the a format which is order-dependent: that is to say, the order in which you put the declarations matters. So if you write:
unit p where module A where import B module B where ...
this fails to type-check, GHC complaining that
B is not in scope. I did this, in part because it's what the Backpack paper described, and because it was "simpler" to implement.
I think we should move back to an order-independent scheme, for the following reasons:
- Haskell users are used to not needing pay particularly close attention to the ordering of their modules, and forcing people to linearize their module descriptions would be spectacularly disruptive with large amounts of modules. So un-ordered modules are "more natural for a traditional Haskell user.
- Order-independence imposes some constraints on how expressive programs are (with order-dependent Backpack, you can do some pretty tricky things by ordering things certain ways); this could simplify some aspects of compiler implementation and make Backpack easier to explain.
- A particular case of (2): it seems a lot simpler UX-wise to let a user assume that if you import a module
Min a unit, it doesn't matter where you import it: you always get the same set of identifiers brought into scope. Thus, the incremental results of signatures should not be visible, c.f. #10679 (closed)
The main idea is that only the surface-syntax is un-ordered: the internal representation of units is a DAG which we work out in an elaboration phase, not altogether unsimilar from what
GhcMake computes. An important auxiliary idea is that
import A where
A is backed by some signatures depends on EVERY signature in scope.
Here are the details:
- *The intermediate representation.** We translate into an intermediate representation which consists of a directed graph of:
• Each source-level module, signature and include, and • Each unfilled requirement (called a “signature merge” node).
The edges of the directed graph signify a “depends on” relation, and are defined as follows:
• An include p depends on include q if, for some module name m, p requires m and q provides m. • An include p depends on a module m if p requires a module named m. • A module/signature m depends on include p if m imports a module provided by p. • A module/signature m depends on a module n if m imports n. • A module/signature m depends on a signature merge n if m imports n. • A module/signature m depends on a signature n if m # SOURCE # imports n. • A signature merge m depends on a local signature m (if it exists). • A signature merge m depends on a include p, if the (renamed) include requires m.
- *Elaboration.** Take a Backpack file, construct this graph, and topsort it into a DAG of SCCs. SCCs with a single node are compileable as before. SCCs with multiple nodes will have to be managed with some mutual recursion mechanism; see refinements for more thoughts on this.
- **Can a signature depend on a (home) module?** Imports of this kind require a retypecheck loop. Consider this situation:
unit p where signature H where data T module M where import H data S = S T unit q where include p module Q where import M signature H where import Q data T = T S
Here, signature H in q depends on Q. When we typecheck
Q, we bring
M.Sinto the type environment with a
TyThingthat describes the constructor as accepting an abstract type
T. However, when we subsequently typecheck the local signature
H, we must refine all
Twith the true description (e.g. constructor information). So you'll need to retypecheck
M) in order to make sure the
- **Can an include depend on a (home) module?** If the module has no (transitive) dependency on signatures, this is fine. However, it's easy to have a circular dependency. Consider:
unit p where signature A -- imports nothing signature B -- imports nothing module M unit q where include p module B where import A ...
Bbecause this module is filling a requirement. However, if we were to include the internal graph of
q, the resulting graph would not have an cycles; so this is one possibility of how to untangle this situation. However, if there's still a cycle (e.g.
B), then you will need at least a retypecheck loop, and maybe
hs-bootstyle compilation. We're not going to implement this for now.
- **Can we deal with include-include dependency cycles?** Yes! Just use the Backpack paper's strategy for creating a recursive unit key and compile the two packages
hs-bootstyle. But I'm not planning on implementing this yet.
- **Can we deal with signature-signature dependency cycles?** Ordered Backpack would have supported this:
unit a-sig where signature A where data T unit ab-sig where include a-sig signature B where import A data S = S T signature A where import B data T = T S
In our model,
ab-sighas a cycle. However, I believe any such cycle can be broken by creating sufficiently many units:
unit a-sig where signature B where data T signature A where data S = S T unit b-sig where signature A where data S signature B where data T = T S unit ab-sig where include a-sig include b-sig
In principle, GHC could automatically break import cycles by replacing an import with an import of a reduced signature that simply has abstract type definitions. See #10681. (I'm not sure this is possible for all language features.) This technique would also work for normal modules, assuming that every function is explicitly annotated with a type.