Make Backpack order-independent (again)
When we moved to the new `bkp` file format, we also went back to a format which is order-dependent: that is to say, the order in which you put the declarations matters. So if you write:
    unit p where
        module A where
            import B
        module B where
            ...
this fails to type-check, with GHC complaining that `B` is not in scope. I did this in part because it's what the Backpack paper described, and in part because it was "simpler" to implement.
I think we should move back to an order-independent scheme, for the following reasons:
- Haskell users are used to not needing to pay particularly close attention to the ordering of their modules, and forcing people to linearize their module descriptions would be spectacularly disruptive with large numbers of modules. So un-ordered modules are "more natural" for a traditional Haskell user.
- Order-independence imposes some constraints on how expressive programs are (with order-dependent Backpack, you can do some pretty tricky things by ordering things certain ways); this could simplify some aspects of compiler implementation and make Backpack easier to explain.
- A particular case of the previous point: it seems a lot simpler UX-wise to let a user assume that if you import a module `M` in a unit, it doesn't matter where you import it: you always get the same set of identifiers brought into scope. Thus, the incremental results of signatures should not be visible; cf. #10679 (closed).
The main idea is that only the surface syntax is un-ordered: the internal representation of units is a DAG which we work out in an elaboration phase, not altogether dissimilar from what `GhcMake` computes. An important auxiliary idea is that `import A`, where `A` is backed by some signatures, depends on EVERY signature in scope.
Here are the details:
- **The intermediate representation.** We translate into an intermediate representation which consists of a directed graph of:
  - Each source-level module, signature and include, and
  - Each unfilled requirement (called a “signature merge” node).
The edges of the directed graph signify a “depends on” relation, and are defined as follows:
  - An include p depends on include q if, for some module name m, p requires m and q provides m.
  - An include p depends on a module m if p requires a module named m.
  - A module/signature m depends on include p if m imports a module provided by p.
  - A module/signature m depends on a module n if m imports n.
  - A module/signature m depends on a signature merge n if m imports n.
  - A module/signature m depends on a signature n if m {-# SOURCE #-} imports n.
  - A signature merge m depends on a local signature m (if it exists).
  - A signature merge m depends on an include p, if the (renamed) include requires m.
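The node kinds and edge rules above can be written down fairly directly. The following is only a sketch under assumptions: the `Include`, `Decl` and `Node` types and the `dependsOn`, `overlaps` and `edges` functions are hypothetical names invented for illustration and do not correspond to anything in GHC. Includes are assumed to be already renamed, and imports are tracked by bare module name.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

type ModName = String

-- | A (renamed) include of another unit. Hypothetical record.
data Include = Include
  { incUnit     :: String
  , incRequires :: Set ModName
  , incProvides :: Set ModName
  } deriving (Eq, Ord, Show)

-- | A source-level module or signature together with its imports.
data Decl = Decl
  { declName       :: ModName
  , declImports    :: Set ModName  -- ordinary imports
  , declSrcImports :: Set ModName  -- imports marked with the SOURCE pragma
  } deriving (Eq, Ord, Show)

data Node
  = NModule Decl
  | NSignature Decl
  | NInclude Include
  | NSigMerge ModName   -- one per unfilled requirement
  deriving (Eq, Ord, Show)

overlaps :: Set ModName -> Set ModName -> Bool
overlaps xs ys = not (Set.null (Set.intersection xs ys))

-- | @a `dependsOn` b@ holds when one of the eight rules gives an edge a -> b.
dependsOn :: Node -> Node -> Bool
dependsOn (NInclude p)  (NInclude q)   = incRequires p `overlaps` incProvides q
dependsOn (NInclude p)  (NModule m)    = declName m `Set.member` incRequires p
dependsOn (NSigMerge m) (NSignature s) = declName s == m
dependsOn (NSigMerge m) (NInclude p)   = m `Set.member` incRequires p
dependsOn from to = case (declOf from, to) of
  (Just d, NInclude p)   -> declImports d `overlaps` incProvides p
  (Just d, NModule n)    -> declName n `Set.member` declImports d
  (Just d, NSigMerge n)  -> n `Set.member` declImports d
  (Just d, NSignature n) -> declName n `Set.member` declSrcImports d
  (Nothing, _)           -> False
  where
    declOf (NModule d)    = Just d
    declOf (NSignature d) = Just d
    declOf _              = Nothing

-- | All edges among a set of nodes (quadratic, which is fine for a sketch).
edges :: [Node] -> [(Node, Node)]
edges ns = [ (a, b) | a <- ns, b <- ns, a `dependsOn` b ]
```

Testing every pair of nodes keeps the correspondence with the rules obvious; a real implementation would presumably index nodes by module name instead.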
- **Elaboration.** Take a Backpack file, construct this graph, and topsort it into a DAG of SCCs. SCCs with a single node are compilable as before. SCCs with multiple nodes will have to be managed with some mutual recursion mechanism; see refinements for more thoughts on this.
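As a rough illustration of the elaboration step, the sketch below uses `stronglyConnComp` from `Data.Graph` (in the `containers` package) to sort a dependency graph into SCCs with dependencies first. The `DepGraph` encoding and the `elaborate`, `plan`, `unitP` and `cyclic` names are assumptions made up for this example; it says nothing about how GHC would actually represent the graph.

```haskell
import Data.Graph (SCC (..), flattenSCC, stronglyConnComp)

-- | Hypothetical flat encoding: each node label with the labels it depends on.
type DepGraph = [(String, [String])]

-- | Sort the graph into strongly connected components, dependencies first.
elaborate :: DepGraph -> [SCC String]
elaborate g = stronglyConnComp [ (n, n, deps) | (n, deps) <- g ]

-- | Say how each component would be handled.
plan :: SCC String -> String
plan (AcyclicSCC n) = "compile " ++ n
plan scc            = "needs mutual recursion: " ++ unwords (flattenSCC scc)

-- | The opening example: module A imports module B, declared in that order.
unitP :: DepGraph
unitP = [("A", ["B"]), ("B", [])]

-- | A hypothetical pair of nodes that import each other.
cyclic :: DepGraph
cyclic = [("X", ["Y"]), ("Y", ["X"])]
```

Here `map plan (elaborate unitP)` evaluates to `["compile B", "compile A"]`, so the declaration order in the source no longer matters, while `elaborate cyclic` yields a single two-node component that would need one of the mechanisms discussed under refinements.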
- **Refinements:**
- **Can a signature depend on a (home) module?** Imports of this kind require a retypecheck loop. Consider this situation:
    unit p where
        signature H where
            data T
        module M where
            import H
            data S = S T
    unit q where
        include p
        module Q where
            import M
        signature H where
            import Q
            data T = T S
Here, signature H in q depends on Q. When we typecheck `Q`, we bring `M.S` into the type environment with a `TyThing` that describes the constructor as accepting an abstract type `T`. However, when we subsequently typecheck the local signature `H`, we must refine all `TyThing`s of `T` with the true description (e.g. constructor information). So you'll need to retypecheck `Q` (and `M`) in order to make sure the `TyThing` is correct.
- **Can an include depend on a (home) module?** If the module has no (transitive) dependency on signatures, this is fine. However, it's easy to have a circular dependency. Consider:
    unit p where
        signature A -- imports nothing
        signature B -- imports nothing
        module M
    unit q where
        include p
        module B where
            import A
            ...
`B` depends on `p` for `p/A.hsig`; however, `p` depends on `B` because this module is filling a requirement. However, if we were to include the internal graph of `p` into `q`, the resulting graph would not have any cycles; so this is one possibility for how to untangle this situation. However, if there's still a cycle (e.g. `A` imports `B`), then you will need at least a retypecheck loop, and maybe `hs-boot` style compilation. We're not going to implement this for now.
- **Can we deal with include-include dependency cycles?** Yes! Just use the Backpack paper's strategy for creating a recursive unit key and compile the two packages `hs-boot` style. But I'm not planning on implementing this yet.
- **Can we deal with signature-signature dependency cycles?** Ordered Backpack would have supported this:
    unit a-sig where
        signature A where
            data T
    unit ab-sig where
        include a-sig
        signature B where
            import A
            data S = S T
        signature A where
            import B
            data T = T S
In our model, `ab-sig` has a cycle. However, I believe any such cycle can be broken by creating sufficiently many units:
    unit a-sig where
        signature B where
            data T
        signature A where
            data S = S T
    unit b-sig where
        signature A where
            data S
        signature B where
            data T = T S
    unit ab-sig where
        include a-sig
        include b-sig
In principle, GHC could automatically break import cycles by replacing an import with an import of a reduced signature that simply has abstract type definitions. See #10681. (I'm not sure this is possible for all language features.) This technique would also work for normal modules, assuming that every function is explicitly annotated with a type.
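To make the "reduced signature" idea concrete, here is a minimal sketch of what such a reduction might look like on a toy model of signatures. Everything here is an assumption for illustration: `TyDecl`, `Sig`, `reduce`, `render` and `sigA` are hypothetical names, declarations are limited to `data` types whose right-hand sides are kept as opaque strings, and the sketch assumes that an abstract declaration needs none of the imports its full definition did.

```haskell
import Data.List (intercalate)

-- | A data declaration in a signature; Nothing means the type is abstract.
-- Hypothetical, much-simplified stand-in for real declarations.
data TyDecl = TyDecl
  { tyName :: String
  , tyRhs  :: Maybe String   -- e.g. Just "T S" for a declaration "data T = T S"
  } deriving (Eq, Show)

data Sig = Sig
  { sigName    :: String
  , sigImports :: [String]
  , sigDecls   :: [TyDecl]
  } deriving (Eq, Show)

-- | Make every type abstract and drop the imports the definitions needed.
reduce :: Sig -> Sig
reduce s = s { sigImports = []
             , sigDecls   = [ d { tyRhs = Nothing } | d <- sigDecls s ] }

-- | Pretty-print a signature in Backpack-like surface syntax.
render :: Sig -> String
render s = intercalate "\n" (header : map ("    " ++) body)
  where
    header = "signature " ++ sigName s ++ " where"
    body   = map ("import " ++) (sigImports s) ++ map decl (sigDecls s)
    decl (TyDecl n Nothing)    = "data " ++ n
    decl (TyDecl n (Just rhs)) = "data " ++ n ++ " = " ++ rhs

-- | Signature A as written in the cyclic ab-sig example.
sigA :: Sig
sigA = Sig "A" ["B"] [TyDecl "T" (Just "T S")]
```

Rendering `reduce sigA` gives a signature `A` containing only `data T`, which has no imports and so cannot take part in a cycle. Whether a reduction like this exists for every language feature is exactly the open question raised above.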