Make Backpack order-independent (again)

When we moved to the new bkp file format, we also went back to a format which is order-dependent: that is to say, the order in which you write the declarations matters. So if you write:

unit p where
  module A where
    import B
  module B where
    ...

this fails to type-check, with GHC complaining that B is not in scope. I did this in part because it's what the Backpack paper described, and in part because it was "simpler" to implement.

I think we should move back to an order-independent scheme, for the following reasons:

1. Haskell users are used to not needing to pay particularly close attention to the ordering of their modules, and forcing people to linearize their module descriptions would be spectacularly disruptive for units with large numbers of modules. So un-ordered modules are "more natural" for a traditional Haskell user.
2. Order-independence imposes some constraints on how expressive programs are (with order-dependent Backpack, you can do some pretty tricky things by ordering things certain ways); this could simplify some aspects of compiler implementation and make Backpack easier to explain.
3. A particular case of (2): it seems a lot simpler UX-wise to let a user assume that if you import a module M in a unit, it doesn't matter where you import it: you always get the same set of identifiers brought into scope. Thus, the incremental results of signatures should not be visible, cf. #10679 (closed)

The main idea is that only the surface syntax is un-ordered: the internal representation of units is a DAG which we work out in an elaboration phase, not altogether dissimilar to what GhcMake computes. An important auxiliary idea is that an import of A, where A is backed by some signatures, depends on EVERY signature in scope.

Here are the details:

**The intermediate representation.** We translate into an intermediate representation which consists of a directed graph of:

• Each source-level module, signature and include, and
• Each unfilled requirement (called a “signature merge” node).

The edges of the directed graph signify a “depends on” relation, and are defined as follows:

• An include p depends on include q if, for some module name m, p requires m and q provides m.
• An include p depends on a module m if p requires a module named m.
• A module/signature m depends on include p if m imports a module provided by p.
• A module/signature m depends on a module n if m imports n.
• A module/signature m depends on a signature merge n if m imports n.
• A module/signature m depends on a signature n if m {-# SOURCE #-} imports n.
• A signature merge m depends on a local signature m (if it exists).
• A signature merge m depends on an include p, if the (renamed) include requires m.

**Elaboration.** Take a Backpack file, construct this graph, and topologically sort its strongly connected components (SCCs), which form a DAG. SCCs with a single node can be compiled as before. SCCs with multiple nodes will have to be managed with some mutual recursion mechanism; see the refinements below for more thoughts on this.
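
The graph and the elaboration step can be sketched in a few lines of Haskell. This is only an illustration, not the GHC implementation: the `Node` and `DepGraph` types and the names `elaborate`, `needsRecursion`, `unitP` and `abSig` are hypothetical, and the edges are written out by hand following the "depends on" rules above instead of being derived from parsed units. The SCC computation uses `stronglyConnComp` from `Data.Graph` in the containers package, which returns components with dependencies first.

```haskell
import Data.Graph (SCC (..), flattenSCC, stronglyConnComp)

-- Hypothetical node type for the intermediate representation.
data Node
  = Module String     -- source-level module
  | Signature String  -- source-level (local) signature
  | Include String    -- include of another unit
  | SigMerge String   -- unfilled requirement ("signature merge" node)
  deriving (Eq, Ord, Show)

-- Each entry pairs a node with the nodes it depends on.
type DepGraph = [(Node, [Node])]

-- Sort the graph into SCCs, dependencies before dependents.
elaborate :: DepGraph -> [SCC Node]
elaborate g = stronglyConnComp [(n, n, deps) | (n, deps) <- g]

-- A component with more than one node (or a self-loop) would need
-- some mutual recursion mechanism.
needsRecursion :: SCC Node -> Bool
needsRecursion (AcyclicSCC _) = False
needsRecursion _              = True

-- Unit p from the opening example: module A imports module B.
unitP :: DepGraph
unitP =
  [ (Module "A", [Module "B"])
  , (Module "B", [])
  ]

-- A unit with two local signatures that import each other,
-- plus an include that requires A (hand-written edges).
abSig :: DepGraph
abSig =
  [ (Include "a-sig", [])
  , (Signature "B", [SigMerge "A"])                    -- B imports A
  , (Signature "A", [SigMerge "B"])                    -- A imports B
  , (SigMerge "A", [Signature "A", Include "a-sig"])   -- local sig + include
  , (SigMerge "B", [Signature "B"])                    -- local sig only
  ]
```

With these definitions, `map flattenSCC (elaborate unitP)` places module B before module A regardless of the order in which they were written, while `filter needsRecursion (elaborate abSig)` yields a single component containing both signatures and both signature merge nodes.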

**Refinements:**

**Can a signature depend on a (home) module?** Imports of this kind require a retypecheck loop. Consider this situation:

unit p where
  signature H where
    data T
  module M where
    import H
    data S = S T
unit q where
  include p
  module Q where
    import M
  signature H where
    import Q
    data T = T S

Here, signature H in q depends on Q. When we typecheck Q, we bring M.S into the type environment with a TyThing that describes the constructor as accepting an abstract type T. However, when we subsequently typecheck the local signature H, we must refine all TyThings of T with the true description (e.g. constructor information). So you'll need to retypecheck Q (and M) in order to make sure the TyThing is correct.

**Can an include depend on a (home) module?** If the module has no (transitive) dependency on signatures, this is fine. However, it's easy to have a circular dependency. Consider:

unit p where
  signature A -- imports nothing
  signature B -- imports nothing
  module M
unit q where
  include p
  module B where
    import A
    ...

B depends on p for p/A.hsig; however, p depends on B because this module fills one of p's requirements. If we were to inline the internal graph of p into q, the resulting graph would not have any cycles, so this is one possible way to untangle the situation. If there is still a cycle after inlining (e.g. A imports B), then you will need at least a retypecheck loop, and maybe hs-boot style compilation. We're not going to implement this for now.
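
To make the inlining argument concrete, here is a small Haskell sketch that compares the two views of unit q. It is an illustration under stated assumptions: the node names are plain strings chosen for this example, the edges are written by hand, and because the source leaves M's imports unspecified, the sketch assumes M imports both A and B. The cycle check uses `stronglyConnComp` from `Data.Graph` (containers).

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Each entry pairs a node with the nodes it depends on.
type DepGraph = [(String, [String])]

-- True if any strongly connected component is cyclic.
hasCycle :: DepGraph -> Bool
hasCycle g = any cyclic (stronglyConnComp [(n, n, ds) | (n, ds) <- g])
  where
    cyclic (AcyclicSCC _) = False
    cyclic _              = True

-- q with "include p" treated as one opaque node.
coarse :: DepGraph
coarse =
  [ ("module B",  ["merge A"])    -- B imports A, which is a requirement
  , ("merge A",   ["include p"])  -- the include requires A
  , ("include p", ["module B"])   -- p requires B, which module B fills
  ]

-- q with p's internal nodes spliced in place of the include.
-- p's signature B disappears because module B fills it.
inlined :: DepGraph
inlined =
  [ ("p/A.hsig", [])
  , ("module B", ["p/A.hsig"])
  , ("p/M",      ["p/A.hsig", "module B"])  -- assumption: M imports A and B
  ]

-- Variant where p's signature A imports B: inlining no longer helps.
inlinedWithImport :: DepGraph
inlinedWithImport =
  [ ("p/A.hsig", ["module B"])
  , ("module B", ["p/A.hsig"])
  , ("p/M",      ["p/A.hsig", "module B"])
  ]
```

Here `hasCycle coarse` is `True`, `hasCycle inlined` is `False`, and `hasCycle inlinedWithImport` is `True`, matching the three cases described above.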

**Can we deal with include-include dependency cycles?** Yes! Just use the Backpack paper's strategy for creating a recursive unit key and compile the two packages hs-boot style. But I'm not planning on implementing this yet.

**Can we deal with signature-signature dependency cycles?** Ordered Backpack would have supported this:

unit a-sig where
  signature A where
    data T
unit ab-sig where
  include a-sig
  signature B where
    import A
    data S = S T
  signature A where
    import B
    data T = T S

In our model, ab-sig has a cycle. However, I believe any such cycle can be broken by creating sufficiently many units:

unit a-sig where
  signature A where
    data T
  signature B where
    import A
    data S = S T
unit b-sig where
  signature B where
    data S
  signature A where
    import B
    data T = T S
unit ab-sig where
  include a-sig
  include b-sig

In principle, GHC could automatically break import cycles by replacing an import with an import of a reduced signature that simply has abstract type definitions. See #10681. (I'm not sure this is possible for all language features.) This technique would also work for normal modules, assuming that every function is explicitly annotated with a type.
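
As a rough illustration of what a "reduced signature" could look like, the following Haskell sketch models a signature as a list of data declarations and strips it down to abstract type declarations with no imports. Everything here is hypothetical: the `Sig` and `Decl` types and the functions `reduce` and `render` are invented for this example, only data declarations are modeled, and nothing is claimed about how GHC would actually perform this transformation.

```haskell
import Data.List (intercalate)

-- Hypothetical, heavily simplified model of a signature:
-- only data declarations, with constructors kept as plain text.
data Decl = DataDecl String [String]  -- type name, constructor declarations
  deriving (Eq, Show)

data Sig = Sig
  { sigName    :: String
  , sigImports :: [String]
  , sigDecls   :: [Decl]
  } deriving (Eq, Show)

-- Reduced signature: drop all imports and make every type abstract.
-- Importing this instead of the full signature cannot create a cycle,
-- since the reduced signature depends on nothing.
reduce :: Sig -> Sig
reduce s = s
  { sigImports = []
  , sigDecls   = [DataDecl t [] | DataDecl t _ <- sigDecls s]
  }

-- Pretty-print a signature in Backpack-like surface syntax.
render :: Sig -> String
render (Sig n is ds) =
  unlines (("signature " ++ n ++ " where") : map imp is ++ map decl ds)
  where
    imp i = "  import " ++ i
    decl (DataDecl t []) = "  data " ++ t
    decl (DataDecl t cs) = "  data " ++ t ++ " = " ++ intercalate " | " cs

-- The cyclic signature A from ab-sig above.
sigA :: Sig
sigA = Sig "A" ["B"] [DataDecl "T" ["T S"]]
```

Applying `reduce` to `sigA` gives a signature A that only declares `data T`, which is the abstract form of A that the hand-written a-sig unit provides.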

Edited Mar 10, 2019 by Edward Z. Yang
