Skip to content

Generalize hi-boot/hi for signatures, to manage intermediate merged interfaces

The problem

In some situations, we need to output multiple interface files for what is morally the same module name.

Example 1: Merging external and home signatures

unit a-sig where
    signature A
unit p where
    include a-sig
    signature A

Compiling p/A.hsig produces an interface file which contains just the definitions declared in p. However, someone including p should see the merge of the interface of p/A.hsig AND a-sig/A.hsig (which was included.)

Example 2: Merging two home signatures

unit p where
    signature A
    signature B where
        import A
        ...
    signature A where
        import B
        ...

What should we do if a signature is specified multiple times in the same unit? The compilation of each produces a distinct interface, and the public interface we want to expose is the merge of the two. (And by the way, what's the source file name of A, if we are not using the inline syntax?)

Example 3: Merging a signature and a module

unit p where
    signature A
    module B where
        import A
        ...
    module A where
        import B
        ...

A and B are mutually recursive, and we want to use a signature file to break the gap. The signature produces an interface file, only to be overwritten when we actually define the module proper.

But wait! We have a solution for this example already: the first interface file for A is not saved to A.hi, but A.hi-boot...

The proposal

I want to take the A.hi-boot versus A.hi distinction and generalize it: we should be able to name intermediate interface files A.1.hi, A.2.hi, ... and finally A.hi (which is publically visible outside the unit.) This naming convention applies to Haskell files too.

User-visible consequences

Every signature file is numbered, and every import of a signature file refers to a specific number. This number is unique among all other modules in a unit which share the same name. For backwards compatibility, some number/file name extensions are treated specially:

  1. .hs files compile to .hi (implicitly numbered 0)
  2. .hs-boot files compile to .hi-boot (implicitly numbered 1)
  3. .hsig files compile to .hi-boot (implicitly numbered 1)
  4. .n.hsig files compile to .n.hi-boot (numbered n, where n is greater than 1)
  • *Flex point:** We could give .hsig files their own file extension

for interface files; just would require some more work to distinguish between hs-boot and hsig as well as record the numbering.

To import, the {-# SOURCE n #-} pragma can be used (with {-# SOURCE #-} being equivalent {-# SOURCE 1 #-}.)

Inline Backpack files can omit numbering, since we can figure it out based on the ordering of declarations (numbering in REVERSE order of occurrence). Example 2 can be numbered as follows:

    signature {-# SOURCE 2 #-} A
    signature {-# SOURCE 1 #-} B where
        import {-# SOURCE 2 #-} A
        ...
    signature {-# SOURCE 1 #-} A where
        import {-# SOURCE 1 #-} B
        ...

Internal consequences

In many places in the code today, we record a boolean indicating if we depended on the boot interface hi-boot or the normal interface hi. We now replace this marker with an integer which records the numbering. The primary affected components are dependency recording in interfaces, interface loading code in GHC, and the implementation of --make.

Interaction with signature merging

Unlike hs-boot files, hsig files can be included from external units, in which case the semantics are that all signatures in scope are merged together. The key rule is that we **generate an hi file for each partial merge**; this means that whenever we want to typecheck a module, there is exactly one interface file per module we import. Consider this example:

unit a-sig where
    signature A
unit a-sig2 where
    signature A
unit p where
    include a-sig
    module B
    include a-sig2
    module C
    signature A
    module D

When compiling this, we generate four interface files for A:

unit p where
    include a-sig
    -- Produces A.3.hi-boot (a-sig)
    module B -- uses A.3.hi-boot
    include a-sig2
    -- Produces A.2.hi-boot (a-sig + a-sig2)
    module C -- uses A.2.hi-boot
    signature A
    -- Produces A.hi-boot (everything)
    module D -- uses A.hi-boot
    -- At the end, A.hi-boot copied to A.hi to be publically visible

Can we do anything simpler?

There are a few barriers to doing something simpler:

  1. We can avoid generating extra interface files if we instead merge them on-the-fly when we use them. However, this forces later instances of GHC to do repeated work remerging interface files, so it seems desirable from a performance perspective to merge before writing. Another scheme is that we could merge on use for signatures in the home package, and then write out a unified file at the very end, trading off performance for less written interface files.
  2. The Backpack language is defined in a way that allows modules, signatures and includes to be ordered in a semantically meaningful way. For example:
unit q where
    signature M
    signature A where
        f :: Int -> Int
    ...
unit p where
    signature A where
        data T
    module M where
        import A -- should get T but not f
        ...
    include q -- fill in M
    module S where
        import A -- should get T and f

This means that even within a unit, the interface of a signature file may differ. We could rule this out, but we would have to work out how to explain this limitation to users. (For example, we could solve the example above by saying that units which define modules do not bring their signatures into scope for a package which imports them; but this is a pretty ad hoc rule! And you still have to deal with repeated signatures, or a signature importing a module importing a signature. There are a lot of cases.)

  1. This problem cannot be avoided at all if you are truly doing recursive modules, since you need the intermediate interface file to do compilation at all prior to getting the real implementation.
Trac metadata
Trac field Value
Version 7.11
Type Task
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Package system
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information