    Multiple Home Units · fd42ab5f
    Matthew Pickering authored
    
    
    Multiple home units allows you to load different packages which may depend on
    each other into one GHC session. This will allow both GHCi and HLS to support
    multi component projects more naturally.
    
    Public Interface
    ~~~~~~~~~~~~~~~~
    
    To specify multiple units, the -unit @⟨filename⟩ flag is given
    multiple times, each time with a response file containing the arguments for one unit.
    Each response file contains a newline-separated list of arguments.
    
    ```
    ghc -unit @unitLibCore -unit @unitLib
    ```
    
    where the `unitLibCore` response file contains the normal arguments that cabal would pass to `--make` mode.
    
    ```
    -this-unit-id lib-core-0.1.0.0
    -i
    -isrc
    LibCore.Utils
    LibCore.Types
    ```
    
    The response file for lib can specify a dependency on lib-core, so that modules in lib can use modules from lib-core.
    
    ```
    -this-unit-id lib-0.1.0.0
    -package-id lib-core-0.1.0.0
    -i
    -isrc
    Lib.Parse
    Lib.Render
    ```
    
    Then when the compiler starts in --make mode it will compile both units lib and lib-core.
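
    As a minimal sketch of how a build tool might produce these files, the
    code below renders a response file and the matching ghc arguments. The
    UnitSpec type and the function names are hypothetical and not part of
    GHC; only the flags themselves come from the description above.

    ```haskell
    -- Hypothetical description of one home unit; not a GHC type.
    data UnitSpec = UnitSpec
      { unitId      :: String      -- value for -this-unit-id
      , unitDeps    :: [String]    -- values for -package-id
      , unitSrcDirs :: [FilePath]  -- directories for -i<dir>
      , unitModules :: [String]    -- modules to compile
      }

    -- Render the newline-separated arguments of one response file.
    renderResponseFile :: UnitSpec -> String
    renderResponseFile u = unlines $
         [ "-this-unit-id " ++ unitId u ]
      ++ [ "-package-id " ++ d | d <- unitDeps u ]
      ++ [ "-i" ]                            -- reset the import search path
      ++ [ "-i" ++ d | d <- unitSrcDirs u ]
      ++ unitModules u

    -- Command-line arguments for ghc, given the response file names.
    unitArgs :: [FilePath] -> [String]
    unitArgs files = concat [ ["-unit", '@' : f] | f <- files ]
    ```

    With the lib-core unit from above, renderResponseFile reproduces the
    unitLibCore file, and unitArgs ["unitLibCore", "unitLib"] yields the
    arguments of the ghc invocation shown earlier.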
    
    There is also very basic support for multiple home units in GHCi. At the
    moment you can start a GHCi session with multiple units, but only the
    :reload command is supported. Most commands in GHCi assume a single home
    unit, so additional work is needed to decide how the interface should
    change to support multiple loaded home units.
    
    Options used when working with Multiple Home Units
    
    A few extra flags have been introduced specifically for working with
    multiple home units. These flags allow a home unit to behave more like
    an installed package, for example by specifying its package name,
    module visibility and reexported modules.
    
    -working-dir ⟨dir⟩
    
        It is common to assume that a package is compiled in the directory
        where its cabal file resides. Thus, all paths used in the compiler
        are assumed to be relative to this directory. When there are
        multiple home units, the compiler is often not running in that
        directory but in the one where the cabal.project file is located.
        In this case the -working-dir option can be passed, which
        specifies the path from the current directory to the directory the
        unit assumes to be its root, normally the directory which contains
        the cabal file.
    
        When the flag is passed, any relative paths used by the compiler are
        offset by the working directory. Notably this includes -i and
        -I⟨dir⟩ flags.
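
        The offsetting rule can be sketched as a small pure function. This
        is only an illustration of the behaviour described above, not
        GHC's implementation; the name offsetByWorkingDir is hypothetical,
        and leaving absolute paths unchanged is an assumption that follows
        from the rule applying to relative paths.

        ```haskell
        import System.FilePath ((</>), isRelative)

        -- Offset a relative path by the value of -working-dir, if one was
        -- given. Absolute paths, and all paths when the flag is absent,
        -- are returned unchanged.
        offsetByWorkingDir :: Maybe FilePath -> FilePath -> FilePath
        offsetByWorkingDir (Just dir) path
          | isRelative path = dir </> path
        offsetByWorkingDir _ path = path
        ```

        For a unit passed -working-dir packages/lib-core together with
        -isrc, the import directory would resolve to src underneath
        packages/lib-core.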
    
    -this-package-name ⟨name⟩
    
        This flag papers over the awkward interaction between PackageImports
        and multiple home units. When using PackageImports you can specify
        the name of the package in an import to disambiguate between modules
        with the same name which appear in multiple packages.
    
        This flag allows a home unit to be given a package name so that you
        can also disambiguate between multiple home units which provide
        modules with the same name.
    
    -hidden-module ⟨module name⟩
    
        This flag can be supplied multiple times in order to specify which
        modules in a home unit should not be visible outside of that unit.
    
        The main use of this flag is to be able to recreate the difference
        between an exposed and hidden module for installed packages.
    
    -reexported-module ⟨module name⟩
    
        This flag can be supplied multiple times in order to specify which
        modules are not defined in a unit but should be reexported. The
        effect is that other units will see this module as if it was defined
        in this unit.
    
        The purpose of this flag is to replicate the reexported modules
        feature of packages when using multiple home units.
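
        The combined effect of -hidden-module and -reexported-module on
        what other units can import may be modelled as set operations.
        This is a simplified sketch of the semantics described above, not
        GHC's implementation; the UnitModules type and its field names are
        hypothetical.

        ```haskell
        import qualified Data.Set as Set
        import Data.Set (Set)

        -- Hypothetical summary of one home unit's module-visibility flags.
        data UnitModules = UnitModules
          { definedModules    :: Set String  -- modules compiled in this unit
          , hiddenModules     :: Set String  -- given via -hidden-module
          , reexportedModules :: Set String  -- given via -reexported-module
          }

        -- Modules that other units can import as if from this unit:
        -- everything defined and not hidden, plus the reexported modules.
        visibleModules :: UnitModules -> Set String
        visibleModules u =
          (definedModules u `Set.difference` hiddenModules u)
            `Set.union` reexportedModules u
        ```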
    
    Offsetting Paths in Template Haskell splices
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    When using Template Haskell to embed files into your program,
    traditionally the paths have been interpreted relative to the directory
    where the .cabal file resides. This causes problems for multiple home
    units as we are compiling many different libraries at once which have
    .cabal files in different directories.
    
    To address this, we have added to the Template Haskell API a way to
    query the value of the -working-dir flag. Using it, we can implement a
    makeRelativeToProject function which offsets a path which is relative
    to the original project root by the value of -working-dir.
    
    ```
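    {-# LANGUAGE TemplateHaskell #-}
    import Data.FileEmbed ( embedFile )  -- from the file-embed package
    import Language.Haskell.TH.Syntax ( makeRelativeToProject )

    foo = $(makeRelativeToProject "./relative/path" >>= embedFile)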
    ```
    
    > If you write a relative path in a Template Haskell splice you should use the makeRelativeToProject function so that your library works correctly with multiple home units.
    
    A similar function already exists in the file-embed library. The
    version in template-haskell is more robust because it honours the
    -working-dir flag rather than searching the file system.
    
    Closure Property for Home Units
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    For tools or libraries using the API there is one very important closure
    property which must be adhered to:
    
    > Any dependency which is not a home unit must not (transitively) depend
      on a home unit.
    
    For example, suppose you have three packages p, q and r, where p depends
    on q and q depends on r. It is then illegal to load both p and r as home
    units but not q, because q is a non-home dependency of the home unit p
    and itself depends on the home unit r.
    
    If you are using GHC from the command line then this property is checked,
    but if you are using the API then you need to check this property
    yourself. If you get it wrong you will probably get some very confusing
    errors about overlapping instances.
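
    For API users, a check of this property can be sketched as follows.
    This is an illustrative implementation over a plain dependency map, not
    the check GHC itself performs; the Unit type, the shape of the map and
    the function names are all assumptions.

    ```haskell
    import qualified Data.Map as Map
    import qualified Data.Set as Set
    import Data.Map (Map)
    import Data.Set (Set)

    type Unit = String

    -- All units reachable from u by following one or more dependency edges.
    reachable :: Map Unit [Unit] -> Unit -> Set Unit
    reachable deps u = go Set.empty (Map.findWithDefault [] u deps)
      where
        go seen [] = seen
        go seen (x:xs)
          | x `Set.member` seen = go seen xs
          | otherwise =
              go (Set.insert x seen) (Map.findWithDefault [] x deps ++ xs)

    -- Non-home units that a home unit depends on and that themselves
    -- (transitively) depend on a home unit. An empty result means the
    -- closure property holds.
    closureViolations :: Map Unit [Unit] -> Set Unit -> Set Unit
    closureViolations deps home = Set.filter dependsOnHome externalDeps
      where
        externalDeps =
          Set.unions [ reachable deps h | h <- Set.toList home ]
            `Set.difference` home
        dependsOnHome u =
          not (Set.null (reachable deps u `Set.intersection` home))
    ```

    Applied to the example above, loading p and r as home units reports q
    as a violation, while loading all three, or only p, reports none.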
    
    Limitations of Multiple Home Units
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    There are a few limitations of the initial implementation which will be smoothed out on user demand.
    
        * Package thinning/renaming syntax is not supported.
        * More complicated reexports/renaming are not yet supported.
        * It’s more common to run into existing linker bugs when loading a
          large number of packages in a session (for example #20674, #20689)
        * Backpack is not yet supported when using multiple home units.
        * Dependency chasing can be quite slow with a large number of
          modules and packages.
        * Loading wired-in packages as home units is currently not supported
          (this only really affects GHC developers attempting to load
          template-haskell).
        * Barely any normal GHCi features are supported. It would be good to
          support enough for ghcid to work correctly.
    
    Despite these limitations, the implementation already works for nearly
    all packages. It has been tested on large dependency closures,
    including the whole of head.hackage, which is a total of 4784 modules
    from 452 packages.
    
    Internal Changes
    ~~~~~~~~~~~~~~~~
    
    * The biggest change is that the HomePackageTable is replaced with the
      HomeUnitGraph. The HomeUnitGraph is a map from UnitId to HomeUnitEnv,
      which contains information specific to each home unit.
    * The HomeUnitEnv contains:
        - A unit state, each home unit can have different package db flags
        - A set of dynflags, each home unit can have different flags
        - A HomePackageTable
    * LinkNode: A new node type is added to the ModuleGraph. It is used to
      place the linking step into the build plan so linking can proceed in
      parallel with other packages being built.
    * New invariant: Dependencies of a ModuleGraphNode can be completely
      determined by looking at the value of the node. In order to achieve
      this, downsweep now performs a more complete job of downsweeping and
      then the dependencies are recorded forever in the node rather than
      being computed again from the ModSummary.
    * Some transitive module calculations are rewritten to use the
      ModuleGraph which is more efficient.
    * There is always an active home unit, which simplifies modifying a lot
      of the existing API code which is unit agnostic (for example, in the
      driver).
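
    The shape of the new structure can be pictured with a deliberately
    simplified model. The types below are stand-ins for illustration only:
    GHC's real HomeUnitEnv, unit state, DynFlags and HomePackageTable carry
    far more information, and the field names and the flagsFor helper are
    hypothetical.

    ```haskell
    import qualified Data.Map as Map
    import Data.Map (Map)

    -- Simplified stand-ins for the real GHC types.
    type UnitId = String
    type ModuleName = String

    data HomeUnitEnv = HomeUnitEnv
      { unitPackageDbs  :: [FilePath]               -- stands in for the unit state
      , unitFlags       :: [String]                 -- stands in for the DynFlags
      , unitHomeModules :: Map ModuleName FilePath  -- stands in for the HomePackageTable
      }

    -- One environment per home unit, keyed by unit id.
    type HomeUnitGraph = Map UnitId HomeUnitEnv

    -- Look up the flags of one home unit; they may differ between units.
    flagsFor :: UnitId -> HomeUnitGraph -> Maybe [String]
    flagsFor uid graph = unitFlags <$> Map.lookup uid graph
    ```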
    
    The road may be bumpy for a little while after this change but the
    basics are well-tested.
    
    There is one small metric increase, which we accept, and a submodule
    update to haddock which removes ExtendedModSummary.
    
    Closes #10827
    
    -------------------------
    Metric Increase:
        MultiLayerModules
    -------------------------
    
    Co-authored-by: Fendor <power.walross@gmail.com>