Commit 8800a73a authored by Edward Z. Yang

Backpack: Flesh out more Cabal details


Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
parent 1967a52d
......@@ -1291,7 +1291,7 @@ Design goals:
\item Backpack files are user-written. (In an earlier design, we had
the idea that Cabal would generate Backpack files; however, we've
since made Backpack files more user-friendly and reasonable to
write by hand.)
write by hand since they are reasonably designed for user development.)
\item Backpack files are optional. A package can add a Backpack file
to replace some (but not all) of the fields in a Cabal description.
......@@ -1307,70 +1307,86 @@ Design goals:
\subsection{Versioning}
In this section, we discuss how Cabal's version numbers factor into
Backpack, namely how we specify \I{PkgKey}s.
\paragraph{History}
Prior to GHC 7.10, GHC has allowed an arbitrary combination of libraries
to be linked together, assuming that the package IDs (e.g.
\verb|foo-0.1|) were all unique. Cabal enforces a stronger restriction,
which is that there exists some unique mapping from package name to
package version which is consistent with all transitive dependencies.
In this section, we discuss how version numbers from Cabal factor into
Backpack. In particular, versioning impacts the specification of \I{PkgKey}s.
See \url{https://ghc.haskell.org/trac/ghc/wiki/Commentary/Packages/Concepts}
for more background, and \url{https://ghc.haskell.org/trac/ghc/ticket/10566}
for implementation progress.
\paragraph{Design goals}
Here are some design goals for versioning:
\begin{enumerate}
\item GHC only tests for equality on versioning; Cabal is
responsible for determining the version of a package. For example,
pre-7.10 the linker symbols were prefixed using a package name and
version, but GHC simply represented this internally as an opaque
string. As another example, package qualified imports only allow
qualification by package name, and not by version.
\item Cabal only tests for equality on package keys; GHC is
responsible for calculating the package key of a package. (This is
\item GHC doesn't know anything about version numbers: this is Cabal
specific information. There are a few cases in GHC today where
this design goal is already in force: pre-7.10, linker
symbols were prefixed using a package name and version, but GHC
simply represented this internally as an opaque string. And in
today's GHC, package qualified imports only allow qualification by
package name, and not by version.
\item Cabal doesn't know anything about package keys: GHC is
responsible for calculating the package key of a package. This is
because GHC must be able to maintain a mapping between the unhashed
and hashed versions of a key, and the hashing process must be
deterministic.) If Cabal needs to generate a new package key, it
must do so through GHC.
deterministic. If Cabal needs to generate a new package key, it
must do so through GHC. (This is NOT how this is happening in GHC 7.10.)
\item Our design should, in principle, support mutual recursion
between packages, even if the implementation does not (presently).
between packages, even if the implementation does not at the moment.
\item GHC should not lose functionality, i.e. it should still be
possible to link together the same package with different versions;
however, Cabal may arrange for this to not occur by default unless a
user explicitly asks for it.
\item A Cabal source package identifier (e.g. \verb|foo-0.1|), which is
a unit of distribution, is a distinct
concept from a Backpack package (which we have referred to previously
in the document as a mere package name), because a single Cabal file may
ship a Backpack file that defines multiple internal packages.
\end{enumerate}
These goals imply a few things:
\begin{enumerate}
\item Backpack files should not contain any version numbers,
and should be agnostic to versioning.
and should be agnostic to versioning. Backpack files are parsed
and interpreted by GHC, and version numbers are Cabal's province!
\item As a corollary, if you want to refer to a specific version of
a package from a Backpack file, this has to be done by giving the
alternate version a different package name, e.g. \verb|network-old|.
(It is tempting to want to simply say that this means we should allow
version numbers into GHC, but consider more complicated situations where
you want to refer to two instances of \verb|foo|, but one compiled
with \verb|bar-0.1| and the other compiled with \verb|bar-0.2|, then
your description of which package to pick up becomes considerably more
complicated than just a package name and version. Better to defer
this decision to Cabal.)
\item Package keys must record versioning information, otherwise
we can't link together two different versions of the same package.
This is due to our backwards-compatibility requirement.
\end{enumerate}
\paragraph{Package keys}
Earlier, we specified \I{PkgKey} as a package name $p$ and then a list
of hole instantiations. To allow linking together multiple versions of
To allow linking together multiple versions of
the same package, we must record versioning information into the
\I{PkgKey}. To do this, we include in the \I{PkgKey} a \I{VersionHash}.
Cabal is responsible for defining \I{VersionHash}, but we give two possible
Cabal is responsible for defining \I{VersionHash} and may do whatever it
wants, but we give two possible
definitions in Figure~\ref{fig:version}.
\begin{figure}[htpb]
$$
\begin{array}{rcll}
p && \mbox{Package name} \\
v && \mbox{Version number} \\[1em]
\I{VersionHash} & ::= & p \verb|-| v\; \verb|{| \, p_0 \; \verb|->| \; \I{VersionHash}_0 \verb|,|\, \ldots\, p_n \; \verb|->| \; \I{VersionHash}_n \, \verb|}| & \mbox{Full version hash} \\
\I{VersionHash'} & ::= & p \; \verb|{| \, p_0\verb|-|v_0 \verb|,|\, \ldots\, p_n\verb|-|v_n \, \verb|}| & \mbox{Simplified version hash} \\
\I{PkgKey} & ::= & \I{VersionHash} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\
\I{SrcPkgId} && \mbox{Cabal source package ID, e.g. } \verb|foo-0.1| \\[1em]
\I{VersionHash} & ::= & \I{SrcPkgId}\; \verb|{| \, p_0 \; \verb|->| \; \I{VersionHash}_0 \verb|,|\, \ldots\, p_n \; \verb|->| \; \I{VersionHash}_n \, \verb|}| & \mbox{Full version hash} \\
\I{VersionHash'} & ::= & \I{SrcPkgId} \; \verb|{| \, \I{SrcPkgId}_0 \verb|,|\, \ldots\, \verb|,|\, \I{SrcPkgId}_n \, \verb|}| & \mbox{Simplified version hash} \\
\I{PkgKey} & ::= & p\verb|-|\I{VersionHash} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\
\end{array}
$$
\caption{Version hash} \label{fig:version}
......@@ -1387,7 +1403,7 @@ The full version hash has some subtleties:
\begin{itemize}
\item Each sub-\I{VersionHash} recorded in a \I{VersionHash} is
identified by a package name, which may not necessarily equal the
package name in the \I{VersionHash}. This permits us to calculate
package name embedded in the \I{SrcPkgId} in the \I{VersionHash}. This permits us to calculate
a \I{VersionHash} for a package like:
\begin{verbatim}
package p where
......@@ -1397,15 +1413,15 @@ The full version hash has some subtleties:
\end{verbatim}
if we want \verb|network| to refer to \verb|network-2.0| and
\verb|network-old| to refer to \verb|network-1.0|. Without
identifying each subdependency by package name, we wouldn't know
what \verb|network-old| would refer to.
identifying each subdependency by package name, we could not
distinguish the recorded \I{VersionHash}s for \verb|network-old| and \verb|network|.
\item If a package is locally specified in a Backpack
file, it does not occur in the \I{VersionHash}. This is because
we always refer to the same package; there are no different versions!
\item If a package name is locally specified in a Backpack
file, it does not occur in the \I{VersionHash}: \I{VersionHash}
strictly operates over Cabal's notion of package identity.
\item You might wonder why we need a \I{VersionHash} as well as a \I{PkgKey};
why not just specify \I{PkgKey} as $p-v \; \verb|{| \, p \; \verb|->| \; \I{PkgKey} \verb|,|\, \ldots\, \verb|}| \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)|$? However, there is ``too much'' information in the \I{PkgKey}, causing the scheme to not work with mutual recursion:
why not just specify \I{PkgKey} as $\I{SrcPkgId} \; \verb|{| \, p \; \verb|->| \; \I{PkgKey} \verb|,|\, \ldots\, \verb|}| \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)|$? However, there is ``too much'' information in the \I{PkgKey}, causing the scheme to not work with mutual recursion:
\begin{verbatim}
package p where
......@@ -1419,15 +1435,235 @@ The full version hash has some subtleties:
version hash does not have this problem as it is not recursive.)
\end{itemize}
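As a worked instance of the two definitions in Figure~\ref{fig:version},
consider the \verb|network| example above. This is only a sketch under
stated assumptions: the \I{SrcPkgId} \verb|p-0.1| is hypothetical, and we
assume that neither version of \verb|network| has dependencies of its own.
The full version hash labels each subdependency with the package name used
in the Backpack file:
\begin{verbatim}
p-0.1 { network -> network-2.0 {}, network-old -> network-1.0 {} }
\end{verbatim}
whereas the simplified version hash records only the \I{SrcPkgId}s:
\begin{verbatim}
p-0.1 { network-1.0, network-2.0 }
\end{verbatim}
The simplified form drops the package-name labels and is not recursive.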
\paragraph{Cabal to GHC}
\subsection{Distribution and installation}
How are Backpack files installed so other people can use them?
\paragraph{Challenges}
\begin{itemize}
\item Prior to Backpack, compiling and installing a Cabal package (i.e. a
unit of distribution) resulted in a single
entry in the installed package database. With Backpack, compiling a
package could result in multiple entries in the installed package
database: (1) for indefinite packages which were instantiated, and
(2) when there are multiple packages in a Backpack file.
\item Relatedly, when we include an indefinite package, we may need
to rebuild it with our specific dependencies. This makes compiling
a Backpack file much more similar to \verb|cabal-install| than to
\verb|Cabal|; however, the dependency structure is something that
only GHC can calculate.
\end{itemize}
\paragraph{Why distribute Backpack files?}
Backpack files offer a convenient mechanism of defining multiple packages
with inline syntax for modules. Further syntax extensions could allow us
to give people a MixML style of programming in Haskell.
A Backpack file is not a replacement for a Cabal file:
\verb|exposed-modules| and similar fields are not necessary, but we still
need a \verb|build-depends| to provide version bounds (until Backpack
can also be used to handle version dependency). This makes it easy
for cabal-install to do its job.
This means we distinguish a package name $p$ which occurs in a Backpack
file and a Cabal \I{SrcPkgId}: Cabal creates a mapping between these.
So to refer to an old version of a package, you would refer to it with
a different name $q$, and then tell Cabal about the version bound constraints
you want.
\paragraph{Definite packages}
Suppose we have written a Backpack file that looks like:
\begin{verbatim}
package helper where
include base
module P
package mypackage where
include containers
include helper
module Q
\end{verbatim}
and have written a Cabal file for it intending to distribute it on
Hackage under the name \verb|mypackage-1.0|. In the end, we will end
up with the following entries in our installed package database:
\begin{verbatim}
name: "mypackage"
id: mypackage-1.0-IPID
version: 1.0
key: XXX
# e.g. mypackage-AAA {}
version-hash: AAA
# e.g. mypackage-1.0 { base -> base-4.7 , containers -> containers-0.5 }
depends: mypackage$helper-1.0-IPID, containers-0.5-IPID
---
name: "mypackage$helper"
version: 1.0
id: mypackage$helper-1.0-IPID
key: YYY
# e.g. helper-AAA {}
version-hash: AAA
depends: base-4.7-IPID
\end{verbatim}
%
Things to note:
\begin{enumerate}
\item The package in the Backpack file with the same name as the Cabal
package has special status: this is the package which is registered
to the installed package database under the same name. All other packages
are \emph{qualified} under the Cabal package name, e.g. \verb|mypackage$helper|.
\item The version hash, as described previously, is computed once for all
packages in the Backpack file, and the \verb|version| and \verb|version-hash|
are the same across all of them.
\item The key varies between the packages, since the $p$ parameter is different
in each one.
\item The installed package ID incorporates information about the package name.
\item Dependencies are only recorded for directly \verb|include|d packages in a Backpack package. (GHC has to communicate to Cabal what the includes of every subpackage are.)
\end{enumerate}
%
A more complex example with instantiated packages looks similar:
\begin{verbatim}
package helper where
signature Data.Map
module P
package mypackage where
include containers (Data.Map)
include helper
module Q
\end{verbatim}
%
however, now the instantiation is recorded in the database as well.
\begin{verbatim}
name: "mypackage"
id: mypackage-1.0-IPID
version: 1.0
key: XXX
# e.g. mypackage-AAA {}
version-hash: AAA
# e.g. mypackage-1.0 { containers -> containers-0.5 }
depends: mypackage$helper-1.0-IPID, containers-0.5-IPID
---
name: "mypackage$helper"
version: 1.0
id: mypackage$helper-1.0-IPID
key: YYY
# e.g. helper-AAA { Data.Map -> containers-KEY:Data.Map }
version-hash: AAA
depends: (none)
instantiated-with:
Data.Map -> Data.Map@containers-0.5-IPID
\end{verbatim}
%
More remarks:
\begin{enumerate}
\item Cabal's recorded \verb|instantiated-with| records installed
package IDs, so that the used implementation is uniquely determined.
\item Conversely, \verb|depends| does NOT record non-textual dependencies
such as instantiated holes. \Red{is this necessary}
\item IPID includes information about how holes were instantiated.
\end{enumerate}
\paragraph{GHC to Cabal}
When GHC compiles a Backpack file, it is the only entity which knows
about the subpackages of a package. In order to make sure they are
all correctly installed, GHC has to communicate back some meta-data to
Cabal: for each package,
\begin{itemize}
\item The (computed) package keys
\item The dependencies
\item The instantiation
\end{itemize}
I guess we have to define some format to do this. GHC can't directly
write to the package database, because it doesn't know how to write in
the Cabal-specific portion of the information.
\Red{This is clunky, is there a way to eliminate this? It's not possible
for Cabal out of the box to handle this, since it assumes no module name conflicts
but there definitely may be some in Backpack.}
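As a purely illustrative sketch of what such a report might contain (no
format has been defined; the layout and field names below are hypothetical
and simply mirror the installed package database entries of the
instantiated example above), GHC could emit one record per package:
\begin{verbatim}
package: mypackage
key: XXX
depends: mypackage$helper-1.0-IPID, containers-0.5-IPID
instantiated-with: (none)
---
package: mypackage$helper
key: YYY
depends: (none)
instantiated-with: Data.Map -> Data.Map@containers-0.5-IPID
\end{verbatim}
Under this assumption, Cabal would remain responsible for filling in the
Cabal-specific fields (such as \verb|version|) before registering each entry.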
\paragraph{Indefinite package database}
The indefinite package database records indefinite packages (with holes)
that have been typechecked. An indefinite package is associated with a
(possibly unlimited) number of instantiated versions of the package,
which have been fully instantiated and compiled.
An indefinite package is a new type of entry in the existing installed package
database. \Red{or maybe another entry in a different database} Here are the important things to keep track of for an
indefinite package:
\begin{itemize}
\item Where do the (indefinite) interface files live? (NB: there are no
libraries since we haven't compiled the package.)
\item Where does the shape information live? (We could put it with the
interface files, it's a pretty similar binary file.)
\item Where does the source live, so we can recompile it when we instantiate it?
(If it's empty, we'll have to refetch it from Hackage or something.)
\item Where does the Cabal configuration (result of running
\verb|cabal configure|) live, so that we build it with the same dependencies, flags, etc.?
\end{itemize}
Associated with an indefinite package is some number of instantiated versions
of this package. These are identified by package key (the installed package ID
is the same) and are morally ``sub''-packages of the indefinite package,
although they get their own entries. \Red{Alternate plan: put them together.
Distinction between Cabal package and Backpack package.}
What makes installed indefinite packages difficult is that GHC may need to
recompile them on the fly depending on an include.
\paragraph{The plan}
\Red{To be worked out}
% Description: cabal-install only computes package-name edge labeling,
% then attempts to compile. If the package is indefinite, Cabal
% type checks and installs the interface files, source code and
% configuration information (TODO: this is something GHC has
% to understand\ldots) to the package database. If the package
% is definite, Cabal goes ahead and builds it. During compilation,
% when processing an include GHC may notice that a package depends on an
% instantiation of an indefinite package that is not compiled; GHC
% goes ahead and builds it using the saved information.
% Con: We need to install indefinite packages, including all of
% the source and information we'd need to actually build it
% (the result of a configure? Only Cabal really knows how
% to understand that; so it should be like a Cabal configured
% package? If GHC calls in that's annoying.) It would be nice
% if this was done cabal-install style, but there are many downsides
% to deferring all of this processing to cabal-install.
% Model: GHC compiles everything itself
% GHC needs to report multiple distinct compile products to Cabal
% GHC needs to ``reset'' the EPS (but only for type checking)
% Model: Cabal pre-compiles dependencies, and then GHC handles the rest
% Trouble: Cabal needs to be able to read the bkp file to find out what the instantiation is
% Fix: Have a GHC mode to output this information. Also, if Cabal is doing an old style it already knows.
% Trouble: seems wrong for normal Cabal to install it
% Think about it like a CACHE
Prior to GHC 7.10, Cabal passed versioning information to GHC using the
\verb|-package-name| flag. In GHC 7.10, this flag was renamed to
\verb|-this-package-key|. We propose that this flag be renamed once
again to \verb|-this-version-hash|, to which Cabal passes a hash (or string)
describing the versioning of the package which is then incorporated
into the package key. Cabal no longer needs to calculate package keys.
In the absence of Backpack, there will be no semantic difference if we
switch to full version hashes.
\end{document} % chktex 16