Edward Z. Yang · c37a7d3f
--- a/backpack.md
+++ b/backpack.md
@@ -7,12 +7,30 @@

 Large-scale modularity refers to the modularization of software into libraries, which are built upon other libraries.  A package manager offers a limited degree of flexibility by permitting a library to built against varying \*versions\* of its dependencies.  Backpack seeks to solve the following problems related to programming with libraries:

+1. **I want to write a library that works with ByteString, Text and String, but I only want to write it once.**  Today, I may have to maintain multiple versions of the package: `foo-bytestring`, `foo-text`, `foo-string`, each specialized against a specific string representation.  (TODO is this a good example, or do people want to write their library differently for these different types?) Similar situations occur when a library wants to support multiple "backends". This problem is exacerbated when someone else writes another library which builds on top of `foo`; now they have to write three versions of their package too. It is better if the library can be written once, parametrized on a signature describing strings, allowing users to fill in their own string implementation.
+
+  - Here are examples of libraries with "multiple drivers":
+
+    - [satchmo](http://hackage.haskell.org/package/satchmo) (satchmo-backends, satchmo-funsat, satchmo-toysat)
+    - [Chart](http://hackage.haskell.org/package/Chart) (Chart-cairo, Chart-diagrams, Chart-gtk)
+    - [FTGL](http://hackage.haskell.org/package/FTGL) (FTGL-bytestring)
+    - [HDBC](http://hackage.haskell.org/package/HDBC) (HDBC-mysql, HDBC-odbc, HDBC-postgresql, HDBC-session, HDBC-sqlite3)
+    - [MuCheck](http://hackage.haskell.org/package/MuCheck) (MuCheck-HUnit, MuCheck-Hspec, MuCheck-QuickCheck, MuCheck-SmallCheck)
+    - [Shellac](http://hackage.haskell.org/package/Shellac) (Shellac-compatline, Shellac-editline, Shellac-haskeline, Shellac-readline)
+    - (I stopped after looking at all of the capitalized package names lol)
+  - A common comment is, "Don't type classes work for this case?"  Common problems with using type classes:
+
+    - Ambiguity: type class resolution must be done with respect to a type parameter, even when there is no natural one; in such cases a proxy type must be used. This is exacerbated with multiparameter type classes, where some methods may not mention enough types in their signature to uniquely identify an instance.
+    - Newtype: type class resolution is done purely based on types; there is no way to parametrize over the implementation for a given type. You must introduce a newtype in this situation.
+    - Multiple parameters: without associated types, a multiparameter type class must be used when an interface requires multiple types. These types must be threaded through all functions which use this type signature; an annoying amount of plumbing.
+    - Type classes infect call sites: if you have a data type from a type class's associated type, and want to refer to it from another data type `T`, any function using `T` must also remember to add the type class constraint to its signature.
+    - Lack of specialization: type class polymorphic code is generally compiled in dictionary-passing style (with inlining, the dictionary can sometimes be specialized away, but don't count on it).
+  - Why only at the package level? Bringing it down to the module level (and even finer) is the subject of small-scale modularity.
+
 1. Is my library compatible against a given version of a dependency?  To determine this today, you must first install the library, and then build your code against it. With Backpack, you can write down precisely what interface you depend against, at which the compatibility check only involves testing if an implementation correctly implements the interface. Better yet, a library with explicit Backpack dependencies can be installed without installing any of its prerequisites. This information can be collected together in order to give accurate version dependencies. (TODO Interesting problem: Backpack says nothing about what should happen when someone generalizes a type signatures.  Conditional compilation suggests that there may need to be multiple interface sets that a package can compile against; variational programming but only with interfaces.) (TODO Right now, versions and instantiation are completely orthogonal, which sucks.)

 1. Does anyone depend on this API?  If you want to make a backwards incompatible change to a library, it can be difficult to tell who will be affected. Explicit interfaces are *transmissible*; clients should be able to submit the slices of the interfaces they depend on to upstream, giving maintainers a view into what APIs are used.  This capability would be especially beneficial for packages with a large and organically grown API (e.g. the ghc package).  (TODO In what sense is an interface transmissible? Interface needs to be able to refer to other types which need to live somewhere. These are "subsidiary" in some sense; when checking for compatibility you don't care about these types.  Need to analyze this situation more carefully. See also [\#10798](https://gitlab.haskell.org//ghc/ghc/issues/10798).)

-1. How can I make a package parametric on a dependency?  In the Haskell ecosystem, if I write a package `foo` which uses strings, I may publish to Hackage many versions of the package: `foo-bytestring`, `foo-text`, `foo-string`, each specialized against a specific string representation.  (TODO in this example, do the implementations commonly differ?) A more preferable way to structure the package is to write it once, parametrized on an abstract string type, and allow users to fill in their own string implementation.
-
 ### Small-scale modularity


@@ -56,10 +74,12 @@ data Module = Module { moduleUnitId :: UnitId
                     , moduleName :: ModuleName }
 data UnitId = UnitId { unitIdComponentId :: ComponentId
                     , unitIdInsts :: [(ModuleName, Module)] }
+            | Hole

 -- The intermediate representation of a component
 data Component = Component {
-  -- The UnitId of the component in question.
+  -- The UnitId of the component in question.  Invariant: every
+  -- instantiation is H -> hole:H
  unitId :: UnitId,
  -- The direct dependencies of the component; i.e. how the includes
  -- are resolved.
@@ -70,11 +90,15 @@ data Component = Component {
  importedModules :: [(ModuleName, Module)],
  -- The exported modules of the component.  Local modules will
  -- have moduleUnitId == unitId, but there may also be reexports.
+  -- Invariant: no HOLEs are in this list.  Holes are recorded in unitId.
  exposedModules :: [(ModuleName, Module)]
 }
 ```


+Invariant: every hole that appears outside `unitId` is bound by a hole in `unitId`.
+
+
 Intuitively, the algorithm for compiling a `UnitId` goes as follows:

 1. Lookup the `Component` corresponding to the `ComponentId` of the `UnitId`.
@@ -89,7 +113,7 @@ How do you compile a `Component`; that is to say, what flags are passed to GHC t

 1. The chosen STRING unit ID (which is to be used for linker symbols).  Done in old versions of GHC with `-package-name` (or more recently `-this-package-key` and `-this-unit-id`.
 1. The full unit ID data structure.  My suggestion is that this is given in two parts: `-this-component-id` (a new flag) and `-sig-of` (shipped already in GHC 7.10)
-1. The set of modules that are in scope, from `importModules`.  There are two ways to do this: a series of `-package-id "p (A as B)"` flags (which mimic a source-level include declaration, or manually specifying each of the modules in scope (some new flag).  Besides possibly being quite long, the latter is attractive because the elaboration to the Backpack IR needs to compute the set of modules in scope (so it knows how to instantiate things.)
+1. The set of modules that are in scope, from `importedModules`.  TODO There are two ways to do this: a series of `-package-id "p (A as B)"` flags (which mimic a source-level include declaration), or manually specifying each of the modules in scope (some new flag).  Despite possibly being quite long, the latter is attractive because the elaboration to the Backpack IR needs to compute the set of modules in scope (so it knows how to instantiate things). The former is attractive because it works without modification on old versions of GHC.
 1. The set of requirements that are in scope, from `instantiatedDepends`.  These indicate what other signatures need to be merged into the local `hsig`s.

 ### Cabal syntax