GHC and Cabal disagree on the meaning of the term "package"
Overview
GHC uses the field package-name in the package DB for:
- looking up the name passed in the CLI arg -package attoparsec
- looking up package imports (import "attoparsec" Data.Attoparsec)
- printing package names in diagnostics, like unused or hidden packages, but not consistently
Cabal stores the Cabal package name in that field, and a single Cabal package may consist of multiple units (components). Named sublibraries get an additional field, lib-name: attoparsec-internal, which GHC mostly ignores.
This causes seemingly random libraries to be selected for a package name (which one depends on the lexical order of hashes in the unit ID).
Example bug
The flag -package picks the first match from the set of package DB entries with matching unitPackageIdString or unitPackageNameString. This results in unexpected behavior:
$ nix shell --impure \
-I nixpkgs=https://github.com/nixos/nixpkgs/archive/7790e078f8979a9fcd543f9a47427eeaba38f268.tar.gz \
--expr "with import <nixpkgs> {}; pkgs.haskellPackages.ghcWithPackages (g: [g.attoparsec])" \
--command sh -c "ghci -package attoparsec <<< 'import Data.Attoparsec'"
ghci>
<no location info>: error:
Could not load module ‘Data.Attoparsec’
It is a member of the hidden package ‘attoparsec-0.14.4’.
You can run ‘:set -package attoparsec’ to expose it.
The reason for this is that attoparsec includes an internal Cabal library whose package-name field is attoparsec. applyPackageFlag searches for the argument of -package in the mentioned fields and picks the first result:
case findPackages prec_map pkg_map closure arg pkgs unusable of
  Right (p:_) -> Succeeded vm'
In the above example, the unit ID of attoparsec-internal is lexicographically smaller than that of its main library, so the internal package is exposed instead.
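The selection behavior described above can be modeled in a few lines. This is a minimal sketch, not GHC's code: the Entry type, its fields, selectByPackageName, and the unit IDs in exampleDb are all hypothetical, and it assumes that candidates end up ordered by unit ID.

```haskell
import Data.List (sortOn)

-- Hypothetical, simplified stand-in for a package DB entry; this is not
-- GHC's GenericUnitInfo.
data Entry = Entry
  { entryUnitId      :: String        -- unit ID, including the hash
  , entryPackageName :: String        -- value of the package-name field
  , entryLibName     :: Maybe String  -- value of lib-name, if present
  } deriving (Eq, Show)

-- Mimics "pick the first match": candidates are ordered by unit ID
-- (an assumption of this sketch), and lib-name plays no role.
selectByPackageName :: String -> [Entry] -> Maybe Entry
selectByPackageName name db =
  case sortOn entryUnitId [e | e <- db, entryPackageName e == name] of
    (e : _) -> Just e
    []      -> Nothing

-- Made-up unit IDs; only their relative order matters here.
exampleDb :: [Entry]
exampleDb =
  [ Entry "attoparsec-0.14.4-Kx9" "attoparsec" Nothing
  , Entry "attoparsec-0.14.4-Ab3" "attoparsec" (Just "attoparsec-internal")
  ]
```

With these made-up IDs, selectByPackageName "attoparsec" exampleDb returns the entry whose lib-name is attoparsec-internal, because both entries carry the same package name and the internal library's unit ID sorts first.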
Cabal avoids this problem by specifying -hide-all-packages, which disables "shadowing" by package name and therefore allows all entries with that name to be exposed, and by specifying packages with -package-id.
The source of this confusion appears to be the absence of a shared plan for how the concept of a package as an organizational unit should evolve. While Cabal classifies sublibraries as constituents of a package, GHC conflates the two concepts in some respects. Notably, the type used in the above example, GenericUnitInfo, has a field called unitComponentName whose haddock mentions Cabal, as does that of unitPackageId.
Another consequence of this incoherence is that GHC's user messages lack precision and sometimes display incorrect information, for reasons similar to what I described above.
In this conversation, @bgamari suggests avoiding the distinction in GHC and changing the package name that Cabal writes to DBs to include the library suffix, as in package-name: attoparsec:attoparsec-internal, while @phadej prefers to teach GHC to distinguish based on the lib-name field and make it first-class information in diagnostics. Judging by the logic in applyPackageFlag, that change should be straightforward, but I don't know yet how much other code depends on the equivalence of units and packages in that sense.
In another issue, @mpickering reports that package imports exhibit the same randomness as in my example. Here @sheaf argues for adopting Cabal's component syntax for imports, while Ben suggests an approach similar to the one in the other thread.
Similarly, this issue illustrates how sublibraries are mentioned in some, but not all parts of a diagnostic.
In summary, Cabal and GHC devs need to find consensus about this topic.
I'd assume that most users think about packages and libraries from what they are presented with by Cabal, so I think it would be good for UX to fully adopt Cabal's vocabulary at least for diagnostics.
Regarding my specific issue, my intuition as a user is that -package causes all public libraries in the package to be exposed, but that's a very weak opinion.
Potential Solutions
I'll get the discussion started by listing some observations about two possible approaches to realizing this:
Solution in Cabal: Store package-name: attoparsec:internal in the package DB.
- No new concepts have to be added to GHC
- Changing the written value is trivial, but if other parts of the code depend on the field to contain the Cabal package, it might be more involved
- -package attoparsec and -package attoparsec:internal should work as expected without changes
- import "attoparsec:internal" Data.Attoparsec.Blah might work without changes, but should be trivial to fix if not
- Diagnostics might improve without changes, but need to be checked case-by-case
- GHC will still speak about packages in diagnostics, which may be counterintuitive and leaks implementation details of the package DB
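To make the naming scheme of this option concrete, here is a rough sketch of how a qualified name could be rendered and split. The functions renderQualified, splitQualified and lookupByName are hypothetical and exist in neither Cabal nor GHC; the colon separator is taken from the example above.

```haskell
-- Render the value that would be written to package-name under this
-- proposal: the bare package name for the main library, and
-- "package:library" for a named sublibrary.
renderQualified :: String -> Maybe String -> String
renderQualified pkg Nothing    = pkg
renderQualified pkg (Just lib) = pkg ++ ":" ++ lib

-- Inverse direction: split at the first colon, if there is one.
splitQualified :: String -> (String, Maybe String)
splitQualified s =
  case break (== ':') s of
    (pkg, ':' : lib) -> (pkg, Just lib)
    (pkg, _)         -> (pkg, Nothing)

-- With qualified names stored in package-name, plain string equality is
-- enough to tell the main library from a sublibrary. The DB is modeled
-- as a list of (package-name, unit ID) pairs.
lookupByName :: String -> [(String, String)] -> [String]
lookupByName name db = [unitId | (pkgName, unitId) <- db, pkgName == name]
```

Under these assumptions, a lookup for attoparsec can no longer match the entry stored as attoparsec:internal, which is why the existing first-match logic would not need to change.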
Solution in GHC: Read lib-name from the package DB and treat it as an additional axis.
- Needs no changes in Cabal
- Likely needs comprehensive changes in GHC
- Diagnostics can precisely refer to sublibraries when needed
- -package can be changed to expose all libraries of a package
- Suggests a new flag -package-lib to load a single library
- Package imports will work reliably for main libraries, but not at all for internal libraries (without looking up unit IDs) without new logic
- GHC already has awareness of Cabal package structure in unit-related data types
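As a rough illustration of treating lib-name as a second axis, the two flags could select entries as follows. Everything here is an assumption for the sake of the sketch: Entry is a simplified stand-in for a package DB entry, exposePackage and exposePackageLib are hypothetical names, the unit IDs are made up, and -package-lib is only a suggested flag.

```haskell
-- Hypothetical, simplified package DB entry (not a GHC type).
data Entry = Entry
  { entryUnitId      :: String
  , entryPackageName :: String
  , entryLibName     :: Maybe String  -- Nothing marks the main library
  } deriving (Eq, Show)

-- What -package could do under this proposal: expose every library
-- that belongs to the named package.
exposePackage :: String -> [Entry] -> [Entry]
exposePackage pkg = filter ((== pkg) . entryPackageName)

-- What the suggested -package-lib flag could do: expose exactly one
-- library, addressed by package name plus an optional library name.
exposePackageLib :: String -> Maybe String -> [Entry] -> [Entry]
exposePackageLib pkg lib =
  filter (\e -> entryPackageName e == pkg && entryLibName e == lib)

-- Made-up unit IDs for illustration.
exampleDb :: [Entry]
exampleDb =
  [ Entry "attoparsec-0.14.4-Kx9" "attoparsec" Nothing
  , Entry "attoparsec-0.14.4-Ab3" "attoparsec" (Just "attoparsec-internal")
  , Entry "text-2.0.2-Qq1" "text" Nothing
  ]
```

Here exposePackage "attoparsec" exampleDb yields both attoparsec entries, while exposePackageLib "attoparsec" Nothing exampleDb yields only the main library, independent of how the unit IDs sort.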
Partial solution: Deprecate package imports in favor of module renaming
- Already possible today
- Would benefit from more ergonomic syntax in Cabal config
Partial solution: Select the main library for -package based on the absence of a component name
- Quick workaround, trivially implemented (I already used this approach for our experiments with GHC at work)
- Doesn't address underlying issue
- "Restores" expected behavior exhibited in absence of sublibraries
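One way to picture this workaround is to reorder the candidates so that entries without a component name come first, and then keep the existing "take the first match" step. This is only a sketch under assumptions, not the patch mentioned above: Entry, preferMainLibrary, selectForPackageFlag and the unit IDs are hypothetical.

```haskell
import Data.List (sortOn)
import Data.Maybe (isJust, listToMaybe)

-- Hypothetical, simplified package DB entry (not a GHC type).
data Entry = Entry
  { entryUnitId      :: String
  , entryPackageName :: String
  , entryLibName     :: Maybe String  -- Nothing: no component name
  } deriving (Eq, Show)

-- Order candidates so that entries without a component name sort first
-- (False < True); ties are still broken by unit ID.
preferMainLibrary :: [Entry] -> [Entry]
preferMainLibrary = sortOn (\e -> (isJust (entryLibName e), entryUnitId e))

-- The first-match step stays as it is, but now sees the main library
-- first whenever one exists.
selectForPackageFlag :: String -> [Entry] -> Maybe Entry
selectForPackageFlag name db =
  listToMaybe (preferMainLibrary [e | e <- db, entryPackageName e == name])

-- Made-up unit IDs; the internal library sorts first by unit ID alone.
exampleDb :: [Entry]
exampleDb =
  [ Entry "attoparsec-0.14.4-Kx9" "attoparsec" Nothing
  , Entry "attoparsec-0.14.4-Ab3" "attoparsec" (Just "attoparsec-internal")
  ]
```

If a package had no main library at all, this sketch would still fall back to the first sublibrary by unit ID, so the underlying ambiguity remains.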
Partial solution: Use name instead of package-name in GHC
- Unclear if GHC requires the distinction between those fields
- Sublibraries would be unambiguously addressable
- Sublibrary names would be mangled z- strings
- Easy fix if package-name has no other purpose