
GHC and Cabal disagree on the meaning of the term "package"

Overview

GHC uses the field package-name in the package DB for:

  • looking up the name passed in the CLI arg -package attoparsec
  • looking up package imports (import "attoparsec" Data.Attoparsec)
  • printing package names in diagnostics (for example, about unused or hidden packages), though not consistently

Cabal stores the name of the Cabal package in that field. A Cabal package may consist of multiple units (components), so several package DB entries can share the same package-name. Named sublibraries get an additional field, lib-name: attoparsec-internal, which GHC mostly ignores.

As a result, GHC selects a seemingly random library for a given package name: which one wins depends on the lexical order of the hashes in the unit IDs.

Example bug

The flag -package picks the first match from the set of package DB entries with matching unitPackageIdString or unitPackageNameString. This results in unexpected behavior:

$ nix shell --impure \
  -I nixpkgs=https://github.com/nixos/nixpkgs/archive/7790e078f8979a9fcd543f9a47427eeaba38f268.tar.gz \
  --expr "with import <nixpkgs> {}; pkgs.haskellPackages.ghcWithPackages (g: [g.attoparsec])" \
  --command sh -c "ghci -package attoparsec <<< 'import Data.Attoparsec'"
ghci>
<no location info>: error:
    Could not load module ‘Data.Attoparsec’
    It is a member of the hidden package ‘attoparsec-0.14.4’.
    You can run ‘:set -package attoparsec’ to expose it.

The reason for this is that attoparsec includes an internal Cabal library whose package-name field is attoparsec. applyPackageFlag searches for the argument of -package in the mentioned fields and picks the first result:

case findPackages prec_map pkg_map closure arg pkgs unusable of
  Right (p:_) -> Succeeded vm'

In the above example, the unit ID of attoparsec-internal is lexicographically smaller than that of its main library, so the internal package is exposed instead.
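
To make the selection logic concrete, here is a minimal sketch of the behavior described above. It is a simplified model, not GHC's implementation: the Unit type, the db list, and the hashes are made up for illustration, and I'm assuming that candidates are considered in unit ID order, as the observed behavior suggests.

```haskell
import Data.List (sortOn)
import Data.Maybe (listToMaybe)

-- Hypothetical, simplified stand-in for a package DB entry; GHC's real
-- type is GenericUnitInfo, which has many more fields.
data Unit = Unit
  { unitId      :: String        -- unit ID, ending in a hash
  , packageName :: String        -- the package-name field
  , libName     :: Maybe String  -- the lib-name field; Nothing for the main library
  } deriving (Eq, Show)

-- Made-up hashes, chosen so that the sublibrary sorts first.
db :: [Unit]
db =
  [ Unit "attoparsec-0.14.4-Kx9" "attoparsec" Nothing
  , Unit "attoparsec-0.14.4-3bQ" "attoparsec" (Just "attoparsec-internal")
  ]

-- Assumed model of -package: take the first name match in unit ID order.
firstMatch :: String -> [Unit] -> Maybe Unit
firstMatch name = listToMaybe . sortOn unitId . filter ((== name) . packageName)

main :: IO ()
main = print (fmap libName (firstMatch "attoparsec" db))
```

With these made-up hashes, the program prints Just (Just "attoparsec-internal"): the sublibrary is selected although the user asked for attoparsec.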

Cabal avoids this problem by specifying -hide-all-packages, which disables "shadowing" by package name and therefore allows all entries with that name to be exposed, and by specifying packages with -package-id.


The source of this confusion appears to be the absence of a shared plan for the evolution of the concept of the organizational unit of a package. While Cabal classifies sublibraries as constituents of a package, GHC conflates the two concepts in some respects. Notably, the type used in the above example, GenericUnitInfo, has a field called unitComponentName whose haddock mentions Cabal, as does that of unitPackageId.

Another consequence of this incoherence is that GHC's user messages lack precision and sometimes display incorrect information, for reasons similar to those described above.

In this conversation, @bgamari suggests avoiding the distinction in GHC and changing the package name that Cabal writes to DBs to include the library suffix, as in package-name: attoparsec:attoparsec-internal, while @phadej prefers to teach GHC to distinguish based on the lib-name field and make it first-class information in diagnostics. Judging by the logic in applyPackageFlag, that change should be straightforward, but I don't know yet how much other code depends on the equivalence of units and packages in that sense.

In another issue, @mpickering reports that package imports exhibit the same randomness as in my example. Here @sheaf argues for adopting Cabal's component syntax for imports, while Ben suggests an approach similar to the one in the other thread.

Similarly, this issue illustrates how sublibraries are mentioned in some, but not all parts of a diagnostic.


In summary, Cabal and GHC devs need to reach consensus on this topic.

I'd assume that most users think about packages and libraries in the terms that Cabal presents to them, so I think it would be good for UX to fully adopt Cabal's vocabulary, at least for diagnostics.

Regarding my specific issue, my intuition as a user is that -package causes all public libraries in the package to be exposed, but that's a very weak opinion.


Potential Solutions

I'll get the discussion started by listing some observations about two possible approaches to realizing this:

Solution in Cabal: Store package-name: attoparsec:internal in the package DB.

  • No new concepts have to be added to GHC
  • Changing the written value is trivial, but if other parts of the code depend on the field containing the Cabal package name, it might be more involved
  • -package attoparsec and -package attoparsec:internal should work as expected without changes
  • import "attoparsec:internal" Data.Attoparsec.Blah might work without changes, but should be trivial to fix if not
  • Diagnostics might improve without changes, but need to be checked case-by-case
  • GHC will still speak about packages in diagnostics, which may be counterintuitive and leaks implementation details of the package DB
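
As a rough illustration of why lookups would become unambiguous under this approach, here is a small sketch. The Unit type and the helper names are hypothetical, and the exact format of the suffix (here the full lib-name after a colon) is an assumption; the point is only that an exact string match on the qualified name yields a single entry.

```haskell
-- Hypothetical, simplified package DB entry.
data Unit = Unit
  { packageName :: String        -- Cabal package name
  , libName     :: Maybe String  -- Nothing for the main library
  } deriving (Eq, Show)

-- The value Cabal would write to package-name under this proposal
-- (assumed format: "package:sublibrary").
qualifiedName :: Unit -> String
qualifiedName (Unit pkg Nothing)    = pkg
qualifiedName (Unit pkg (Just lib)) = pkg ++ ":" ++ lib

-- Exact match on the qualified name, as -package would do unchanged.
lookupByName :: String -> [Unit] -> [Unit]
lookupByName arg = filter ((== arg) . qualifiedName)

db :: [Unit]
db =
  [ Unit "attoparsec" Nothing
  , Unit "attoparsec" (Just "attoparsec-internal")
  ]

main :: IO ()
main = do
  print (map libName (lookupByName "attoparsec" db))
  print (map libName (lookupByName "attoparsec:attoparsec-internal" db))
```

In this model, each argument matches exactly one entry, so no tie-breaking by unit ID is needed.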

Solution in GHC: Read lib-name from the package DB and treat it as an additional axis.

  • Needs no changes in Cabal
  • Likely needs comprehensive changes in GHC
  • Diagnostics can precisely refer to sublibraries when needed
  • -package can be changed to expose all libraries of a package
  • Suggests a new flag -package-lib to load a single library
  • Without new logic, package imports will work reliably for main libraries, but not at all for internal libraries (short of looking up unit IDs)
  • GHC already has awareness of Cabal package structure in unit-related data types
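
The two flag behaviors in this list could be modeled roughly as follows. This is a sketch under assumptions: the Unit type is a simplified stand-in, -package-lib does not exist today, and its argument shape (package name plus optional library name) is invented here for illustration.

```haskell
-- Hypothetical, simplified package DB entry.
data Unit = Unit
  { packageName :: String        -- the package-name field
  , libName     :: Maybe String  -- the lib-name field; Nothing for the main library
  } deriving (Eq, Show)

-- Proposed -package semantics: expose every library of the package.
exposePackage :: String -> [Unit] -> [Unit]
exposePackage pkg = filter ((== pkg) . packageName)

-- Hypothetical -package-lib semantics: expose exactly one library,
-- treating lib-name as an additional axis next to package-name.
exposeLib :: String -> Maybe String -> [Unit] -> [Unit]
exposeLib pkg lib = filter (\u -> packageName u == pkg && libName u == lib)

db :: [Unit]
db =
  [ Unit "attoparsec" Nothing
  , Unit "attoparsec" (Just "attoparsec-internal")
  ]

main :: IO ()
main = do
  print (length (exposePackage "attoparsec" db))
  print (exposeLib "attoparsec" (Just "attoparsec-internal") db)
```

The sketch ignores library visibility; restricting -package to public libraries would need an extra field that this model does not have.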

Partial solution: Deprecate package imports in favor of module renaming

  • Already possible today
  • Would benefit from more ergonomic syntax in Cabal config

Partial solution: Select the main library for -package based on the absence of a component name

  • Quick workaround, trivially implemented (I already used this approach for our experiments with GHC at work)
  • Doesn't address underlying issue
  • "Restores" expected behavior exhibited in absence of sublibraries
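
The idea behind this workaround can be sketched like this. It is not the patch I used, only a simplified model with a made-up Unit type and made-up hashes: among the entries matching the name, prefer the one without a component name, and fall back to the first match otherwise.

```haskell
import Control.Applicative ((<|>))
import Data.List (find, sortOn)
import Data.Maybe (isNothing, listToMaybe)

-- Hypothetical, simplified package DB entry.
data Unit = Unit
  { unitId      :: String        -- unit ID, ending in a hash
  , packageName :: String        -- the package-name field
  , libName     :: Maybe String  -- component name; Nothing for the main library
  } deriving (Eq, Show)

-- Prefer the main library (no component name); otherwise keep the
-- first match in unit ID order, as in the assumed current behavior.
selectMain :: String -> [Unit] -> Maybe Unit
selectMain name units =
  find (isNothing . libName) matches <|> listToMaybe matches
  where
    matches = sortOn unitId (filter ((== name) . packageName) units)

-- Made-up hashes; the sublibrary sorts before the main library.
db :: [Unit]
db =
  [ Unit "attoparsec-0.14.4-Kx9" "attoparsec" Nothing
  , Unit "attoparsec-0.14.4-3bQ" "attoparsec" (Just "attoparsec-internal")
  ]

main :: IO ()
main = print (fmap libName (selectMain "attoparsec" db))
```

Here the program prints Just Nothing, meaning the main library is chosen even though the sublibrary's unit ID sorts first.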

Partial solution: Use name instead of package-name in GHC

  • Unclear if GHC requires the distinction between those fields
  • Sublibraries would be unambiguously addressable
  • Sublibrary names would be mangled z- strings
  • Easy fix if package-name has no other purpose
Edited by Torsten Schmits