GHC and Cabal disagree on the meaning of the term "package"

@torsten.schmits Is there a particular problem which has motivated this issue?

In the example it would be better to use -package-id rather than -package as then this imprecision doesn't exist.

Is this a meta-issue to track the problems in #24689 and #24667 ?

Yes, it's intended as a meta-issue. My example is supposed to provide additional motivation from a UX perspective, but since all of these issues ultimately hinge on a decision about the distinction between packages and components, I wanted to collate them under the umbrella of making that decision.

We discussed something similar with @doyougnu and @andreabedini last week. All this complexity about package selection is mostly unused because cabal-install explicitly passes -package-id for every package. So it's only used for:

PackageImports
GHCi
using ghc directly on the command line
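For illustration, a rough sketch of the flag shape a build tool ends up passing once every dependency has been resolved to a unit id: hide everything, then expose each unit by id, so GHC never has to resolve a package name. As far as I know, `-hide-all-packages` followed by `-package-id` per dependency is what cabal-install passes; the helper name and the unit ids in the example are made up.

```python
def ghc_dependency_flags(unit_ids):
    """Flags exposing exactly the given units, selected by unit id.

    No name lookup happens, so the package/component ambiguity discussed
    in this issue cannot arise for these dependencies.
    """
    flags = ["-hide-all-packages"]
    for unit_id in unit_ids:
        flags += ["-package-id", unit_id]
    return flags


# Hypothetical unit ids for a main library and one of its sub-libraries.
print(ghc_dependency_flags(["mypackage-1.0-inplace", "mypackage-1.0-inplace-somelib"]))
```

The name-based paths listed above (PackageImports, GHCi, `ghc -package` on the command line) are exactly the ones that bypass this.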

In my opinion, PackageImports should be deprecated and removed. We can already use module renaming on the command line, so if we could use it more easily from a .cabal file, that would offer a nice migration path. Perhaps something like the following syntax:

other-modules:
  base:Data.Ord as BaseOrd
  foo:Data.Ord as FooOrd
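The syntax above is only a proposal, but it could plausibly be lowered onto GHC's existing renaming form of the package flag, e.g. `-package "base (Data.Ord as BaseOrd)"`. A minimal sketch of that translation, assuming one `<package>:<Module> as <Alias>` entry per line (the function name and the entry grammar are hypothetical):

```python
import re

# Hypothetical grammar for one proposed entry: "<package>:<Module> as <Alias>".
ENTRY = re.compile(r"^\s*([\w-]+):([\w.]+)\s+as\s+([\w.]+)\s*$")


def renaming_flags(entries):
    """Translate proposed renaming entries into GHC -package flags.

    Renamings are grouped per package (in first-seen order) and emitted as
    one comma-separated renaming clause per package.
    """
    by_package = {}
    for entry in entries:
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognised renaming entry: {entry!r}")
        package, module, alias = match.groups()
        by_package.setdefault(package, []).append(f"{module} as {alias}")

    flags = []
    for package, renamings in by_package.items():
        flags += ["-package", f"{package} ({', '.join(renamings)})"]
    return flags


print(renaming_flags(["base:Data.Ord as BaseOrd", "foo:Data.Ord as FooOrd"]))
```

One design caveat, as I understand GHC's thinning and renaming semantics: a renaming clause on `-package` also restricts which modules of that package the flag exposes, so a real lowering would have to decide how to keep the rest of the package visible.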

Fixing GHCi and direct use of GHC's CLI are longer-term projects. Currently GHCi is merged into GHC and sits "below" cabal. It would make more sense for it to sit above cabal (as HLS does) so that it could fully support multiple home units and package-level commands (:set -package foo, etc.). The same goes for GHC's CLI: if it were a program separate from the compiler per se, we could make it interact with cabal instead of the other way around.

So a good first step would be to split GHC into two parts: frontends (GHC's CLI, GHCi, HLS...) and the rest of the compiler (as a library). We could remove a lot of accidental complexity in the process: the stateful packagedb concept, package env files, the weird split of responsibilities between cabal and ghc (recompilation avoidance, parallelism, etc.)... We were trying to make this possible in #17957 without having to fork GHC, but quite a lot of refactoring remains to be done.

Thanks, I added module renaming to the list of solutions.

> So a good first step would be to split GHC in two parts ...

Definitely a worthwhile goal, though I think it would be reasonable for this issue to pursue interim solutions or workarounds in the meantime!

@torsten.schmits Yes, they do disagree. GHC uses "package" (-package, packagedb) for what are now called "units" (as in home-unit, -this-unit-id, etc.). Packages and components are Cabal concepts (related to packaging), while GHC only ever deals with units, which are the things you load into the compiler (which is why a packagedb only contains units/libraries, never executables). The two used to be in (roughly) 1-to-1 correspondence, but that ship sailed when Cabal introduced multiple libraries per package. Duncan's talk at last year's Haskell Contributor Workshop gives some historical background.
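A toy model of the mismatch, with invented class and method names: Cabal reasons about a package containing several components, while the package DB only ever sees the library components, each registered as its own unit. It assumes the package has a main library and identifies each unit by (package name, library name), with `None` standing for the main library.

```python
from dataclasses import dataclass, field


@dataclass
class CabalPackage:
    """Packaging-level view: one .cabal package with several components."""
    name: str
    sublibraries: list = field(default_factory=list)
    executables: list = field(default_factory=list)

    def registered_units(self):
        """Library components that would end up in a package DB.

        Executables are never registered, so they do not appear here.
        """
        return [(self.name, lib) for lib in [None, *self.sublibraries]]


pkg = CabalPackage("mypackage", sublibraries=["somelib"], executables=["mytool"])
print(pkg.registered_units())
```

With multiple libraries, one package name maps to several units, which is where the old 1-to-1 assumption breaks down.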

Vaguely related is this meta-issue on the cabal repo: https://github.com/haskell/cabal/issues/8967.

For a pragmatic approach, on the grounds that less is better than more, I would prefer the second solution: "deprecate package imports in favor of module renaming". A package is usually a code-distribution concept, which is orthogonal to compilation.

My 2c.

mentioned in issue #25035 (closed)

As @mpickering has cross-linked, the underlying problem has also manifested in #25035 (closed). In short:

(a) the GHC User Guide and ghc-pkg both assume that the 'name' of an installed package is that in its name field, which is unique (including as between the main library and named sub-libraries from the same Cabal package, due to munging of the name of the latter); but

(b) ~~ghc appears to be treating the package-name field (if it exists) of an installed package as overriding the name field for the purpose of GHC's -package option.~~ (EDIT 3 Dec 2024: my current understanding is now different - see #25035 (closed); if the name of an installed package follows Cabal's convention for munging the names of sub-libraries, ghc (at least, GHC 9.4.7) appears to be behaving as if the name of an installed package was actually the name of the Cabal package. The package-name field seems to have no role in this.) This breaks existing build tools, such as Stack, which rely on the behaviour documented in the GHC User Guide and on the output of ghc-pkg list.

I would argue that GHC's -package option should be consistent with ghc-pkg list and should refer only to the given name field of an installed package (not its package-name field, if the installed package has one, nor a different interpretation of the name field).
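To make the contrast concrete, a simplified sketch of both behaviours. It uses Cabal's sub-library naming convention in its plain form (`z-<package>-z-<library>`, e.g. `z-mypackage-z-somelib`) and deliberately ignores the extra escaping Cabal applies to names that would otherwise be ambiguous. `observed_matches` models what GHC 9.4.7 appears to do according to #25035; `documented_matches` models the behaviour argued for above. All helper names are hypothetical.

```python
def munge(package, library=None):
    """Plain form of Cabal's naming convention; the main library keeps its name."""
    return package if library is None else f"z-{package}-z-{library}"


def demunge(name):
    """Recover the Cabal package name from a (plainly) munged name."""
    if name.startswith("z-") and "-z-" in name[2:]:
        return name[2:].split("-z-", 1)[0]
    return name


def observed_matches(installed_names, requested):
    """What `-package requested` appears to match: demunged name equality."""
    return [n for n in installed_names if demunge(n) == requested]


def documented_matches(installed_names, requested):
    """What the User Guide and `ghc-pkg list` suggest: exact `name` equality."""
    return [n for n in installed_names if n == requested]


INSTALLED = [munge("mypackage"), munge("mypackage", "somelib"), "otherpkg"]
print(observed_matches(INSTALLED, "mypackage"))
print(documented_matches(INSTALLED, "mypackage"))
```

Under the observed behaviour, `-package mypackage` can pick up the sub-library entry, which `ghc-pkg list mypackage` would never show.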

That is because my intuition is informed by the GHC User Guide: https://downloads.haskell.org/ghc/latest/docs/users_guide/packages.html#using-packages

It starts by explaining installed packages and that ghc-pkg list lists all such installed packages. It then explains that an installed package can be exposed or hidden and that -package <pkg> is used to expose the specified installed package.

I would say that it is counter-intuitive that GHC's -package <pkg> could cause a different installed package to be exposed - one not listed by ghc-pkg list <pkg>.

It seems simple enough to make -package behave correctly for the most obvious use case of depending on the main library (I linked a hack for one variant in the OP).
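A minimal sketch of that partial fix, assuming package DB entries are plain dicts keyed by field name, that sub-libraries carry a `lib-name` field while the main library does not, and falling back to `name` when `package-name` is absent. It is not GHC's actual resolution code, which also has to handle versions, shadowing and multiple package DBs; the function name and entries are made up.

```python
def select_main_library(entries, package):
    """Pick the main library of a Cabal package from package DB entries.

    Sub-libraries carry a `lib-name` field; the main library does not.
    """
    candidates = [
        entry
        for entry in entries
        if entry.get("package-name", entry["name"]) == package
        and "lib-name" not in entry
    ]
    if not candidates:
        raise LookupError(f"no main library registered for {package!r}")
    # A real implementation needs a version/shadowing policy; take the first.
    return candidates[0]


ENTRIES = [
    {"name": "z-mypackage-z-somelib", "package-name": "mypackage",
     "lib-name": "somelib", "id": "mypackage-1.0-inplace-somelib"},
    {"name": "mypackage", "package-name": "mypackage", "id": "mypackage-1.0-inplace"},
]

print(select_main_library(ENTRIES, "mypackage")["id"])
```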

Deprecating package imports in favor of Cabal renaming also sounds reasonable, if that feature has no other uses than disambiguating Cabal dependencies.

However, both of these only tape over symptoms, while the underlying incoherence remains. Cabal sublibraries will not be treated consistently until something along the lines of the first two solutions in my description is implemented, as far as I can tell!

It would be helpful to have some more insights from Cabal experts into the feasibility of those solutions – in particular, how much breakage it would cause to change Cabal's package DB codec for the package-name field to include the library name, as in package-name: mypackage:somelib. Maybe @andreabedini or @phadej can help here?
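For reference, the proposed encoding itself is easy to state: a sub-library is written as `<package>:<library>` in `package-name`, and the main library keeps the bare package name. A hypothetical render/parse pair showing the round trip; it says nothing about how existing readers of the field would cope with the colon, which is the actual feasibility question.

```python
def render_package_name(package, library=None):
    """Proposed value of `package-name` (main library: bare package name)."""
    return package if library is None else f"{package}:{library}"


def parse_package_name(value):
    """Inverse of render_package_name: returns (package, library or None)."""
    package, sep, library = value.partition(":")
    return package, (library if sep else None)


print("package-name: " + render_package_name("mypackage", "somelib"))
```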

(EDIT 3 Dec 2024: my current understanding for what ghc (at least, GHC 9.4.7) actually does is now different - see #25035 (closed); if the name of an installed package follows Cabal's convention for munging the names of sub-libraries, ghc appears to be behaving as if the name of an installed package was actually the name of the Cabal package. The package-name field seems to have no role in this.)

If the GHC bug were fixed, such that GHC ~~ignored the content of the package-name field (which refers to a Cabal package, not an installed package)~~ (EDIT 3 Dec 2024) respected the given content of the name field, and -package <pkg> (expose an installed package) was once again consistent with the output of ghc-pkg list <pkg> (list installed package(s)), would GHC care what Cabal put in that field? Or am I missing something?

I assume that would make it consistent with ghc-pkg, though I have not investigated how package-name is used otherwise in GHC. In any case, sublibraries would then be referred to as z-mypackage-z-somelib, so there's still some UX improvements to be made, and since it's also Cabal who writes the name field, the required change would be very similar.

Definitely worth finding out what the extent of package-name's influence in GHC is though.

> It would be helpful to have some more insights from Cabal experts into the feasibility of those solutions

TL;DR, they are not feasible.

You could tie Cabal and GHC more closely together, but IMO that would mean that, at the very least, the Cabal (library) part should then live in the GHC source tree (cf. Haddock).

In what sense do you consider the proposed approaches "tying together" in a negative way? It appears to me that the package DB config is a shared protocol – coordinating the field interpretation seems inherently necessary.

I do think that having InstalledPackageInfo defined in Cabal-syntax, with ghc-pkg depending on Cabal-syntax basically only to use the InstalledPackageInfo type, is a suboptimal design.

The shared code between Cabal and GHC is not full Cabal-syntax. AFAICT, it's only parsing framework and InstalledPackageInfo grammar.
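To give a sense of how small that shared surface is, a much-simplified sketch of reading one registration record in the `field: value` layout that `ghc-pkg describe` prints, with indented lines continuing the previous field. This is not Cabal's real InstalledPackageInfo grammar, which has typed fields and more layout rules, and the sample record is invented.

```python
def parse_registration(text):
    """Parse 'field: value' lines into a dict of raw string values.

    Lines starting with whitespace continue the previous field's value.
    Only the first colon separates field from value, so a value such as
    `mypackage:somelib` survives intact.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace() and current is not None:
            fields[current] = (fields[current] + " " + line.strip()).strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        current = key.strip()
        fields[current] = value.strip()
    return fields


SAMPLE = """\
name: z-mypackage-z-somelib
version: 1.0
package-name: mypackage
lib-name: somelib
id: mypackage-1.0-inplace-somelib
exposed-modules:
    Data.Foo
    Data.Bar
"""

print(parse_registration(SAMPLE)["exposed-modules"])
```

Everything beyond this syntax, i.e. what `name`, `package-name` and `lib-name` actually mean, is exactly the part that currently has no single owner.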

GHC "kind of" owns the InstalledPackageInfo structure (as it's used for package databases). If external tools need some support, e.g. extra fields because they want to build multi-library support, they should ask (or should have asked) GHC devs to add it (and discuss other implementation concerns).

That dialogue hadn't been done (probably because code lives in other repository, so the ownership is not clear), so we are where we are.

AFAICT, the "bugs" come from different interpretations of InstalledPackageInfo, which I would attribute to unclear ownership (no one has any idea what the fields actually mean). At the moment, both Cabal and GHC seem to treat that type as their own, and neither is the source of truth on how to interpret package registration information.

EDIT: I'm not a GHC nor Cabal developer, so I have no say in what will happen next. I only tell what I know or think happened.

Oh, I assumed you were a Cabal dev for some reason, apologies.

> That dialogue hadn't been done (probably because code lives in other repository, so the ownership is not clear), so we are where we are.

Right, that's what I'm trying to make up for here.

By way of update:

the Stack project has experienced a recent issue which I think is a manifestation of this 'in the wild' (in that case, attoparsec-related): https://github.com/commercialhaskell/stack/issues/6661; and
the Hackage project now better supports sub-libraries, which may (a) encourage the use of sub-libraries and (b) cause instances of this manifesting to increase.

(By 'this' I mean GHC's -package <pkg> causing an installed package to be exposed that is not one listed by ghc-pkg list <pkg>.)

(EDIT 3 Dec 2024: my current understanding for what ghc (at least, GHC 9.4.7) actually does is now different - see #25035 (closed); if the name of an installed package follows Cabal's convention for munging the names of sub-libraries, ghc appears to be behaving as if the name of an installed package was actually the name of the Cabal package. The package-name field seems to have no role in this. I have corrected some earlier comments above.)

By way of further update, this issue has manifested again at the Stack project (https://github.com/commercialhaskell/stack/issues/6704) because the popular vector package has introduced a sub-library from vector-0.13.2.0.

Overview

Example bug

Potential Solutions

Solution in Cabal: Store `package-name: attoparsec:internal` in the package DB.

Solution in GHC: Read `lib-name` from the package DB and treat it as an additional axis.

Partial solution: Deprecate package imports in favor of module renaming

Partial solution: Select the main library for `-package` based on the absence of a component name

Partial solution: Use `name` instead of `package-name` in GHC
