Running ./configure from the bindist with LDFLAGS containing -Wl,-z,pack-relative-relocs fails when trying to determine the machine word size. I ran into this issue when installing GHC through stack within a PKGBUILD on Arch Linux; the default LDFLAGS setting there has contained -Wl,-z,pack-relative-relocs since last month.
I raised the issue in both the stack and Arch Linux repos before being prodded by @maerwald that this might need to be addressed in GHC as well.
GHC isn't the only Haskell program affected; I similarly have trouble building unix and network. I haven't investigated these in depth, but I suspect the cause is a similar use of autoconf for the FFI bits, which fails. I raised an issue in cabal to see if they can similarly select $LD based on $LDFLAGS.
root@cc2ac802d6e9:/ghc-9.10.0.20240313-x86_64-unknown-linux# ld.gold -z pack-relative-relocs
ld.gold: pack-relative-relocs: unknown -z option
ld.gold: use the --help option for usage information
root@cc2ac802d6e9:/ghc-9.10.0.20240313-x86_64-unknown-linux# ld.bfd -z pack-relative-relocs
ld.bfd: warning: -z pack-relative-relocs ignored
ld.bfd: no input files
GHC will indeed try to use ld.gold before the BFD ld
My assessment is that if that flag is configured into makepkg.conf, perhaps the linker which supports it should also be configured explicitly (LD=ld.bfd or LD=ld).
Or, perhaps more sensibly, you can ./configure just GHC explicitly with LD=ld.bfd and LDFLAGS=-Wl,-z,pack-relative-relocs (this makes configure succeed in both 9.6.4 and 9.10.1 bindists).
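For reference, the workaround above might look roughly like this when run from inside the unpacked bindist directory. It is only a sketch: the function name `configure_with_bfd` and the default prefix are made up for illustration, and only LD, LDFLAGS and ./configure come from the discussion.

```shell
#!/bin/sh
# Sketch: configure a GHC bindist with the BFD linker selected explicitly.
# Assumes the current directory is the unpacked bindist. The function name
# and the default prefix are illustrative and not part of GHC.
configure_with_bfd() {
    prefix=${1:-/usr/local}
    # LD picks the linker; LDFLAGS keeps the distro's relocation packing flag.
    LD=ld.bfd \
    LDFLAGS="-Wl,-z,pack-relative-relocs" \
    ./configure --prefix="$prefix"
}

# Usage (from inside the bindist directory):
#   configure_with_bfd "$HOME/.local"
```

Setting the variables only for this one command leaves the LDFLAGS from makepkg.conf untouched for everything else in the build.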
All linkers have bugs. There is no linker that will work in any configuration that the user throws at it. This includes ld.bfd, which has more than its fair share of bugs, including some truly horrible ones.
Such a blanket approach to linker selection would be actively harmful to link reliability. Linkers are by their nature highly platform-dependent. As such different linkers --- including ld.bfd --- exhibit different bugs on different platforms, especially when confronted with code that differs from what, e.g., gcc would produce. There is no "one-size fits all" choice here; rather, linker selection is a matter of finding the "least bad" option for the user's environment.
FWIW, we prefer lld specifically because it is (on Linux) overall a more reliable linker than ld.gold (or, for that matter, ld.bfd). It is sad that lld isn't as widely available as gold, which ships with binutils. If we are concerned about maximizing the probability of reliable linkage on every point in the configuration space then we would likely rather insist that the user install lld.
Given that:
linkage comprises a significant fraction of overall build times, and
GHC already sees a considerable amount of criticism for its compilation times
I think the link-time advantages of avoiding ld.bfd are quite compelling. Instead of throwing the baby out with the bathwater, we should rather improve our configure checks to further reduce the probability that the user ends up with a broken installation.
To this end, it seems that the issue here is that LDFLAGS is not respected by m4/find_ld.m4 but is used in later configure checks. This is a bug that should be fixed.
Agreed. I've added that configure check in !12282 (closed) which makes configure fall back to ld.bfd because compilation with ld.gold + -z pack-relative-relocs will fail.
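To illustrate the idea of a linker check that respects LDFLAGS, here is a rough shell sketch. It is not the code from m4/find_ld.m4 or from the merge request. The function names are hypothetical, and the candidate order in the usage line (lld, then gold, then bfd) is an assumption based on the preferences described in this thread.

```shell
#!/bin/sh
# Sketch of a linker probe that honours the user's LDFLAGS.
# Function names are hypothetical; this is not GHC's actual configure logic.

# linker_works CC LINKER: succeed if "CC -fuse-ld=LINKER $LDFLAGS" can link
# a trivial C program.
linker_works() {
    cc=$1
    linker=$2
    dir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$dir/conftest.c"
    # LDFLAGS is deliberately unquoted so that it splits into separate arguments.
    "$cc" -fuse-ld="$linker" ${LDFLAGS:-} -o "$dir/conftest" "$dir/conftest.c" \
        >/dev/null 2>&1
    status=$?
    rm -rf "$dir"
    return "$status"
}

# pick_linker CC CANDIDATE...: print the first candidate that passes the probe.
pick_linker() {
    cc=$1
    shift
    for candidate in "$@"; do
        if linker_works "$cc" "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
#   LDFLAGS="-Wl,-z,pack-relative-relocs"
#   pick_linker gcc lld gold bfd
```

With the Arch LDFLAGS set, a probe like this would be expected to reject gold, because the link fails on the unknown -z option, and to fall through to the next candidate instead of failing later in an unrelated check.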
All linkers have bugs. There is no linker that will work in any configuration that the user throws at it. This includes ld.bfd, which has more than its fair share of bugs, including some truly horrible ones.
This doesn't seem like an accurate representation of the state of linkers.
In my opinion, the best way to understand which is the most reliable linker is to look at Linux source distros and their defaults and opinions, because they have the largest exposure. One such distro is Gentoo and they very clearly advise against using gold: https://wiki.gentoo.org/wiki/Gold
My (although dated) experience of being a Gentoo developer confirms this intuition (although I don't have hard numbers): bfd is more reliable. I could probably search the Gentoo issue tracker and compare bugs filed by users who use gold vs bugs filed by users who use bfd, but I don't think that's a good use of my time.
From the gentoo wiki:
gold is less active than it once was, e.g. gold's commit history vs bfd's commit history. Users seeking an alternative linker may be interested in LLVM's lld.
Maintenance status matters. Adoption matters. Distro defaults matter. Most distros default to ld.bfd. Distro maintainers usually know best about their toolchain. GHC should work with all major linkers and stop using defaults other than the distro default.
Such a blanket approach to linker selection would be actively harmful to link reliability.
I think what's harmful is that GHC still thinks it knows better than distro maintainers, who select the most reliable configuration for their users. And it also thinks it knows better than their users, because it tries really hard to force gold on them, even if the system default is bfd.
we should rather improve our configure checks to further reduce the probability that the user ends up with a broken installation.
Depends what kind of checks you're talking about. Install configure scripts are only for basic toolchain sanity checks. Everything else (e.g. the linker doesn't trigger known bugs etc.) is supposed to be part of the test suite and the user is supposed to run the test suite on their system after installation.
But the GHC bindist testsuite remains fairly broken: #24555
Depends what kind of checks you're talking about. Install configure scripts are only for basic toolchain sanity checks. Everything else (e.g. the linker doesn't trigger known bugs etc.
We have a difference of opinion here: whether a linker exhibits known bugs in my opinion sits squarely in the realm of a "basic toolchain sanity check". Further, I don't believe that it is reasonable to expect a user to perform a 10-minute testsuite run post-installation and have little confidence that end-users would actually do so. GHC has historically performed a wide variety of toolchain checks in its install-time configure script and this has saved us untold hours of debugging toolchain issues.
To be clear, it is ghcup's prerogative to disable configure's ld-override behavior. I personally feel that the two bugs in ld.gold observed to affect GHC do not justify a significant regression in link times for all users: by my measurement, linking a simple Setup.hs against the Cabal library with ld.gold cuts linking time by a third. However, this is a distribution matter and I can only state my opinion.
At the same time, I do want to be clear that GHC's install-time configure checks are an important feature of our installation procedure. I would not condone any effort to remove or otherwise cripple them by downstream distributors. As noted above, they catch real issues which save both us and our users time and headache.
whether a linker exhibits known bugs in my opinion sits squarely in the realm of a "basic toolchain sanity check".
Fair enough. The armv7 bug also seems to have a clear and small reproducer.
Further, I don't believe that it is reasonable to expect a user to perform a 10-minute testsuite run post-installation and have little confidence that end-users would actually do so.
Well, I think we fundamentally disagree here. I'm pretty sure industry users would run the test suite, e.g. in their CI, so they know whether the toolchain they're using to build a program that's going to be deployed on a safety-critical machine is actually behaving well.
Whether the GHC testsuite succeeded in the environment of your release CI says very little about whether it would on the end user's system.
That means right now... most users have no idea whether GHC behaves well on their system.
GHCup already makes it easy to execute the test suite. Except it's always broken.
To be clear, it is ghcup's prerogative to disable configure's ld-override behavior. I personally feel that the two bugs in ld.gold observed to affect GHC do not justify a significant regression in link times for all users: by my measurement, linking a simple Setup.hs against the Cabal library with ld.gold cuts linking time by a third. However, this is a distribution matter and I can only state my opinion.
Well, GHC HQ and GHCup have always had very different priorities. I care about reliability and distro defaults more than about performance. That is my right to have this priority for GHCup as a distribution channel that selects certain defaults for the end user (just like the recommended version is very different from what GHC HQ would do).
Now the question is just whether you support ld.bfd at all. If you do, I will go ahead with disabling ld override. If not, I would expect GHC to force ld.gold. Because I don't want to get ghcup users shut down when they file bugs that happen with ld.bfd. That's all I want to know.
The change in ld override will be communicated, documented and end users will be able to opt out.
Well, GHC HQ and GHCup have always had very different priorities. I care about reliability and distro defaults more than about performance.
We place a similarly high value on reliability; after all, linker bugs tend to be extremely subtle and difficult to diagnose.
However, despite this, I have not seen any issues in ld.gold (beyond those that configure already checks for) which would outweigh the link-time benefits that it provides.
Now the question is just whether you support ld.bfd at all. If you do, I will go ahead with disabling ld override. If not, I would expect GHC to force ld.gold. Because I don't want to get ghcup users shut down when they file bugs that happen with ld.bfd. That's all I want to know.
Yes, we support use of ld.bfd.
And also: does your validate pipeline execute the tests with bfd and lld?
No, we only test the linker chosen by the configure script in CI.
All linkers have bugs. There is no linker that will work in any configuration that the user throws at it. This includes ld.bfd, which has more than its fair share of bugs, including some truly horrible ones.
The difference between gold and other linkers is that gold has been effectively abandoned upstream.
I have been struggling to debug GHC on 32-bit PowerPC for several weeks now only to realize the problem was that GHC enforces gold by default despite being configured with --disable-ld-override.
Are you saying that you passed --disable-ld-override and the compiler still used -fuse-ld=gold? That doesn't seem possible to me from looking at the code and a different bug to the one in this ticket.
Passing --disable-ld-override will use bfd during the GHC build itself. However the settings file in /usr/lib/ghc/lib/settings still ends up with "C compiler link flags", "-fuse-ld=gold" which is what breaks GHC for me when actually trying to use it.
@trac-glaubitz how did you install GHC? Did you create a bindist after compilation and install that? The --disable-ld-override flag also has to be passed to the bindist configure script.
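A quick way to see which linker override an installed GHC has recorded is to look for -fuse-ld in its settings file. The helper below is only a sketch with a made-up name; the settings path is whatever your installation uses (/usr/lib/ghc/lib/settings in the report above).

```shell
#!/bin/sh
# Sketch: list the -fuse-ld=... flags recorded in a GHC settings file.
# The function name is hypothetical; pass the path to the settings file.
settings_linker_override() {
    settings_file=$1
    # -o prints only the matching part; -e is needed because the pattern
    # itself starts with a dash. Prints nothing if no override is recorded.
    grep -o -e '-fuse-ld=[A-Za-z0-9._-]*' "$settings_file" | sort -u
}

# Usage, assuming the settings file lives in the directory that
# `ghc --print-libdir` reports:
#   settings_linker_override "$(ghc --print-libdir)/settings"
```

If this prints -fuse-ld=gold even though the build was configured with --disable-ld-override, the installed bindist was most likely configured without that flag.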
My experience tells me bfd is a reliable linker in almost all cases, even if a bit slow.
In my experience most of these linkers are reliable in most cases. However, each linker also has cases where it will fail. Consequently, we have to make a cost-benefit analysis: what is the likelihood that a link will fail compared to the benefit of considerably faster link times? Such an analysis is what led to the status quo.
It is a result of tables-next-to-code, not the mangler. While at one point the mangler was used to implement TNTC, this hasn't been the case for several years.
However, this sort of consideration is precisely why it is hard to extrapolate a linker's reliability from its performance on code from C-like compilers: GHC produces qualitatively different code to what such a compiler would produce and will consequently encounter different bugs.
Fair enough.
However you've repeatedly indicated that the choice of ld.gold is more about performance and not about reliability.
The ld.bfd bug seems to only be triggered on armv7, which isn't even properly supported by GHC anymore (no bindists).
So what I'd be interested in: what is your scoring of linker reliability on the most common platforms regarding GHC's use case (ignoring performance matters)?
In general I would agree that ld.bfd on most platforms will exhibit fewer issues than ld.gold on most platforms. However, this is going to be platform dependent and ld.lld is better than either.
How you configure the bindist is of course entirely at your discretion, but it seems quite aggressive to universally change the default linker for all GHC versions on all configurations from something which has worked for nearly all users for a long time.
@maerwald Thanks for the links, can you explain what conclusions you can draw from them?
Yes, there are situations where you want to configure GHC to do something different to the default and you can do that by passing --disable-ld-override. You have pointed out a handful of cases where things go wrong when thousands of people install GHC with the default options and things work out fine.
Perhaps your point is instead that users find it surprising that GHC attempts to use gold rather than whatever linker their C compiler is normally using on their system. That I can appreciate could be confusing if you are used to how other software is packaged. However, users who are not knowledgeable about linkers are interested in user experience, and at some stage in the past gold was generally more robust and a better choice than bfd if it was available. It's my impression that most Haskell programmers don't know anything about linkers nor understand the difference between them.
Are you advocating for making a high-risk change to linker defaults which affects all users because a few users have issues with the default settings? I'm not comfortable estimating the impact of that decision. If you think there is consensus to change the default then feel free to drive that decision forward. Making --disable-ld-override the default in ghcup seems like a way to test out this hypothesis for now, and if there is minimal impact then we can also make the change in the default GHC settings.
I think I am just approaching this from a more cautious position than you. Modifying linking configuration has proven in the past to be a subtle point of packaging where we have had to be careful.
So it's certainly possible that the situation has changed since 2015, and I imagine that bfd now supports aarch64 due to its ubiquity now compared to 2015.
No objection here. However, gold seems to be in rather poor shape these days and it's better to use it only on a selected number of architectures. If alternatives are wanted, it might be an idea to look into mold.