bindist configure fails when LDFLAGS contains pack-relative-relocs

added needs triage label

removed needs triage label

added Phigh Tbug configure script ghc-toolchain labels

changed milestone to %9.8.3

added backport needed:9.10 backport needed:9.8 labels

assigned to @bgamari

unassigned @bgamari

assigned to @alt-romes

I've reproduced this with the 9.6.4 and 9.10.1-alpha1 bindists.

The problem is that the ld.gold linker doesn't support -z pack-relative-relocs; you can see this by looking in config.log:

configure:10099: checking size of void *
configure:10104: gcc -o conftest  -g -O2 -fuse-ld=gold  -Wl,-z,pack-relative-relocs -Wl,--no-as-needed conftest.c  >&5
/usr/bin/ld.gold: pack-relative-relocs: unknown -z option
/usr/bin/ld.gold: use the --help option for usage information
collect2: error: ld returned 1 exit status
configure:10104: $? = 1
configure: program exited with status 1

Other AUR users have also reported failures caused by ld.gold not supporting this option (https://aur.archlinux.org/packages/qt5-webkit#comment-960032), and say that plain ld does support it!

root@cc2ac802d6e9:/ghc-9.10.0.20240313-x86_64-unknown-linux# ld.gold -z pack-relative-relocs
ld.gold: pack-relative-relocs: unknown -z option
ld.gold: use the --help option for usage information
root@cc2ac802d6e9:/ghc-9.10.0.20240313-x86_64-unknown-linux# ld.bfd -z pack-relative-relocs
ld.bfd: warning: -z pack-relative-relocs ignored
ld.bfd: no input files

GHC will indeed try to use ld.gold before BFD ld.

My assessment is that if that flag is set in makepkg.conf, the linker that supports it should probably also be set explicitly (LD=ld.bfd or LD=ld).

Or, perhaps more sensibly, you can run GHC's ./configure explicitly with LD=ld.bfd and LDFLAGS=-Wl,-z,pack-relative-relocs (this makes configure succeed in both the 9.6.4 and 9.10.1 bindists).
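The workaround in the previous paragraph can be wrapped in a small shell function. This is a minimal sketch, assuming a POSIX shell and an already-unpacked bindist; the function name `configure_bindist_with_bfd` is hypothetical, and the LD/LDFLAGS values are exactly the ones quoted above.

```shell
# Hypothetical helper (the name is made up): run a GHC bindist's
# ./configure with BFD ld selected explicitly, as suggested above.
# Any extra arguments (e.g. --prefix=...) are passed through to configure.
configure_bindist_with_bfd() {
    bindist_dir=$1
    shift
    # A subshell keeps the caller's working directory unchanged.
    (
        cd "$bindist_dir" || exit 1
        LD=ld.bfd LDFLAGS="-Wl,-z,pack-relative-relocs" ./configure "$@"
    )
}
```

For example, `configure_bindist_with_bfd ./ghc-9.10.0.20240313-x86_64-unknown-linux --prefix=/opt/ghc` (the prefix is only an illustration). Note that this replaces any LDFLAGS inherited from the environment; to keep the rest of a makepkg.conf setting, one could leave LDFLAGS alone and set only LD.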

mentioned in commit 6929c51f

mentioned in merge request !12282 (closed)

The GHC-side solution is to fall back to bfd if compiling with gold + LDFLAGS does not succeed. Fix in !12282 (closed)
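The fallback idea can be illustrated in plain shell: try each candidate linker with the user's LDFLAGS applied and keep the first one that links. This is a hedged sketch, not the actual autoconf code in m4/find_ld.m4 or !12282; `try_link` and `pick_linker` are hypothetical names, and it assumes a C compiler that understands `-fuse-ld=` (as gcc does).

```shell
# Hypothetical sketch of the fallback idea: it is NOT the code from
# m4/find_ld.m4 or !12282, and try_link/pick_linker are made-up names.

# try_link LINKER: succeed if ${CC:-cc} can link a trivial C program
# using -fuse-ld=LINKER together with the user's LDFLAGS.
try_link() {
    conftmp=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$conftmp/conftest.c"
    # LDFLAGS is deliberately unquoted so that it splits into separate flags.
    ${CC:-cc} -fuse-ld="$1" $LDFLAGS -o "$conftmp/conftest" \
        "$conftmp/conftest.c" >/dev/null 2>&1
    link_status=$?
    rm -rf "$conftmp"
    return "$link_status"
}

# pick_linker CANDIDATE...: print the first candidate that links
# successfully with LDFLAGS applied; fail if none does.
pick_linker() {
    for candidate in "$@"; do
        if try_link "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, with `LDFLAGS="-Wl,-z,pack-relative-relocs"` exported on a system whose ld.gold rejects that flag, `pick_linker gold bfd` would be expected to print `bfd`. The point of the design is that LDFLAGS takes part in the probe itself, so a linker that only fails once the user's flags are added is never selected.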

changed the description

Sniped. I just confirmed this is also the problem with network's build. I'll pass on the suggestion to set LD explicitly.

changed the description

This isn't the first time ld.gold has caused problems, nor the first time I have questioned GHC defaulting to it: #17508

GHCup already forces ld.bfd for alpine https://github.com/haskell/ghcup-hs/issues/967 ...now I'm wondering if I'll force it for all platforms.

All linkers have bugs. There is no linker that will work in any configuration that the user throws at it. This includes ld.bfd, which has more than its fair share of bugs, including some truly horrible ones.

Such a blanket approach to linker selection would be actively harmful to link reliability. Linkers are by their nature highly platform-dependent. As such, different linkers (including ld.bfd) exhibit different bugs on different platforms, especially when confronted with code that differs from what, e.g., gcc would produce. There is no "one-size-fits-all" choice here; rather, linker selection is a matter of finding the "least bad" option for the user's environment.

FWIW, we prefer lld specifically because it is (on Linux) overall a more reliable linker than ld.gold (or, for that matter, ld.bfd). It is sad that lld isn't as widely available as gold, which ships with binutils. If we are concerned about maximizing the probability of reliable linkage at every point in the configuration space, then we would rather insist that the user install lld.

Given that:

linkage comprises a significant fraction of overall build times, and
GHC already sees a considerable amount of criticism for its compilation times

I find the link-time advantages of avoiding ld.bfd quite compelling. Instead of throwing the baby out with the bathwater, we should rather improve our configure checks to further reduce the probability that the user ends up with a broken installation.

To this end, it seems that the issue here is that LDFLAGS is not respected by m4/find_ld.m4 but is used in later configure checks. This is a bug that should be fixed.

Agreed. I've added that configure check in !12282 (closed): configure now falls back to ld.bfd when compilation with ld.gold + -z pack-relative-relocs fails.

We should be sure to backport this fix.

All linkers have bugs. There is no linker that will work in any configuration that the user throws at it. This includes ld.bfd, which has more than its fair share of bugs, including some truly horrible ones.

This doesn't seem like an accurate representation of the state of linkers.

In my opinion, the best way to understand which linker is the most reliable is to look at Linux source distros and their defaults and opinions, because they have the largest exposure. One such distro is Gentoo, and they very clearly advise against using gold: https://wiki.gentoo.org/wiki/Gold

My (admittedly dated) experience as a Gentoo developer confirms this intuition, although I don't have hard numbers: bfd is more reliable. I could probably search the Gentoo issue tracker and compare bugs filed by users who use gold with bugs filed by users who use bfd, but I don't think that's a good use of my time.

From the gentoo wiki:

gold is less active than it once was, e.g. gold's commit history vs bfd's commit history. Users seeking an alternative linker may be interested in LLVM's lld.

You can verify this yourself:

Maintenance status matters. Adoption matters. Distro defaults matter. Most distros default to ld.bfd. Distro maintainers usually know best about their toolchain. GHC should work with all major linkers and stop using defaults other than the distro default.

Such a blanket approach to linker selection would be actively harmful to link reliability.

I think what's harmful is that GHC still thinks it knows better than distro maintainers, who select the most reliable configuration for their users. And it also thinks it knows better than their users, because it tries really hard to force gold on them, even if the system default is bfd.

we should rather improve our configure checks to further reduce the probability that the user ends up with a broken installation.

Depends what kind of checks you're talking about. Install configure scripts are only for basic toolchain sanity checks. Everything else (e.g. the linker doesn't trigger known bugs etc.) is supposed to be part of the test suite and the user is supposed to run the test suite on their system after installation.

But the GHC bindist testsuite remains fairly broken: #24555

https://github.com/haskell/ghcup-hs/issues/1032

Depends what kind of checks you're talking about. Install configure scripts are only for basic toolchain sanity checks. Everything else (e.g. the linker doesn't trigger known bugs etc.

We have a difference of opinion here: whether a linker exhibits known bugs in my opinion sits squarely in the realm of a "basic toolchain sanity check". Further, I don't believe that it is reasonable to expect a user to perform a 10-minute testsuite run post-installation and have little confidence that end-users would actually do so. GHC has historically performed a wide variety of toolchain checks in its install-time configure script and this has saved us untold hours of debugging toolchain issues.

To be clear, it is ghcup's prerogative to disable configure's ld-override behavior. I personally feel that the two bugs in ld.gold observed to affect GHC do not justify a significant regression in link times for all users: by my measurement, linking a simple Setup.hs against the Cabal library with ld.gold cuts linking time by a third. However, this is a distribution matter and I can only state my opinion.

At the same time, I do want to be clear that GHC's install-time configure checks are an important feature of our installation procedure. I would not condone any effort to remove or otherwise cripple them by downstream distributors. As noted above, they catch real issues which save both us and our users time and headache.

whether a linker exhibits known bugs in my opinion sits squarely in the realm of a "basic toolchain sanity check".

Fair enough. The armv7 bug also seems to have a clear and small reproducer.

Further, I don't believe that it is reasonable to expect a user to perform a 10-minute testsuite run post-installation and have little confidence that end-users would actually do so.

Well, I think we fundamentally disagree here. I'm pretty sure industry users would run the test suite, e.g. in their CI, so they know whether the toolchain they're using to build a program that's going to be deployed on a safety-critical machine is actually behaving well.

Whether the GHC testsuite succeeded in the environment of your release CI says very little about whether it would on the end users system.

That means right now... most users have no idea whether GHC behaves well on their system.

GHCup already makes it easy to execute the test suite. Except it's always broken.

To be clear, it is ghcup's prerogative to disable configure's ld-override behavior. I personally feel that the two bugs in ld.gold observed to affect GHC do not justify a significant regression in link times for all users: by my measurement, linking a simple Setup.hs against the Cabal library with ld.gold cuts linking time by a third. However, this is a distribution matter and I can only state my opinion.

Well, GHC HQ and GHCup have always had very different priorities. I care about reliability and distro defaults more than about performance. It is my right to set this priority for GHCup as a distribution channel that selects certain defaults for the end user (just as the recommended version is very different from what GHC HQ would choose).

Now the question is simply whether you support ld.bfd at all. If you do, I will go ahead with disabling the ld override. If not, I would expect GHC to force ld.gold, because I don't want ghcup users to get shut down when they file bugs that happen with ld.bfd. That's all I want to know.

The change to the ld override will be communicated and documented, and end users will be able to opt out.

And also: does your validate pipeline execute the tests with bfd and lld?

Well, GHC HQ and GHCup have always had very different priorities. I care about reliability and distro defaults more than about performance.

We place a similarly high value on reliability; after all, linker bugs tend to be extremely subtle and difficult to diagnose.

However, despite this, I have not seen any issues in ld.gold (beyond those that configure already checks for) which would outweigh the link-time benefits that it provides.

Now the question is simply whether you support ld.bfd at all. If you do, I will go ahead with disabling the ld override. If not, I would expect GHC to force ld.gold, because I don't want ghcup users to get shut down when they file bugs that happen with ld.bfd. That's all I want to know.

Yes, we support use of ld.bfd.

And also: does your validate pipeline execute the tests with bfd and lld?

No, we only test the linker chosen by the configure script in CI.

All linkers have bugs. There is no linker that will work in any configuration that the user throws at it. This includes ld.bfd, which has more than its fair share of bugs, including some truly horrible ones.

The difference between gold and other linkers is that gold has been effectively abandoned upstream.

See: https://en.wikipedia.org/wiki/Gold_(linker)

I have been struggling to debug GHC on 32-bit PowerPC for several weeks now only to realize the problem was that GHC enforces gold by default despite being configured with --disable-ld-override.

Can you explain this bug, @trac-glaubitz?

Are you saying that you passed --disable-ld-override and the compiler still used -fuse-ld=gold? That doesn't seem possible to me from looking at the code and a different bug to the one in this ticket.

Passing --disable-ld-override makes the GHC build itself use bfd. However, the settings file at /usr/lib/ghc/lib/settings still ends up with "C compiler link flags", "-fuse-ld=gold", which is what breaks GHC for me when I actually try to use it.

See: #24986
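To check whether an installed GHC still forces a particular linker, one can inspect the "C compiler link flags" entry of its settings file. This is a minimal sketch, assuming each settings entry sits on its own line; `forced_linker` is a hypothetical helper name, not part of GHC.

```shell
# Hypothetical helper: print the linker selected via -fuse-ld= in the
# "C compiler link flags" entry of a GHC settings file, or print
# nothing if that entry forces no particular linker.
# Assumes one settings entry per line.
forced_linker() {
    grep '"C compiler link flags"' "$1" |
        sed -n 's/.*-fuse-ld=\([A-Za-z0-9._-]*\).*/\1/p'
}
```

For an installed compiler, something like `forced_linker "$(ghc --print-libdir)/settings"` should print `gold` if the link flags contain `-fuse-ld=gold`, and nothing otherwise.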

@trac-glaubitz how did you install GHC? Did you create a bindist after compilation and install that? The --disable-ld-override flag also has to be passed to the bindist configure script.

I am building the Debian package which uses the debian/rules to configure and build GHC.

See: https://sources.debian.org/src/ghc/9.4.7-5/debian/rules/

Then you probably need to add it to EXTRA_INSTALL_CONFIGURE_FLAGS.

My experience tells me bfd is a reliable linker in almost all cases, even if a bit slow.

I'd also argue that you'd want to use bfd for gcc based toolchains and lld if your toolchain is llvm based, just for consistency.

https://sourceware.org/bugzilla/show_bug.cgi?id=16177 looks like the result of our llvm backend + mangler?

My experience tells me bfd is a reliable linker in almost all cases, even if a bit slow.

In my experience, most of these linkers are reliable in most cases. However, each linker also has cases where it will fail. Consequently, we have to make a cost-benefit analysis: how likely is a link to fail, compared with the benefit of considerably faster link times? Such an analysis is what led to the status quo.

https://sourceware.org/bugzilla/show_bug.cgi?id=16177 looks like the result of our llvm backend + mangler?

It is a result of tables-next-to-code, not the mangler. While at one point the mangler was used to implement TNTC, this hasn't been the case for several years.

However, this sort of consideration is precisely why it is hard to extrapolate a linker's reliability from its performance on code from C-like compilers: GHC produces qualitatively different code to what such a compiler would produce and will consequently encounter different bugs.

Fair enough.

However you've repeatedly indicated that the choice of ld.gold is more about performance and not about reliability.

The ld.bfd bug seems to only be triggered on armv7, which isn't even properly supported by GHC anymore (no bindists).

So what I'd be interested in is your scoring of linker reliability on the most common platforms for GHC's use case (ignoring performance).

In general I would agree that ld.bfd on most platforms will exhibit fewer issues than ld.gold on most platforms. However, this is going to be platform dependent and ld.lld is better than either.

It is of course entirely at your discretion how you configure the bindist, but it seems quite aggressive to universally change the default linker, for all GHC versions on all configurations, away from something that has worked for nearly all users for a long time.

@maerwald Thanks for the links, can you explain what conclusions you can draw from them?

Yes, there are situations where you want to configure GHC to do something different to the default and you can do that by passing --disable-ld-override. You have pointed out a handful of cases where things go wrong when thousands of people install GHC with the default options and things work out fine.

Perhaps your point is instead that users find it surprising that GHC attempts to use gold rather than whatever linker their C compiler normally uses on their system. I can appreciate that this could be confusing if you are used to how other software is packaged. However, users who are not knowledgeable about linkers care about the user experience, and at some stage in the past gold was generally more robust than bfd and a better choice when it was available. It's my impression that most Haskell programmers don't know anything about linkers or understand the differences between them.

Are you advocating for making a high-risk change to linker defaults, which affects all users, because a few users have issues with the default settings? I'm not comfortable estimating the impact of that decision. If you think there is consensus to change the default, then feel free to drive that decision forward. Making --disable-ld-override the default in ghcup seems like a way to test this hypothesis for now; if there is minimal impact, then we can also make the change in the default GHC settings.

I think I am just approaching this from a more cautious position than you; modifying the linking configuration has proven in the past to be a subtle point of packaging where we have had to be careful.

mentioned in commit b25725ec

closed with commit 32a8103f

mentioned in commit 32a8103f

mentioned in commit 10829530

removed backport needed:9.10 label

removed backport needed:9.8 label

MR now carries relevant backport labels.

mentioned in commit 5ee2a2e6

mentioned in commit af368514

There is probably also quite a large amount of historical baggage here to unpack.

There are historical commits such as

71fcc4c0 - forced gold to be used on certain platforms
0bbc2ac6 - forced gold on aarch64/linux

These commits were made around 2015, when gold was a much newer and actively maintained project.

Then, in 2017, the current behaviour was implemented:

625143f4

So it's certainly possible that the situation has changed since 2015, and I imagine that bfd now supports aarch64, given how much more ubiquitous aarch64 is now than in 2015.

No objection here. However, gold seems to be in rather poor shape these days and it's better to use it only on a selected number of architectures. If alternatives are wanted, it might be an idea to look into mold.

mentioned in commit db186943

mentioned in commit 10a6aa18
