Split configure script

Motivation

We're getting pretty close to making GHC multi-target. One consequence of this is that GHC itself won't depend on the vast majority of the @vars@ from configure. Only source-determined things like @ProjectVersionMunged@, @LlvmVersion@, etc. are still needed project-wide.

What I propose, then, is that we strip down the root configure.ac to just substitute those platform-agnostic variables, with no dynamic checks, and push the vast majority of the configure checks into package-specific configure scripts: rts, integer-gmp, and base. The settings file does depend on a number of configure checks, but it can either be an extra file installed by the rts (which makes sense, as it is basically saying how to produce Haskell that works with a given RTS) or get its own configure script. As usual, the m4 directory means we can share as much logic as we like between these scripts.
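To make the split concrete, here is a minimal Python sketch (not how GHC or Hadrian actually track this) in which each configure result is owned either by the project, meaning the source-determined values kept in the root configure.ac, or by exactly one package's configure script. A package's rebuild fingerprint then covers only what it can see. Apart from ProjectVersionMunged and LlvmVersion, every variable name and value here is a hypothetical placeholder.

```python
import hashlib

# Hypothetical configure results. Only ProjectVersionMunged and LlvmVersion
# come from the proposal; the others are illustrative placeholders.
RESULTS = {
    "ProjectVersionMunged": "9.9",
    "LlvmVersion": "15",
    "RTS_HAVE_LIBFFI": "YES",
    "RTS_USE_LIBDW": "NO",
    "GMP_FRAMEWORK": "NO",
}

# Which configure script owns each (hypothetical) result.
# "project" marks the source-determined values kept in the root configure.ac.
OWNER = {
    "ProjectVersionMunged": "project",
    "LlvmVersion": "project",
    "RTS_HAVE_LIBFFI": "rts",
    "RTS_USE_LIBDW": "rts",
    "GMP_FRAMEWORK": "integer-gmp",
}


def visible_to(package: str, results: dict) -> dict:
    """Variables a package sees: the project-wide ones plus its own.
    A package that owns nothing (e.g. the compiler) sees only project-wide ones."""
    return {k: v for k, v in results.items() if OWNER[k] in ("project", package)}


def fingerprint(package: str, results: dict) -> str:
    """Stable hash of a package's configure inputs, as a build system
    might use to decide whether that package needs rebuilding."""
    view = visible_to(package, results)
    blob = "\n".join(f"{k}={view[k]}" for k in sorted(view))
    return hashlib.sha256(blob.encode()).hexdigest()
```

Under these assumptions, flipping an RTS-only result such as the hypothetical RTS_USE_LIBDW changes the rts fingerprint but leaves the compiler's unchanged, which is the incremental-build benefit listed among the benefits that follow.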

This has a number of benefits:

Running top-level configure should only make changes that are "safe for sdist". E.g. we can put all the versions in the cabal files and then sdist each cabal package. Cf. !5965 (merged).
Prevents regressions with multi-target: the compiler doesn't know about the target platform by construction, so it cannot be biased towards one platform or another.
Slightly more parallelism: we can immediately start building the compiler.
Better incremental builds: changing configure results will no longer invalidate the stage 0 compiler build via ghcautoconf.h. More broadly, each package only sees the options it cares about, which gives smaller incremental benefits for the other libs too.
One step closer to cabal build ghc for a stage 0 compiler. Running the meta-configure is a trivial preprocessing step that, as follow-up work, could also be done in ./boot instead of autoconf. Then the only thing blocking cabal-built compilers is the code-generation executables like genprimopcode. But we already have build-tool-depends to hack something up, and longer term I hope https://github.com/ghc-proposals/ghc-proposals/pull/243 means it can all become TH.
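The "trivial preprocessing step" could look roughly like the following Python sketch, which only fills @NAME@ placeholders from known, source-determined values and never inspects the build machine. The function name is hypothetical, and the sketch deliberately raises an error on an unknown placeholder so a missing value cannot slip through silently.

```python
import re

# Matches substitution placeholders in the @NAME@ style used by configure
# templates, e.g. @ProjectVersionMunged@.
PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def substitute(template: str, values: dict) -> str:
    """Replace every @NAME@ with values[NAME]. No system probing happens here:
    every value must already be known from the source tree."""
    def repl(m: re.Match) -> str:
        name = m.group(1)
        if name not in values:
            raise KeyError(f"no value for @{name}@")
        return values[name]

    return PLACEHOLDER.sub(repl, template)
```

For example, `substitute("version: @ProjectVersionMunged@\n", {"ProjectVersionMunged": "9.9"})` fills in a cabal-file-style version line; because nothing here depends on the host, a step like this is the kind of thing that could plausibly move into ./boot.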

I'll try to avoid controversy by not including getting rid of autoconf in the above list :). My view is this split delivers most of the benefit of that, and right now configure.ac is too big and monolithic to assess how much we depend on autoconf and where. With this change any renewed talk on purging autoconf will be a better evidence-based discussion.

Things that get configured today

Good to keep in mind.

ghcautoconf.h, from AC_DEFINE, via mk/config.h
settings files per stage, originally directly via top-level configure, now indirectly via make/hadrian
Build system config files, from AC_SUBST:
- hadrian/cfg/system.config
- mk/config.mk

Roadmap

Prep Make build system

It turns out there were a few things to do with the make build system to help it cope better with the RTS having a configure script, and generally to anticipate the goals here.

Prep Hadrian changes

!6953 (closed) Hadrian: bring up to date with latest make improvements
!6978 (closed)

Prep Autoconf changes

!6828 (closed) modularize platform detection.
!6836 (closed) Separate some AC_SUBST / AC_DEFINE
!6927 (closed) Factor out unregisterised and tables next to code m4 macros
!6931 (closed) Factor out more m4 macros
!6964 (closed)

Other prep

!6216 (closed) Move /includes to /rts/include, sort per package better
!6839 (closed) Avoid GHC_STAGE and other include bits
!6791 (closed) / !6920 (closed) Compiler is target agnostic!
!6963 (closed) Generate ghcversion.h with the top-level configure
!6987 (merged)
!7100 (closed)
!9627 (closed) Remove hack for building RTS that gets in the way of build-type: Simple.

Intro RTS Configure script

A beachhead!

!6822 (closed) (optional intermediate step) Add a blank RTS configure script before moving anything over.
!9760 (closed) Bump Cabal so configure script can detect cabal flags.
!9756 (closed) Move extern symbols logic

Headers generated RTS side.

Hadrian can still tell the RTS what the configuration is (the RTS configure script won't yet decide for itself), but ghcautoconf.h and ghcplatform.h are made by the RTS configure script. mk/config.h can be deleted.
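As a rough illustration of the kind of output the RTS configure script would be responsible for, this Python sketch renders a config header in the usual autoconf shape: decided-on symbols become `#define NAME VALUE`, and symbols decided to be absent become a commented-out `#undef`. The function, guard, and symbol names are hypothetical; the real ghcautoconf.h is produced by autoconf itself, not by code like this.

```python
def render_config_header(guard: str, defines: dict) -> str:
    """Render a config header from configure decisions.

    A value of None means "decided absent" and is emitted as a
    commented-out #undef, mirroring how autoconf config headers look.
    """
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    for name in sorted(defines):
        value = defines[name]
        if value is None:
            lines.append(f"/* #undef {name} */")
        else:
            lines.append(f"#define {name} {value}")
    lines += ["", f"#endif /* {guard} */", ""]
    return "\n".join(lines)
```

The point of the sketch is only where this happens: Hadrian may still supply the decisions, but writing the header becomes the RTS's own job.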

Rest of RTS-oriented configure logic.

The RTS is able to figure out its configuration without relying on correctly-set manual Cabal flags.

Perhaps the RTS will contain some settings info that Hadrian reads (cf. #22686 (closed)), i.e. it could be a source of truth.

Slim down Cabal Flags for RTS

With the logic from before moved into the RTS configure script, there is often no need to make decisions from the outside: just let the RTS configure script make its own private choice.

We can do that and get rid of a bunch of Hadrian logic.

!9269 (closed) / wip/rts-configure-scrap-cabal-flags

Get rid of the rest of the configure script

See

`genapply` shouldn't take RTS headers at compile time

If the RTS is built standalone, then it will be annoying to build genapply at RTS configure time. We should do something different, such as making genapply take the info at runtime so we can use a prebuilt genapply.
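A sketch of the runtime-input idea, written in Python rather than genapply's Haskell, with hypothetical constant names and a JSON format chosen purely for illustration: the generator receives platform constants as data when it runs, so one prebuilt binary can serve any RTS configuration instead of being rebuilt against that RTS's headers.

```python
import json

# Hypothetical constant names; a real generator would need whatever
# values it currently takes from RTS headers at compile time.
REQUIRED = ("word_size", "max_real_vanilla_regs")


def load_constants(text: str) -> dict:
    """Parse platform constants supplied when the generator runs."""
    consts = json.loads(text)
    missing = [k for k in REQUIRED if k not in consts]
    if missing:
        raise ValueError(f"missing constants: {missing}")
    return consts


def generate(consts: dict) -> str:
    """Stand-in for the generator: its output depends only on runtime input."""
    return (f"-- generated for word size {consts['word_size']}, "
            f"{consts['max_real_vanilla_regs']} vanilla registers\n")
```

The design choice being illustrated is simply moving the dependency from build time to run time; the actual interface genapply would use is left open by this proposal.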

Create configure script for settings file.

Many decisions, however, need to affect both the RTS and the settings file. At a minimum we can share m4 logic between projects, but we may also want to make (part of) the settings file during the RTS build.

Some settings effectively indicate choices where the RTS and GHC must agree, so deciding twice and hoping the system-snooping is deterministic is sketchy. Other settings, like which tools to use, are naturally independent.

Note that it is precisely the second sort that the bindist configures today.

Possibly prepare a stage-wide configure for hadrian/make, more like the status quo ante

While it's important that components can be configured separately to untangle our current mess, hadrian/make might continue to want to make some settings across projects. Concretely, something needs to fill in their settings input files, hadrian/cfg/system.config, and mk/config.mk.

We can have our cake and eat it too by keeping enough logic in m4 files to create a "stage-wide" configure script.

(Remember, per https://gitlab.haskell.org/ghc/ghc/-/wikis/cross-compilation/roadmap, bootstrapping is an infinite tree to explore, not a single chain, let alone a finite 3-stage chain! So stage-wide config files, rather than one master config file, make much more sense as input to Hadrian/Make.)

Additionally, we should enhance the autoconf macros to ensure every decision made by the "outer" configure is not re-decided by the per-package configures. That means adding enough configure flags that the decisions can be "told" by make/hadrian to RTS/base/unix/whatever.
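One way the "told, not re-decided" rule could look inside a per-package configure, sketched in Python: an explicit value from the outer build system wins, and the package only probes the system when nothing was passed. The `--with-NAME=VALUE` spelling follows autoconf's usual `--with-*` convention, but the specific names (and these helper functions) are hypothetical.

```python
from typing import Callable, Dict, List


def parse_told(argv: List[str]) -> Dict[str, str]:
    """Collect decisions passed in as hypothetical --with-NAME=VALUE flags."""
    told: Dict[str, str] = {}
    for arg in argv:
        if arg.startswith("--with-") and "=" in arg:
            name, value = arg[len("--with-"):].split("=", 1)
            told[name] = value
    return told


def decide(name: str, told: Dict[str, str], probe: Callable[[], str]) -> str:
    """Use the outer build system's decision if there is one;
    only fall back to probing the system when nothing was told."""
    if name in told:
        return told[name]
    return probe()
```

Run standalone, with no such flags, the package configures itself as usual; run under make/hadrian, every shared decision is fixed once by the outer configure and merely honoured here.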

This last step might sound like it undermines the whole project: what's the point of moving all the logic to per-package configure scripts if we are just going to centrally configure the decisions anyway?! Remember, only make/hadrian parameters that are either inspected/eliminated by the outer build system or "aliased" to multiple packages need be in the outer configure. The vast majority of stuff is needed by just one package (usually the RTS), or was AC_DEFINEd and already bypasses the outer build system, going straight from autoconf to the headers. None of these need ever be configured from the stage-wide configure script; they should be purely per-package.
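The rule in the preceding paragraph can be stated as a small predicate. This Python sketch uses a hypothetical `Param` record and made-up parameter names; it only encodes the two criteria given above (inspected by the outer build system, or shared by more than one package) and is not an inventory of GHC's actual configure variables.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    consumers: set  # packages (or the settings file) whose configure reads it
    inspected_by_build_system: bool = False  # does make/hadrian itself use it?


def needs_outer_configure(p: Param) -> bool:
    """Stage-wide only if the outer build system inspects it,
    or several packages must agree on a single decision."""
    return p.inspected_by_build_system or len(p.consumers) > 1


def split(params):
    """Partition parameters into (stage-wide, per-package) name lists."""
    outer = sorted(p.name for p in params if needs_outer_configure(p))
    per_package = sorted(p.name for p in params if not needs_outer_configure(p))
    return outer, per_package
```

With illustrative inputs, something like tables-next-to-code, which both the RTS and the settings file depend on, lands in the stage-wide set, while an RTS-only library choice stays entirely per-package.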

Also, downstream packaging like Haskell.nix or Nixpkgs will probably stop using Make/Hadrian, and thus bypass any stage-wide configuring, but will use the per-package configures. In general, "whole compilers" should be thought of as mini-distros rather than single packages, and thus our Make/Hadrian are more like package managers or multi-package monorepo build systems (like the original BSD monorepo).

CC @angerman @alp @bgamari @hvr @nomeata @snowleopard @hsyl20 @nrnrnr

Edited Oct 23, 2023 by John Ericson
