I've tried to compile OpenGLRaw-3.3.4.0 (via cabal install OpenGL-3.0.2.2) using GHC and Cabal, with split-sections: True enabled in my cabal config, on Windows 10 (amd64), but the process fails with the following message (similar with both GHC/Cabal versions I tried):
[...]
[617 of 617] Compiling Graphics.GL ( src\Graphics\GL.hs, dist\build\Graphics\GL.o )
C:\Program Files\GHC-8.6.5\lib\../mingw/bin\ld.exe: dist\build\HSOpenGLRaw-3.3.4.0-LbLzAQRjVYc4uE4RgrZjxe.o: too many sections (67760)
C:\Program Files\GHC-8.6.5\lib\../mingw/bin\ld.exe: final link failed: File too big
cabal: Leaving directory 'C:\Users\<user>\AppData\Local\Temp\cabal-tmp-3204\OpenGLRaw-3.3.4.0'
cabal.exe: Error: some packages failed to install:
GLURaw-2.0.0.4-10KgZckuPJf43gcTUy3nI3 depends on GLURaw-2.0.0.4 which failed to install.
OpenGL-3.0.2.2-OUWLGivwn055CFreNyTxD depends on OpenGL-3.0.2.2 which failed to install.
OpenGLRaw-3.3.4.0-LbLzAQRjVYc4uE4RgrZjxe failed during the building phase. The exception was:
ExitFailure 1
The same problem also appears with the latest Haskell Platform installer (HaskellPlatform-8.6.5-core-x86_64-setup.exe).
The PE format supports 2^16 sections but I believe the big-obj extension raises this to 2^32 (see Note [Produce big objects on Windows] in GHC.Driver.Pipeline.Execute). I suspect that the problem here is that we aren't using the big-obj format when joining objects to produce the GHCi object.
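Just to make the numbers from the log concrete, a quick sanity check (plain arithmetic, nothing GHC-specific):

```haskell
-- The merged object from the log above has 67760 sections: that overflows
-- the 2^16 section count of standard PE/COFF, but is far below the 2^32
-- count the big-obj extension allows.
main :: IO ()
main = do
  let sections = 67760 :: Integer
  print (sections > 2 ^ 16)  -- True: too many for a standard COFF object
  print (sections < 2 ^ 32)  -- True: fits easily in a big-obj object
```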
I suspect that the problem here is that we aren't using the big-obj format when joining objects to produce the GHCi object.
Aren't the sections merged during this? That's what the linker script is for, no?
Do we actually still need the GHCi objects? It's far more efficient to load the archive file, which has a GNU symbol index, than the monolithic GHCi object file, where the linker is forced to process the entire thing should any symbol be needed.
The GHC versions this is reported against are quite old, though.
Do we actually still need the GHCi objects? It's far more efficient to load the archive file, which has a GNU symbol index, than the monolithic GHCi object file, where the linker is forced to process the entire thing should any symbol be needed.
While you are right that an archive allows more laziness, that doesn't necessarily mean that it's more efficient overall. With a monolithic object, much of the linking between objects can already be performed ahead of time during the merge. Consequently, if you end up using a sufficiently large fraction of the linked archive, you may do significantly more work loading the archive than the pre-linked object. I believe this is why @simonmar introduced the GHCi object file mechanism, and it is likely still used at Facebook today for that reason.
However, a rather unscientific experiment on QuickCheck and Cabal shows that (at least in this one case) there isn't that great of a difference:
Ahh, I think I actually have the reasoning wrong here. GHCi objects do not save relocations; rather, they save section mappings. Specifically, enabling function-section splitting results in object files which require a significant amount of work for GHC to load (since every section needs to be mapped individually). The GHCi library merges all of these into a single section and is consequently considerably less work to load (although you do indeed lose the ability to lazily resolve individual objects).
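As a rough way to see how many section mappings split-sections actually creates, one can count the section headers of an object file; a minimal sketch, assuming binutils' objdump is on PATH and uses its usual -h output format:

```haskell
import Data.Char (isDigit)
import System.Environment (getArgs)
import System.Process (readProcess)

-- Count the section headers objdump -h reports for an object file; this
-- approximates the number of sections the runtime linker would have to map
-- individually when loading it.
sectionCount :: FilePath -> IO Int
sectionCount obj = do
  out <- readProcess "objdump" ["-h", obj] ""
  pure . length . filter isIndexLine . lines $ out
  where
    -- objdump -h prints one line per section whose first column is the
    -- numeric section index; flag lines and headers don't match this.
    isIndexLine l = case words l of
      (w : _) -> all isDigit w
      _       -> False

main :: IO ()
main = do
  objs <- getArgs
  mapM_ report objs
  where
    report o = do
      n <- sectionCount o
      putStrLn (o ++ ": " ++ show n ++ " sections")
```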
The GHCi library merges all of these into a single section and is consequently considerably less work to load
While you do save on section mapping, the saving is only meaningful if those sections are actually needed. base's cross-contamination with other packages like the rts means that forcing the load of base forces the rts as well. And when using libghc, you sacrifice quite a lot of startup time and memory for perhaps calling a very small percentage of your loaded objects.
Add to this that all those symbols have to be relocated into the lower 32-bit address space, and you end up consuming a lot of valuable memory space. For the Cabal case, that's more than half a million relocations to process.
The real question is: how much of these object files is actually used at once?
I.e., how much work do we do that isn't needed? If the large majority of the file is needed, sure, but base has enough symbols that it doesn't fit in a single DLL. I really wonder whether it all needs to be resolved for all programs.
So sure, I buy that the sections are less work to load, but I wonder whether that isn't offset by the work to load and resolve the symbols. It's definitely fewer sections to load, but there are significantly higher up-front processing costs, and it can force you to resolve dependencies.
You can also significantly reduce the overhead of the sections by producing both a normal archive and a split-sections archive. At least then the separation of functionality still exists, but you significantly reduce the number of sections while still having lazy loading and resolving. You will eat more disk space, though.
The real question is: how much of these object files is actually used at once?
I hope @simonmar can comment on more of the motivation for why GHCi objects are necessary. I believe in the Facebook use-case
You can also significantly reduce the overhead of the sections by producing both a normal archive and a split-sections archive. At least then the separation of functionality still exists, but you significantly reduce the number of sections while still having lazy loading and resolving. You will eat more disk space, though.
Yes, this is true. For most users I think producing a non-split-sections archive would be a better solution (since it preserves lazy loading) and wouldn't have used any more disk space than the single-object approach. That being said, I suspect Facebook's use-case (which, IIRC, is what motivated this addition) might be a bit different as they have many small modules.
That being said, I suspect Facebook's use-case (which, IIRC, is what motivated this addition) might be a bit different as they have many small modules.
I can certainly believe that! It's an interesting conflicting requirement.
and wouldn't have used any more disk space than the single-object approach
You do use slightly more, since each object file in the archive will carry an object file header and padding between the header and the first section. But I expect this to be minimal :)
My vague recollection is that loading libraries in GHCi was slow (like, very very slow) with split-sections. Archives without split-sections were only a little slower than the GHCi object.
There was also some interaction with profiling - maybe it was that we don't build a GHCi object for profiling, so loading profiling libraries into GHCi (e.g. via iserv or using the GHC API with -prof) was very slow. Did we turn off split-objects with profiling for this reason?
If the performance characteristics have changed then we can definitely revisit this. cc @josefs
My initial suspicion here was that --oformat=pe-bigobj-x86-64 wasn't making it into the linker invocation used for object merging. However, configure does appear to have the right logic, and ghc --info does indeed show the expected result.
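For anyone who wants to check their own installation, something along these lines works; the "Merge objects flags" key name is an assumption and may differ between GHC versions:

```haskell
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe)
import System.Process (readProcess)

-- ghc --info prints a Haskell-readable [(String, String)]; look up the flags
-- GHC passes when merging objects and check for the big-obj output format.
main :: IO ()
main = do
  out <- readProcess "ghc" ["--info"] ""
  let info  = read out :: [(String, String)]
      flags = fromMaybe "" (lookup "Merge objects flags" info)
  putStrLn ("Merge objects flags: " ++ flags)
  putStrLn $ if "pe-bigobj-x86-64" `isInfixOf` flags
               then "big-obj appears to be enabled for object merging"
               else "big-obj flag not found (key name may differ on this GHC)"
```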
However, the failing invocation of ld rather appears to come from Cabal, which implements its own object-joining logic (namely Distribution.Simple.Program.Ld.combineObjectFiles).
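For illustration only, the kind of invocation this object joining boils down to looks roughly like the sketch below; it is not Cabal's actual combineObjectFiles, which also has to deal with things like command-line length limits:

```haskell
import System.Process (callProcess)

-- A simplified sketch of GHCi-object joining: merge many .o files into a
-- single relocatable object with "ld -r".  On Windows the merged object can
-- easily exceed the 2^16 section limit, so ld additionally needs
-- --oformat=pe-bigobj-x86-64 (the flag discussed above).
combineObjects :: Bool -> FilePath -> [FilePath] -> IO ()
combineObjects windows output inputs =
  callProcess "ld" $
       ["-r", "-o", output]
    ++ (if windows then ["--oformat=pe-bigobj-x86-64"] else [])
    ++ inputs
```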
Thanks! Coincidentally, didn't you open the same bug 2 years ago? Ah, my memory didn't fail me... you did open https://github.com/haskell/cabal/issues/6338; I remembered the same issue happening before :)
I believe the problem in #15524 is that Cabal doesn't merge objects correctly. Specifically, it doesn't ensure that text sections are merged. This is yet another argument for !7031.
I have also opened #20712, which suggests adding a new object-joining mode to GHC, allowing Cabal to delegate the linking to GHC instead of replicating the logic itself.
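To illustrate what that delegation could look like (the --merge-objs flag name here is just a sketch of the proposed mode, not a settled interface):

```haskell
import System.Process (callProcess)

-- Sketch: instead of invoking ld itself, Cabal hands the object list to GHC,
-- which already knows the platform's merge-objects command and flags
-- (including big-obj on Windows).  The flag name is illustrative only.
joinObjectsViaGhc :: FilePath -> [FilePath] -> IO ()
joinObjectsViaGhc output inputs =
  callProcess "ghc" (["--merge-objs", "-o", output] ++ inputs)
```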