GHC CodeGen uses excessive memory for very large data types
Summary
When compiling a particular function over a very large data structure, GHC's CodeGen pass uses an excessive amount of memory, even though the code is fairly simple in structure.
Specifically, the code given below is the output of a Template Haskell program. I create a type W500, which is 500 Bits in a row. If, for example, we want to write an adder function for two W500s, we can generate something like the carryadd function given. In our actual application, we use W1000 and larger, which makes the problem much worse.
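To make the shape of the problem concrete without opening the gist, here is a minimal 4-bit sketch of the same idea. It is not the generated code: the constructor names O and I, the type W4, and the functions fullAdd and carryadd4 are assumptions chosen for illustration, and the real W500 has 500 fields instead of 4.

```haskell
-- Hypothetical 4-bit analogue of the generated W500 type.
-- All names here are illustrative and not taken from the gist.
data Bit = O | I deriving (Eq, Show)

-- One constructor with one Bit field per bit, most significant bit first.
data W4 = W4 Bit Bit Bit Bit deriving (Eq, Show)

-- Full adder: takes two input bits and a carry in,
-- returns (sum bit, carry out).
fullAdd :: Bit -> Bit -> Bit -> (Bit, Bit)
fullAdd a b c = (s, cout)
  where
    ones = length (filter (== I) [a, b, c])
    s    = if odd ones then I else O
    cout = if ones >= 2 then I else O

-- Ripple-carry adder written out field by field, the way a code
-- generator would unroll it. Returns the sum and the final carry.
carryadd4 :: W4 -> W4 -> (W4, Bit)
carryadd4 (W4 a3 a2 a1 a0) (W4 b3 b2 b1 b0) =
  let (s0, c0) = fullAdd a0 b0 O
      (s1, c1) = fullAdd a1 b1 c0
      (s2, c2) = fullAdd a2 b2 c1
      (s3, c3) = fullAdd a3 b3 c2
  in (W4 s3 s2 s1 s0, c3)
```

Loaded into GHCi, `carryadd4 (W4 O I O I) (W4 O O I I)` (that is, 5 + 3) evaluates to `(W4 I O O O,O)`. The W500 version pattern-matches on two 500-field constructors and builds a third in a single function body.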
Steps to reproduce
Given the following (generated!) Haskell code: https://gist.github.com/Torrencem/f0805afd4f06a540e5d6168e03302751
Compile the program with -v and see that the CodeGen step reports 10.7 GB of memory allocated. This is much more than for similar functions. When a similar function is generated for words of 1000 bits (W1000), GHC reports about 165 GB allocated in CodeGen:
...
!!! CorePrep [file]: finished in 1163.01 milliseconds, allocated 1498.828 megabytes
*** Stg2Stg:
*** CodeGen [file]:
!!! CodeGen [file]: finished in 92028.79 milliseconds, allocated 163195.916 megabytes
writeBinIface: 490 Names
writeBinIface: 4746 dict entries
*** systool:as:
*** Assembler:
gcc -iquoteOtherFile -I.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/build/[proj]/[proj]-tmp -I.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/build/[proj]/[proj]-tmp -I.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/build/[proj]/autogen -I.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/build/global-autogen -I.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/build/[proj]/[proj]-tmp -no-pie -fno-PIC -x assembler -c /tmp/ghc3622375_0/ghc_4.s -o .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/build/[proj]/[proj]-tmp/WAll.o.tmp
!!! systool:as: finished in 589.56 milliseconds, allocated 0.142 megabytes
*** CorePrep [file]:
Result size of CorePrep
= {terms: 866,664,
types: 372,896,
coercions: 39,
joins: 5,308/24,326}
!!! CorePrep [file]: finished in 1539.16 milliseconds, allocated 1498.445 megabytes
*** Stg2Stg:
*** CodeGen [file]:
!!! CodeGen [file]: finished in 96087.78 milliseconds, allocated 165661.267 megabytes
writeBinIface: 490 Names
writeBinIface: 4746 dict entries
*** systool:as:
...
(output from a run with W1000 instead of W500 on our actual project)
The carryadd algorithm given is a fairly simple function that adds two 500-bit numbers bit by bit, keeping track of a carry as it goes.
Expected behavior
We expected the given code sample, and the version with W1000, to compile fairly easily, without taking tens of minutes and hundreds of GB of memory. Hand-written implementations of the exact same algorithm that break W1000 up into smaller pieces (W50s, for example) do not have this compilation issue.
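For reference, the chunked layout mentioned above can be sketched at a small scale as follows. This is an assumption about the general shape only, not our hand-written code: Chunk2 stands in for a W50, Chunked4 for a W1000 built from chunks, and all names are hypothetical.

```haskell
-- Hypothetical small-scale analogue of splitting a wide word into chunks.
-- All names here are illustrative.
data Bit = O | I deriving (Eq, Show)

-- A small chunk (stand-in for W50), most significant bit first.
data Chunk2 = Chunk2 Bit Bit deriving (Eq, Show)

-- A wide word made of chunks (stand-in for W1000), high chunk first.
data Chunked4 = Chunked4 Chunk2 Chunk2 deriving (Eq, Show)

-- Full adder: returns (sum bit, carry out).
fullAdd :: Bit -> Bit -> Bit -> (Bit, Bit)
fullAdd a b c = (s, cout)
  where
    ones = length (filter (== I) [a, b, c])
    s    = if odd ones then I else O
    cout = if ones >= 2 then I else O

-- Add one chunk, taking a carry in and returning a carry out.
addChunk2 :: Bit -> Chunk2 -> Chunk2 -> (Chunk2, Bit)
addChunk2 cin (Chunk2 a1 a0) (Chunk2 b1 b0) =
  let (s0, c0) = fullAdd a0 b0 cin
      (s1, c1) = fullAdd a1 b1 c0
  in (Chunk2 s1 s0, c1)

-- Add the wide word chunk by chunk: low chunk first, then feed its
-- carry into the high chunk.
addChunked4 :: Chunked4 -> Chunked4 -> (Chunked4, Bit)
addChunked4 (Chunked4 ahi alo) (Chunked4 bhi blo) =
  let (lo, c0) = addChunk2 O alo blo
      (hi, c1) = addChunk2 c0 ahi bhi
  in (Chunked4 hi lo, c1)
```

The algorithm is the same ripple-carry addition; the difference is that no single constructor or function body mentions every bit at once, so each pattern match and each constructor application stays small.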
Environment
- GHC version used: The Glorious Glasgow Haskell Compilation System, version 8.10.5
Optional:
- Operating System: Ubuntu 20
- System Architecture: x86_64 (64-bit)