ghc 8.8.3 uses 25-30% more time and memory to derive instances with -XDerivingVia
Summary
We have a large-ish (27K SLOC) "entities" package that consists primarily of domain types and ToJSON/FromJSON instances. These instances use Aeson's Generic functionality to enforce our API conventions. For example:
data Assignment = Assignment
  { assignmentTeacherId :: Maybe TeacherId
  , ...
  }
  deriving stock (Eq, Show, Generic)

instance ToJSON Assignment where
  toEncoding = genericToEncoding $ apiOptions $ Just "assignment"
  toJSON = genericToJSON $ apiOptions $ Just "assignment"

instance FromJSON Assignment where
  parseJSON = genericParseJSON $ apiOptions $ Just "assignment"
apiOptions $ Just "assignment" means:
- Drop the "assignment" prefix from each field selector
- For sum types, ensure the value of the tag field is snake_cased (for boring historical reasons)
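The two conventions listed above boil down to a pair of string transformations. The following is only a sketch of what they might look like; `dropFieldPrefix` and `snakeCase` are hypothetical names, not the functions from our package, and lower-casing the first character after the prefix is an assumption. In Aeson, functions of this shape would typically be plugged into the `fieldLabelModifier` and `constructorTagModifier` fields of `Options`.

```haskell
import Data.Char (isUpper, toLower)
import Data.List (stripPrefix)

-- | Hypothetical helper: drop an optional prefix from a field selector and
-- lower-case the first remaining character.
-- Labels that do not start with the prefix (or consist only of it) are
-- returned unchanged.
dropFieldPrefix :: Maybe String -> String -> String
dropFieldPrefix Nothing label = label
dropFieldPrefix (Just prefix) label =
  case stripPrefix prefix label of
    Just (c : rest) -> toLower c : rest
    _               -> label

-- | Hypothetical helper: convert a CamelCase constructor name to snake_case
-- by lower-casing it and inserting '_' before each interior upper-case letter.
snakeCase :: String -> String
snakeCase [] = []
snakeCase (c : cs) = toLower c : concatMap step cs
  where
    step x
      | isUpper x = ['_', toLower x]
      | otherwise = [x]
```

For example, `dropFieldPrefix (Just "assignment") "assignmentTeacherId"` gives `"teacherId"`, and `snakeCase "InProgress"` gives `"in_progress"`.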
These hand-written instances have a couple of problems:
- It's easy to forget to implement both toJSON and toEncoding (the latter is typically 2-3X faster than toJSON)
- It's easy to use other options that stray from our API's conventions
Recently we attempted to use -XDerivingVia to avoid these problems:
data Assignment = Assignment
  { assignmentTeacherId :: Maybe TeacherId
  , ...
  }
  deriving stock (Eq, Show, Generic)

-- The via type must be fully applied, so Assignment appears as the last argument.
deriving via (ApiOptions ('Just "assignment") Assignment) instance ToJSON Assignment
deriving via (ApiOptions ('Just "assignment") Assignment) instance FromJSON Assignment
where ApiOptions is a newtype with the following ToJSON and FromJSON instances:
newtype ApiOptions (prefix :: Maybe Symbol) (value :: Type) = ApiOptions
  { unApiOptions :: value
  }

instance (GToEncoding Zero (Rep a), GToJSON Zero (Rep a), Generic a, ModifyOptions prefix) => ToJSON (ApiOptions prefix a) where
  toJSON = genericToJSON (modifyOptions @prefix defaultOptions) . unApiOptions
  toEncoding = genericToEncoding (modifyOptions @prefix defaultOptions) . unApiOptions

instance (GFromJSON Zero (Rep a), Generic a, ModifyOptions prefix) => FromJSON (ApiOptions prefix a) where
  parseJSON = fmap ApiOptions . genericParseJSON (modifyOptions @prefix defaultOptions)
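The ModifyOptions class used in these instances is not shown here. As a rough illustration of the mechanism it relies on, the sketch below reflects a type-level `Maybe Symbol` down to a value-level `Maybe String`. `KnownPrefix` and `prefixVal` are hypothetical names introduced for this example only; a class like ModifyOptions could presumably use the same pattern to build Aeson `Options` from the prefix.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- | Hypothetical helper class: bring a type-level optional prefix to the
-- value level. The method mentions the class parameter nowhere in its type,
-- so callers select the instance with a type application.
class KnownPrefix (prefix :: Maybe Symbol) where
  prefixVal :: Maybe String

-- No prefix at the type level means no prefix at the value level.
instance KnownPrefix 'Nothing where
  prefixVal = Nothing

-- A type-level symbol is reflected to a String via symbolVal.
instance KnownSymbol s => KnownPrefix ('Just s) where
  prefixVal = Just (symbolVal (Proxy @s))
```

With this in scope, `prefixVal @('Just "assignment")` evaluates to `Just "assignment"` and `prefixVal @'Nothing` evaluates to `Nothing`.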
After converting some 240 instances, ghc started running out of memory during continuous integration. On average, we're seeing about a 30% increase in memory usage according to -ddump-timings:
Allocation, sorted by difference
Stage | -XNoDerivingVia Allocation | -XDerivingVia Allocation | Diff |
---|---|---|---|
Total | 1208967.387 MB | 1707926.976 MB | +498959.589 MB |
Simplifier 3 | 173417.41 MB | 253605.102 MB | +80187.691 MB |
Simplifier 6 | 163104.206 MB | 227728.294 MB | +64624.088 MB |
Simplifier 7 | 126605.672 MB | 188849.772 MB | +62244.1 MB |
Simplifier 4 | 154081.307 MB | 216046.1 MB | +61964.792 MB |
Simplifier 5 | 134668.038 MB | 191559.688 MB | +56891.651 MB |
Simplifier 2 | 127109.857 MB | 174883.096 MB | +47773.239 MB |
Simplifier 1 | 37972.934 MB | 67402.067 MB | +29429.133 MB |
Specialise 1 | 4783.927 MB | 21672.163 MB | +16888.236 MB |
CodeGen 2 | 80904.293 MB | 95021.156 MB | +14116.863 MB |
CodeGen 1 | 77103.084 MB | 90489.74 MB | +13386.655 MB |
Float out 2 | 26748.774 MB | 39731.532 MB | +12982.758 MB |
Demand analysis 1 | 29991.168 MB | 42497.224 MB | +12506.055 MB |
Demand analysis 2 | 26703.106 MB | 38282.119 MB | +11579.013 MB |
Renamer/typechecker 1 | 23239.374 MB | 33166.786 MB | +9927.412 MB |
Float out 1 | 9385.235 MB | 12499.722 MB | +3114.487 MB |
CoreTidy 1 | 4987.757 MB | 6012.39 MB | +1024.633 MB |
Worker Wrapper binds 1 | 845.609 MB | 1158.52 MB | +312.911 MB |
Desugar 1 | 5044.441 MB | 5061.204 MB | +16.763 MB |
Simplify 2 | 366.118 MB | 369.361 MB | +3.243 MB |
Float inwards 1 | 5.076 MB | 5.077 MB | +0.0 MB |
Called arity analysis 1 | 5.419 MB | 5.42 MB | +0.0 MB |
Exitification transformation 1 | 5.308 MB | 5.308 MB | +0.0 MB |
Common sub-expression 1 | 5.101 MB | 5.101 MB | +0.0 MB |
Float inwards 2 | 5.075 MB | 5.075 MB | +0.0 MB |
CorePrep 2 | 116.124 MB | 116.109 MB | -0.015 MB |
CorePrep 7 | 0.029 MB | 0.012 MB | -0.016 MB |
CorePrep 6 | 0.044 MB | 0.026 MB | -0.019 MB |
Simplify 4 | 1.937 MB | 1.899 MB | -0.039 MB |
Simplify 5 | 0.073 MB | 0.034 MB | -0.039 MB |
CorePrep 4 | 3.545 MB | 3.498 MB | -0.047 MB |
ByteCodeGen 4 | 4.768 MB | 4.652 MB | -0.116 MB |
ByteCodeGen 1 | 7.824 MB | 7.707 MB | -0.117 MB |
ByteCodeGen 5 | 0.167 MB | 0.05 MB | -0.117 MB |
CorePrep 1 | 5.339 MB | 5.063 MB | -0.276 MB |
ByteCodeGen 2 | 903.206 MB | 902.928 MB | -0.278 MB |
CorePrep 5 | 0.6 MB | 0.132 MB | -0.468 MB |
CorePrep 3 | 3.628 MB | 3.069 MB | -0.558 MB |
Simplify 3 | 1.319 MB | 0.262 MB | -1.057 MB |
Parser 1 | 811.944 MB | 809.403 MB | -2.54 MB |
ByteCodeGen 3 | 4.204 MB | 0.779 MB | -3.425 MB |
Simplify 1 | 14.213 MB | 9.337 MB | -4.875 MB |
Simplify 6 | 0.033 MB | N/A | N/A |
ByteCodeGen 6 | 0.086 MB | N/A | N/A |
CorePrep 8 | 0.016 MB | N/A | N/A |
Note: this table was built by parsing, aggregating, and comparing *.dump-timings files created by compiling our master branch as well as the branch that uses -XDerivingVia. I compiled both branches on my local machine using the following command (-M28G prevents the operating system from killing ghc at around 30 GB resident memory):
stack build \
--ghc-options '-ddump-to-file -dshow-passes -ddump-timings -j +RTS -s -M28G -A128m -n4m -I0' \
entities
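For anyone who wants to repeat the aggregation, here is a minimal sketch of the parsing and summing step. It is not the script that produced the table above. It assumes each line of a *.dump-timings file has the shape `Pass name [Module]: alloc=<integer> time=<decimal>`, and it groups by pass name only, without numbering repeated passes the way the table does. The names `parseLine`, `totals`, and `allocDiff` are made up for this example, and the units of the two fields are whatever ghc reports (I believe bytes and milliseconds).

```haskell
import Data.List (isPrefixOf)
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- | One timing entry: (pass name, allocation, time), in ghc's reported units.
type Entry = (String, Integer, Double)

-- | Parse a line such as
-- "Simplifier [Entities.Assignment]: alloc=123456 time=7.890".
-- The line format is an assumption; anything else yields Nothing.
parseLine :: String -> Maybe Entry
parseLine line =
  case break (== '[') line of
    (name, _ : rest) ->
      -- Skip past "Module]:" and split the remainder into its two fields.
      case words (drop 1 (dropWhile (/= ':') rest)) of
        [a, t]
          | "alloc=" `isPrefixOf` a, "time=" `isPrefixOf` t ->
              (,,) (trim name) <$> readMaybe (drop 6 a) <*> readMaybe (drop 5 t)
        _ -> Nothing
    _ -> Nothing
  where
    trim = unwords . words

-- | Sum allocation and time per pass name across all parseable lines.
totals :: [String] -> Map.Map String (Integer, Double)
totals ls =
  Map.fromListWith add [(n, (a, t)) | (n, a, t) <- mapMaybe parseLine ls]
  where
    add (a1, t1) (a2, t2) = (a1 + a2, t1 + t2)

-- | Allocation difference per pass (second build minus first), restricted
-- to passes that appear in both builds.
allocDiff
  :: Map.Map String (Integer, Double)
  -> Map.Map String (Integer, Double)
  -> Map.Map String Integer
allocDiff = Map.intersectionWith (\(a, _) (b, _) -> b - a)
```

Feeding the concatenated lines of each build's dump files to `totals` and then passing both results to `allocDiff` yields a per-pass comparison that can be sorted by difference.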
Steps to reproduce
Our entities package is too big (and too proprietary!) to share, so the linked repo contains a smaller reproduction, the oom package, built around ToJSON and FromJSON instances. You can build it with:
make no.deriving.via
# expands to:
# export STACK_WORK=.stack-work-no-deriving-via
# stack clean oom
# stack build oom --ghc-options "-ddump-to-file -ddump-timings -dshow-passes -j +RTS -s"
and
make deriving.via
# expands to:
# export STACK_WORK=.stack-work-deriving-via
# stack clean oom
# stack build oom --ghc-options "-DUSE_DERIVING_VIA -ddump-to-file -ddump-timings -dshow-passes -j +RTS -s"
You can observe with htop that the latter command uses more memory, or compare the *.dump-timings files in .stack-work-no-deriving-via and .stack-work-deriving-via. Note that the README.md includes some aggregate comparisons (see the make table rule in the Makefile), which I'll reproduce here:
Time, sorted by difference
Stage | -XNoDerivingVia Time | -XDerivingVia Time | Diff |
---|---|---|---|
Total | 532.25s | 715.652s | +183.402s |
Simplifier 3 | 89.825s | 121.334s | +31.509s |
Simplifier 6 | 63.525s | 86.922s | +23.397s |
Simplifier 7 | 81.041s | 102.338s | +21.297s |
Simplifier 4 | 83.293s | 103.967s | +20.674s |
Simplifier 5 | 75.801s | 92.826s | +17.025s |
Simplifier 2 | 50.879s | 67.449s | +16.569s |
Simplifier 1 | 10.92s | 26.697s | +15.776s |
Float out 2 | 20.235s | 27.78s | +7.545s |
Demand analysis 2 | 8.95s | 15.716s | +6.766s |
CodeGen 1 | 7.509s | 13.68s | +6.172s |
Demand analysis 1 | 10.729s | 16.558s | +5.828s |
Renamer/typechecker 1 | 10.086s | 13.426s | +3.34s |
CodeGen 2 | 10.22s | 13.238s | +3.018s |
Specialise 1 | 1.933s | 4.633s | +2.7s |
Float out 1 | 4.535s | 6.124s | +1.589s |
CoreTidy 1 | 1.021s | 1.753s | +0.733s |
Exitification transformation 1 | 0.0s | 0.0s | +0.0s |
Float inwards 1 | 0.0s | 0.0s | +0.0s |
Common sub-expression 1 | 0.0s | 0.0s | -0.0s |
Float inwards 2 | 0.0s | 0.0s | -0.0s |
Called arity analysis 1 | 0.0s | 0.0s | -0.0s |
CorePrep 1 | 0.0s | 0.0s | -0.0s |
CorePrep 2 | 0.0s | 0.0s | -0.0s |
Desugar 1 | 0.29s | 0.269s | -0.021s |
Parser 1 | 0.08s | 0.049s | -0.031s |
Worker Wrapper binds 1 | 1.377s | 0.895s | -0.482s |
Allocation, sorted by difference
Stage | -XNoDerivingVia Allocation | -XDerivingVia Allocation | Diff |
---|---|---|---|
Total | 59148.37 MB | 81868.75 MB | +22720.381 MB |
Simplifier 3 | 10241.209 MB | 13633.87 MB | +3392.661 MB |
Simplifier 7 | 8982.909 MB | 11950.616 MB | +2967.707 MB |
Simplifier 4 | 8698.237 MB | 11112.988 MB | +2414.752 MB |
Simplifier 5 | 7963.713 MB | 10345.508 MB | +2381.795 MB |
Simplifier 6 | 7199.82 MB | 9320.079 MB | +2120.259 MB |
Simplifier 2 | 5323.286 MB | 7422.371 MB | +2099.085 MB |
Simplifier 1 | 1254.692 MB | 3028.437 MB | +1773.745 MB |
CodeGen 2 | 1650.152 MB | 2769.932 MB | +1119.779 MB |
CodeGen 1 | 1583.906 MB | 2624.815 MB | +1040.909 MB |
Specialise 1 | 196.157 MB | 1009.708 MB | +813.551 MB |
Renamer/typechecker 1 | 698.728 MB | 1330.049 MB | +631.321 MB |
Float out 2 | 1612.035 MB | 2223.416 MB | +611.382 MB |
Demand analysis 2 | 1486.519 MB | 2047.257 MB | +560.738 MB |
Demand analysis 1 | 1643.384 MB | 2198.509 MB | +555.125 MB |
Float out 1 | 342.7 MB | 505.041 MB | +162.341 MB |
CoreTidy 1 | 168.548 MB | 229.185 MB | +60.637 MB |
Worker Wrapper binds 1 | 43.757 MB | 57.143 MB | +13.386 MB |
Desugar 1 | 22.449 MB | 25.76 MB | +3.312 MB |
Float inwards 1 | 0.162 MB | 0.162 MB | 0.0 MB |
Called arity analysis 1 | 0.176 MB | 0.176 MB | 0.0 MB |
Exitification transformation 1 | 0.171 MB | 0.171 MB | 0.0 MB |
Common sub-expression 1 | 0.163 MB | 0.163 MB | 0.0 MB |
Float inwards 2 | 0.162 MB | 0.162 MB | 0.0 MB |
CorePrep 1 | 0.142 MB | 0.142 MB | 0.0 MB |
CorePrep 2 | 0.142 MB | 0.142 MB | 0.0 MB |
Parser 1 | 35.051 MB | 32.948 MB | -2.103 MB |
The linked repo also has some heap profile images (-XNoDerivingVia and -XDerivingVia), though I'm not exactly sure what to make of them.
Expected behavior
I would expect compiling with -XDerivingVia to use roughly the same amount of memory as without. The derived instances look like this:
data Assignment = Assignment
  { assignmentTeacherId :: Maybe TeacherId
  , ...
  }
  deriving stock (Eq, Show, Generic)

-- deriving via (ApiOptions ('Just "assignment") Assignment) instance ToJSON Assignment
instance ToJSON Assignment where
  toJSON = coerce
    @(ApiOptions ('Just "assignment") Assignment -> Value)
    @(Assignment -> Value)
    toJSON
  toEncoding = coerce
    @(ApiOptions ('Just "assignment") Assignment -> Encoding)
    @(Assignment -> Encoding)
    toEncoding

-- deriving via (ApiOptions ('Just "assignment") Assignment) instance FromJSON Assignment
instance FromJSON Assignment where
  parseJSON = coerce
    @(Value -> Parser (ApiOptions ('Just "assignment") Assignment))
    @(Value -> Parser Assignment)
    parseJSON
Is it possible that the extra Coercible instances ghc is having to create and solve are the culprit?
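To poke at that question in isolation, a toy module along the following lines might help. It is only a sketch: `Render` and `Tagged` are made-up stand-ins for ToJSON and ApiOptions, with no Aeson or Generic machinery involved, so it will not reproduce the slowdown by itself. It does put a derived-via instance next to its hand-written coerce equivalent, and compiling it with -ddump-deriv shows what ghc generates for the former.

```haskell
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE TypeApplications #-}

import Data.Coerce (coerce)

-- | Toy stand-in for ToJSON.
class Render a where
  render :: a -> String

-- | Toy stand-in for ApiOptions: renders any Show-able value with a tag.
newtype Tagged a = Tagged a

instance Show a => Render (Tagged a) where
  render (Tagged x) = "tagged:" ++ show x

-- Instance obtained with DerivingVia.
newtype UserId = UserId Int
  deriving stock Show
  deriving Render via (Tagged UserId)

-- The same instance written out by hand with an explicit coerce,
-- mirroring the shape of the expansion shown above.
newtype OrderId = OrderId Int
  deriving stock Show

instance Render OrderId where
  render = coerce @(Tagged OrderId -> String) @(OrderId -> String) render
```

Both instances behave identically at runtime: `render (UserId 7)` gives `"tagged:UserId 7"` and `render (OrderId 7)` gives `"tagged:OrderId 7"`.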
Environment
- GHC version used: 8.8.3
- Operating System: Ubuntu 18.04.4 LTS
- Processor: Intel Core i7-8750H @2.2Ghz (6 cores, 12 threads)
- Memory: 32GB