Better code patterns for Binary instance
Hand writing Binary instances is always fickle as bugs introduced are horrible to debug.
It's one of the reasons why in the past it was proposed that we should derive them instead: #22054 However there are cases where derived instances don't optimize as well (#21938) so this caused runtime regressions which isn't acceptable for code that is so relevant for performance. And even if we fix the runtime aspect it's unclear if we want to re-generate these instances from the Generics instance each time GHC get's compiled. After all GHC get's compiled a lot.
We will hopefully fix the runtime perf difference eventually, and might improve the compile time for the instances at some point. But for now the best we can do is writing these instances by hand.
Given all that:
How can we make writing Binary instances less error prone.
A common definition looks like this:
instance Binary IfaceConDecls where
put_ bh IfAbstractTyCon = putByte bh 0
put_ bh (IfDataTyCon False cs) = putByte bh 1 >> put_ bh cs
put_ bh (IfDataTyCon True cs) = putByte bh 2 >> put_ bh cs
put_ bh (IfNewTyCon c) = putByte bh 3 >> put_ bh c
get bh = do
h <- getByte bh
case h of
0 -> return IfAbstractTyCon
1 -> liftM (IfDataTyCon False) (get bh)
2 -> liftM (IfDataTyCon True) (get bh)
3 -> liftM IfNewTyCon (get bh)
_ -> error "Binary(IfaceConDecls).get: Invalid IfaceConDecls"
The bar far most common error I've seen others make or made myself is the the "tag byte" wrong. Usually that's done through copy one of the existing cases and forgetting to adjust the tag byte.
For the example above that would look like:
instance Binary IfaceConDecls where
put_ bh IfAbstractTyCon = putByte bh 0
put_ bh (IfDataTyCon False cs) = putByte bh 1 >> put_ bh cs
put_ bh (IfDataTyCon True cs) = putByte bh 2 >> put_ bh cs
put_ bh (IfNewTyCon c) = putByte bh 3 >> put_ bh c
put_ bh (IfNewNewTyCon c) = putByte bh 3 >> put_ bh c
get bh = do
h <- getByte bh
case h of
0 -> return IfAbstractTyCon
1 -> liftM (IfDataTyCon False) (get bh)
2 -> liftM (IfDataTyCon True) (get bh)
3 -> liftM IfNewTyCon (get bh)
_ -> error "Binary(IfaceConDecls).get: Invalid IfaceConDecls"
Once spotted the issue is obvious. But I made mistakes like that in the past and I've seen others make them as well.
What can we do about it.
Relying on dataToTag#
instance Binary IfaceConDecls where
put_ bh x = case x of
IfAbstractTyCon -> putConTag x bh
IfDataTyCon False cs -> putConTag x bh >> put_ bh cs
IfDataTyCon True cs -> putConTag x bh >> put_ bh cs
IfNewTyCon c -> putConTag x bh >> put_ bh c
IfNewNewTyCon c -> putConTag x bh >> put_ bh c
get bh = do
h <- getConTag bh
case h of
0 -> return IfAbstractTyCon
1 -> liftM (IfDataTyCon False) (get bh)
2 -> liftM (IfDataTyCon True) (get bh)
3 -> liftM IfNewTyCon (get bh)
_ -> error "Binary(IfaceConDecls).get: Invalid IfaceConDecls"
Now the tag byte would always be correct by construction. The downside being that it wouldn't work for pattern synonyms since dataToTag# looks through these. But for the common case I imagine it would still be a good improvement.
If you have other ideas for more general and less error prone solutions which still give the same runtime performance please put them down. I feel like this is a common pitfall people hit sooner or later. And it would be good to improve on the status quo here.