Support for Generic Ghc Api Ast Traversals
This Wiki page is a spin off of GhcApiStatus, focussing on information related to generic traversals (transformations/queries) over the Ghc Ast types.
Data.Traversable
David Waern has been working on deriving Data.Traversable
for the Ast types, to get mapM
for use in Haddock 2. He points out that, by default, SYB-style traversals do not seem to support type-changing maps.
Data.Generics
Since Ghc does support (standalone) deriving of Data
and Typeable
, it seems natural to want those classes instantiated for the Ghc Api types. In case that would lead to blowup of standard releases, it would be even better if those instances could be provided as an add-on to a release.
-
for recent code/documentation, see the syb-utils darcs repo (rss feed of darcs changes):
darcs get http://www.cs.kent.ac.uk/~cr3/toolbox/haskell/syb-utils
update (30/01/2009): by now, most code of syb-utils
has moved to maintained packages, either `syb` or `ghc-syb`.
-
standalone deriving of
Data
fails if data constructors are not in scope - this means thatData
instances are in conflict with the intended data abstraction for some Ghc Ast types! -
using the internal CPP hackery of
Data.Generics
forTypeable
, standalone deriving ofData
for non-abstract types, and the traditional manual dummyData
instances for abstract types, it seems we can get initialData.Generics
support over the Ghc Ast types! -
since I want to be able to identify the abstract types, I've replaced the traditional
toConstr _ = error "toConstr"
:abstractConstr n = mkConstr (abstractDataType n) ("{abstract:"++n++"}") [] Prefix abstractDataType n = mkDataType n [abstractConstr n] -- TypeableINSTANCE_TYPEABLE0(SrcSpan,srcSpanTc,"SrcSpan") instance Data SrcSpan where toConstr _ = abstractConstr "SrcSpan" gunfold _ _ = error "gunfold" dataTypeOf _ = mkNorepType "SrcSpan"
TODO Is that ok, or is it going to cause problems?
-
for starters, here's a
Data
-based show that shows the constructors/abstract types instead of pretty-printing them:-- generic Data-based show, with special cases for GHC Ast types, -- showing abstract types abstractly and avoiding known potholes showData :: Data a => Stage -> Int -> a -> String showData stage n = generic `ext1Q` list `extQ` string `extQ` fastString `extQ` srcSpan `extQ` name `extQ` occName `extQ` moduleName `extQ` var `extQ` dataCon `extQ` bagName `extQ` bagRdrName `extQ` bagVar `extQ` nameSet `extQ` postTcType `extQ` fixity where generic :: Data a => a -> String generic t = indent n ++ "(" ++ showConstr (toConstr t) ++ space (concat (intersperse " " (gmapQ (showData stage (n+1)) t))) ++ ")" space "" = "" space s = ' ':s indent n = "\n" ++ replicate n ' ' string = show :: String -> String fastString = ("{FastString: "++) . (++"}") . show :: FastString -> String list l = indent n ++ "[" ++ concat (intersperse "," (map (showData stage (n+1)) l)) ++ "]" name = ("{Name: "++) . (++"}") . showSDoc . ppr :: Name -> String occName = ("{OccName: "++) . (++"}") . OccName.occNameString moduleName = ("{ModuleName: "++) . (++"}") . showSDoc . ppr :: ModuleName -> String srcSpan = ("{"++) . (++"}") . showSDoc . ppr :: SrcSpan -> String var = ("{Var: "++) . (++"}") . showSDoc . ppr :: Var -> String dataCon = ("{DataCon: "++) . (++"}") . showSDoc . ppr :: DataCon -> String bagRdrName:: Bag (Located (HsBind RdrName)) -> String bagRdrName = ("{Bag(Located (HsBind RdrName)): "++) . (++"}") . list . bagToList bagName :: Bag (Located (HsBind Name)) -> String bagName = ("{Bag(Located (HsBind Name)): "++) . (++"}") . list . bagToList bagVar :: Bag (Located (HsBind Var)) -> String bagVar = ("{Bag(Located (HsBind Var)): "++) . (++"}") . list . bagToList nameSet | stage `elem` [Parser,TypeChecker] = const ("{!NameSet placeholder here!}") :: NameSet -> String | otherwise = ("{NameSet: "++) . (++"}") . list . nameSetToList postTcType | stage<TypeChecker = const "{!type placeholder here?!}" :: PostTcType -> String | otherwise = showSDoc . ppr :: Type -> String fixity | stage<Renamer = const "{!fixity placeholder here?!}" :: GHC.Fixity -> String | otherwise = ("{Fixity: "++) . (++"}") . showSDoc . ppr :: GHC.Fixity -> String
For example usage, see the attached
APISybTesting
: it parses aTestModule
, prettyprints and shows, does an identity transform, and an example query (extract classes and family declarations). -
some of the predefined instances in
Data.Generics.Instances
are dummies, preventing traversals of substructures or pretending that non-concrete types areData
- this should probably be changed by splitting that module and not re-exporting the dummy part fromData.Generics
by default. See http://www.haskell.org/pipermail/generics/2008-June/000347.html, http://www.haskell.org/pipermail/generics/2008-June/000346.html, http://www.haskell.org/pipermail/generics/2008-June/000342.html for more info. -
it seems as if the issue of type-changing traversals can be addressed within SYB, although those dummy instances cause trouble here.
-
Traversable Functor Data,or: X marks the spot, a technique based on
gfoldl
plus one nasty and two non-nasty uses ofunsafeCoerce
. The nasty use interacts badly with the dummy instances. -
gMap in SYB1, a technique based on
gfoldl
,gunfoldl
, and some Dynamic types in the middle. AvoidsunsafeCoerce
, and uses the fact that the dummy instances are slightly more honest aboutgunfoldl
than aboutgfoldl
, butgMap
cannot distinguish functor parameters from equivalent types (nofmap
) and its type is too general for direct use (runtime type errors instead of compile time type errors) -
http://www.haskell.org/pipermail/generics/2008-July/000351.html, combining the advantages of the previous two techniques to define a
Data
-basedfmap
. The remaining uses ofunsafeCoerce
only distinguish the functor parameter type from equivalent types occuring elsewhere in the functor, so seem to be ok? -
performance: Syb traversals are sometimes associated with bad performance. Unfortunately, none of the anecdotes I'm aware of really investigates these performance issues in any detail (if you have useful data to add here, please do!). So I can only say here that
- there are various more or less obvious ways in which naive use of Syb traversal schemes can lead to very inefficient traversals
- until proven wrong, I'll assume that all of these traps can be avoided by careful tuning of traversal schemes!-)
- three example issues are listed in Data.Generics with GPS (using Maps to avoid getting lost in Data), which also provides a generalisation of the main Uniplate
PlateData
(Uniplate over Syb) optimisation, in a form that can be used to address one of the performance issues ofeverywhere
andeverything
(traversal of irrelevant substructures) - this gives the expected speedup of naive traversal schemes - I've been trying to address another of those three issues (too many runtime type checks, specifically linear lookup), but the overhead of working a map of functions of different types means that this isn't really a win (for the typical few specific cases, linear lookup is faster, and as the number of specific cases grows, they tend to be organised so as to reduce type checks and traversals, so having a map is still not necessarily a win)