# In-tree annotations

This section describes a new (in 2021) design for tracking enough information in order
to replicate the original source from a parsed AST.

**This is currently just a stub, written by Richard E. as a place for Alan Z. and others to fill out.**
Sections delimited by "RAE" are questions that should be answered in the final version of this design.
Once such a question is answered, please delete the question.

## Naming

Taking an AST and converting it into a string that looks like what the user originally wrote
is called *exact-printing*. An exact-printed program includes the original spacing and all comments.

We use the term *API Annotations* to refer to the extra bits of information included in the AST
only for the purposes of exact-printing. By definition, API annotations are used only for exact-printing;
they are not otherwise consulted during compilation.

**RAE:** Why are these called API annotations? I would think "keyword annotations" or "exact-printing annotations" (EPAs, for short)
would be better. **End RAE**

## Goals

API annotations exist to support exact-printing, both of an unmodified AST and of one that a tool
has changed. For example, a tool might want to float out a local definition from a `where` clause
to become a top-level definition. It should be possible to do this without disrupting the user's
stylistic choices and comments.

**RAE:** The current design (as witnessed in !2418 on 12 March 2021) allows exact-printing only
in the `GhcPs` AST, but not in `GhcRn` or `GhcTc`. Why? Would we never want to exact-print a
type-checked AST? **End RAE**

## General approach

The general approach is to store extra information in the Trees-That-Grow extension point of
each constructor. This extra information includes the locations of any keywords used in a construct.
For example, in an `HsLet`, we need to store the locations of the `let` and the `in`.

However, we must be careful about exactly how we store the location. A simple fixed row/column
will not do, because a construct might be moved before exact-printing. We thus define a new
concept of *anchor*: an anchor is **RAE** finish this sentence **End RAE**. Anchors are stored
as **RAE** finish this sentence **End RAE**.

In addition we sometimes must store *deltas*: differences from one location to another. These
arise **RAE** finish this sentence **End RAE**.

Because we must exact-print with comments intact, we track all comments. These are associated
with the innermost enclosing AST node, that is, the one whose `SrcSpan` is smallest yet still includes
the comment. They are stored **RAE** where? **End RAE**.
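
The attachment rule above can be sketched on a toy tree. In this sketch (not GHC code), `Pos`, `Span`, `Node`, and `innermost` are hypothetical names, spans are `(line, column)` pairs with an exclusive end, and the tree is assumed to be well nested, so the innermost enclosing node is found by descending into whichever child still contains the comment. It makes no claim about how or when GHC itself performs the attachment.

```haskell
-- Hypothetical, simplified source position: (line, column), 1-based.
type Pos = (Int, Int)

-- A span from a start position to an (exclusive) end position.
data Span = Span { spanStart :: Pos, spanEnd :: Pos }
  deriving (Eq, Show)

-- A toy AST node: a label, its span, and its children.
data Node = Node { nodeLabel :: String, nodeSpan :: Span, nodeKids :: [Node] }
  deriving Show

-- Does the outer span fully contain the inner one?
contains :: Span -> Span -> Bool
contains (Span s e) (Span s' e') = s <= s' && e' <= e

-- Label of the innermost node whose span contains the comment's span,
-- or Nothing if even the root does not contain it. Assumes children lie
-- within their parent and siblings do not overlap, so at most one child
-- can match at each level.
innermost :: Span -> Node -> Maybe String
innermost c n
  | not (nodeSpan n `contains` c) = Nothing
  | otherwise =
      case [l | k <- nodeKids n, Just l <- [innermost c k]] of
        (l : _) -> Just l
        []      -> Just (nodeLabel n)

-- Spans for this source:
--   f = let x = 1
--         {- note -}
--       in x
-- The comment occupies (2,7)-(2,17).
example :: Node
example =
  Node "FunBind f" (Span (1, 1) (3, 9))
    [ Node "HsLet" (Span (1, 5) (3, 9))
        [ Node "binding x" (Span (1, 9) (1, 14)) []
        , Node "body x" (Span (3, 8) (3, 9)) []
        ]
    ]
```

Here `innermost (Span (2, 7) (2, 17)) example` selects the `HsLet` node: the comment lies inside the `let` expression but inside neither the binding nor the body.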

## Data structures

`AnnKeywordId`: This is a simple enumeration of all keywords in Haskell, including alphanumeric
keywords (such as `let` or `data`), alphanumeric pseudo-keywords (such as `family` or `qualified`),
symbolic keywords (such as `->` or `;`), and symbolic pseudo-keywords (such as `-<` and `!`).

---------------------------

```hs
data AnnotationComment = AnnComment { ac_tok       :: AnnotationCommentTok
                                    , ac_prior_tok :: RealSrcSpan
                                      -- ^ The location of the prior
                                      -- token, used for exact printing
                                    }

data AnnotationCommentTok =
  -- Documentation annotations
    AnnDocCommentNext  String     -- ^ something beginning '-- |'
  | AnnDocCommentPrev  String     -- ^ something beginning '-- ^'
  | AnnDocCommentNamed String     -- ^ something beginning '-- $'
  | AnnDocSection      Int String -- ^ a section heading
  | AnnDocOptions      String     -- ^ doc options (prune, ignore-exports, etc)
  | AnnLineComment     String     -- ^ comment starting with "--"
  | AnnBlockComment    String     -- ^ comment in {- -}
  | AnnEofComment                 -- ^ empty comment, capturing
                                  -- location of EOF

-- | When we are parsing we add comments that belong to a particular AST
-- element, and print them together with the element, interleaving
-- them into the output stream. But when editing the AST to move
-- fragments around, it is useful to be able to first separate the
-- comments into those occurring before the AST element and those
-- following it. The 'AnnCommentsBalanced' constructor is used to do
-- this. The GHC parser will only insert the 'AnnComments' form.
data ApiAnnComments = AnnComments
                        { priorComments :: ![LAnnotationComment] }
                    | AnnCommentsBalanced
                        { priorComments     :: ![LAnnotationComment]
                        , followingComments :: ![LAnnotationComment] }

type LAnnotationComment = GenLocated Anchor AnnotationComment
```

This stores a comment, differentiating between the different comment styles.

**RAE** what's up with `AnnEofComment`? **End RAE**

**RAE** Why do we need `ac_prior_tok`? The comment is not helpful: *everything* here is about exact-printing. **End RAE**

**RAE** Why does `LAnnotationComment` get an `Anchor` not a `SrcSpan`? **End RAE**
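
To make the comment styles concrete, here is a sketch that classifies raw comment text into a local mirror of `AnnotationCommentTok`. The mirror type `CommentTok` and the function `classify` are hypothetical; they cover only the prefixes named in the constructor comments above (`-- |`, `-- ^`, `-- $`, plain `--`, and `{- -}`), and skip section headings, doc options, and the EOF marker. This is not GHC's lexer, and it ignores lexical subtleties such as `-->` being an operator rather than a comment.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical mirror of a subset of AnnotationCommentTok.
data CommentTok
  = DocNext String      -- like AnnDocCommentNext:  starts with "-- |"
  | DocPrev String      -- like AnnDocCommentPrev:  starts with "-- ^"
  | DocNamed String     -- like AnnDocCommentNamed: starts with "-- $"
  | LineComment String  -- like AnnLineComment:     starts with "--"
  | BlockComment String -- like AnnBlockComment:    starts with "{-"
  deriving (Eq, Show)

-- Classify the full text of a single comment. In this sketch the whole
-- original text is kept in each constructor so it can be printed back
-- verbatim. The more specific doc-comment prefixes are tested first.
classify :: String -> Maybe CommentTok
classify s
  | "-- |" `isPrefixOf` s = Just (DocNext s)
  | "-- ^" `isPrefixOf` s = Just (DocPrev s)
  | "-- $" `isPrefixOf` s = Just (DocNamed s)
  | "--"   `isPrefixOf` s = Just (LineComment s)
  | "{-"   `isPrefixOf` s = Just (BlockComment s)
  | otherwise             = Nothing
```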

----------------------------------

```hs
-- | Captures an annotation, storing the @'AnnKeywordId'@ and its
-- location. The parser only ever inserts @'AnnAnchor'@ fields with a
-- RealSrcSpan being the original location of the annotation in the
-- source file.
-- The @'AnnAnchor'@ can also store a delta position if the AST has been
-- modified and needs to be pretty printed again.
-- The usual way an 'AddApiAnn' is created is using the 'mj' ("make
-- jump") function, and then it can be inserted into the appropriate
-- annotation.
data AddApiAnn = AddApiAnn AnnKeywordId AnnAnchor
```
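
As an illustration of what an `[AddApiAnn]` list might hold for `let x = 1 in x`, the sketch below records the original locations of the `let` and `in` keywords. It uses simplified, hypothetical stand-ins: `Kw` for `AnnKeywordId`, `Span1` (line, start column, exclusive end column) for `RealSrcSpan`, and `AddAnn` for `AddApiAnn` with an `AR`-style anchor. The word-based scan in `keywordAnns` is naive (it would also match `let` inside a string literal) and is not how the parser builds these values.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical stand-in for two AnnKeywordId constructors.
data Kw = AnnLet | AnnIn
  deriving (Eq, Show)

-- Hypothetical stand-in for RealSrcSpan on a single line:
-- line, start column, end column (1-based, end exclusive).
data Span1 = Span1 { sLine :: Int, sStart :: Int, sEnd :: Int }
  deriving (Eq, Show)

-- Hypothetical stand-in for AddApiAnn holding a real (AR-style) span.
data AddAnn = AddAnn Kw Span1
  deriving (Eq, Show)

-- Split one source line into alphanumeric words with their start column;
-- every other character acts as a separator.
wordsWithCols :: String -> [(Int, String)]
wordsWithCols = go 1
  where
    go _ [] = []
    go col s@(c : cs)
      | isAlphaNum c =
          let (w, rest) = span isAlphaNum s
          in (col, w) : go (col + length w) rest
      | otherwise = go (col + 1) cs

-- Record the location of every `let` and `in` keyword on the given line.
keywordAnns :: Int -> String -> [AddAnn]
keywordAnns line src =
  [ AddAnn kw (Span1 line col (col + length w))
  | (col, w) <- wordsWithCols src
  , Just kw <- [lookup w [("let", AnnLet), ("in", AnnIn)]]
  ]
```

For the line `f = let x = 1 in x`, this yields an `AnnLet` span covering columns 5 to 8 and an `AnnIn` span covering columns 15 to 17 (end exclusive).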

-------------------------------------

```hs
-- | The anchor for an @'AnnKeywordId'@. The Parser inserts the @'AR'@
-- variant, giving the exact location of the original item in the
-- parsed source. This can be replaced by the @'AD'@ version, to
-- provide a position for the item relative to the end of the previous
-- item in the source. This is useful when editing an AST prior to
-- exact printing the changed one.
data AnnAnchor = AR RealSrcSpan
               | AD DeltaPos

-- | Relative position, line then column. If 'deltaLine' is zero then
-- 'deltaColumn' gives the number of spaces between the end of the
-- preceding output element and the start of the one this is attached
-- to, on the same line. If 'deltaLine' is > 0, then it is the number
-- of lines to advance, and 'deltaColumn' is the start column on the
-- new line.
data DeltaPos =
  DP
    { deltaLine   :: !Int,
      deltaColumn :: !Int
    } deriving (Show,Eq,Ord,Data)

-- | An 'Anchor' records the base location for the start of the
-- syntactic element holding the annotations, and is used as the point
-- of reference for calculating delta positions for contained
-- annotations. If an AST element is moved or deleted, the original
-- location is also tracked, for printing the source without gaps.
data Anchor = Anchor { anchor    :: RealSrcSpan
                                 -- ^ Base location for the start of
                                 -- the syntactic element holding
                                 -- the annotations.
                     , anchor_op :: AnchorOperation }
        deriving (Data, Eq, Show)

-- | If tools modify the parsed source, the 'MovedAnchor' variant can
-- directly provide the spacing for this item relative to the previous
-- one when printing. This allows AST fragments with a particular
-- anchor to be freely moved, without worrying about recalculating the
-- appropriate anchor span.
data AnchorOperation = UnchangedAnchor
                     | MovedAnchor DeltaPos
        deriving (Data, Eq, Show)
```

**RAE** As I commented on the MR, I find `AR` and `AD` shorter than necessary, and I think `DeltaPos` would be better with two constructors. **End RAE**

**RAE** Why do we need both of these types? How are they different? **End RAE**
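
The `DeltaPos` convention documented above can be illustrated with three small functions: `deltaBetween` computes the delta from the end of the previous item to the start of the next, `applyDelta` recovers a start position from a delta, and `spacing` produces the whitespace an exact printer would emit. These names are hypothetical and `DP` is re-declared locally. Positions are assumed to be `(line, column)` pairs with 1-based columns, an item's end is the column just after its last character, and "start column on the new line" is read as a 0-based column (the indentation width); that last reading is an assumption.

```haskell
-- Local re-declaration mirroring the DeltaPos fields shown above.
data DeltaPos = DP { deltaLine :: !Int, deltaColumn :: !Int }
  deriving (Show, Eq)

-- (line, column), 1-based; an item's end is the column just after it.
type Pos = (Int, Int)

-- Delta from the end of the previous item to the start of the next one.
-- Assumes the next item starts at or after the previous item's end.
deltaBetween :: Pos -> Pos -> DeltaPos
deltaBetween (pl, pc) (l, c)
  | l == pl   = DP 0 (c - pc)        -- same line: number of spaces
  | otherwise = DP (l - pl) (c - 1)  -- new line: lines to advance, 0-based column

-- Where an item lands when its delta is applied after the previous item.
applyDelta :: Pos -> DeltaPos -> Pos
applyDelta (pl, pc) (DP 0 dc)  = (pl, pc + dc)
applyDelta (pl, _)  (DP dl dc) = (pl + dl, dc + 1)

-- The whitespace an exact printer would emit before the item.
spacing :: DeltaPos -> String
spacing (DP 0 dc)  = replicate dc ' '
spacing (DP dl dc) = replicate dl '\n' ++ replicate dc ' '
```

Because a delta is relative, `applyDelta (1, 14) (DP 2 4)` and `applyDelta (10, 1) (DP 2 4)` both place the item two lines further down and four columns in, at `(3, 5)` and `(12, 5)` respectively. That independence from absolute positions is the property the `AD` and `MovedAnchor` comments rely on when they say fragments can be moved freely.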

----------------------------------------

```hs
data ApiAnn' ann
  = ApiAnn { entry    :: Anchor
               -- ^ Base location for the start of the syntactic element
               -- holding the annotations.
           , anns     :: ann -- ^ Annotations added by the Parser
           , comments :: ApiAnnComments
               -- ^ Comments enclosed in the SrcSpan of the element
               -- this `ApiAnn'` is attached to
           }
  | ApiAnnNotUsed -- ^ No Annotation for generated code,
                  -- e.g. from TH, deriving, etc.

-- | This type is the most direct mapping of the previous API
-- Annotations model. It captures the containing `SrcSpan` in its
-- `entry` `Anchor`, has a list of `AddApiAnn` as before, and keeps
-- track of the comments associated with the anchor.
type ApiAnn = ApiAnn' [AddApiAnn]
```

This is the heart of this design: an `ApiAnn'` stores the `Anchor` for an AST node, along with any contained comments.
In addition, the `anns` field stores the locations of any keywords (like `let` and `in`) associated with an AST
node.

**RAE** Some AST nodes (e.g. `HsLet`) get custom data structures for `ann`. Some (e.g. `LazyPat`) get `[AddApiAnn]`. Why
the difference? What's the guiding principle? **End RAE**

**RAE** I don't think it's helpful having a reference to the previous model; that will get stale quickly. **End RAE**
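
The `comments` field holds an `ApiAnnComments`, whose code comment describes turning the parser's single list into a prior/following split before moving fragments. The sketch below shows one possible splitting policy as a pure function: comments that start before the element's start position become "prior", the rest "following". The types are simplified, hypothetical stand-ins (a comment is just its text and start position), and `balance` is not the GHC or ghc-exactprint function that performs this step; the real criterion may differ.

```haskell
import Data.List (partition)

-- (line, column), 1-based.
type Pos = (Int, Int)

-- Simplified comment: its text and start position.
data Comment = Comment { cText :: String, cStart :: Pos }
  deriving (Eq, Show)

-- Mirrors the two constructors of ApiAnnComments, with simplified comments.
data Comments
  = Unbalanced { prior :: [Comment] }
  | Balanced   { prior :: [Comment], following :: [Comment] }
  deriving (Eq, Show)

-- Every comment held, in order, regardless of form.
allComments :: Comments -> [Comment]
allComments (Unbalanced p) = p
allComments (Balanced p f) = p ++ f

-- Split the comments of an element starting at `start` into those that
-- begin before it ("prior") and the rest ("following"). This policy is
-- an illustrative assumption; `partition` keeps the original order.
balance :: Pos -> Comments -> Comments
balance start cs = Balanced before after
  where
    (before, after) = partition ((< start) . cStart) (allComments cs)
```

In this sketch `balance` is idempotent, so a tool could apply it before any move without first checking which form it holds.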

-------------------------------------------------

```hs
-- | The 'SrcSpanAnn\'' type wraps a normal 'SrcSpan', together with
-- an extra annotation type. This is mapped to a specific `GenLocated`
-- usage in the AST through the `XRec` and `Anno` type families.
data SrcSpanAnn' a = SrcSpanAnn { ann :: a, locA :: SrcSpan }
        deriving (Data, Eq)

-- | We mostly use 'SrcSpanAnn\'' with an 'ApiAnn\''
type SrcAnn ann = SrcSpanAnn' (ApiAnn' ann)

type LocatedA = GenLocated SrcSpanAnnA
type LocatedN = GenLocated SrcSpanAnnN

type LocatedL = GenLocated SrcSpanAnnL
type LocatedP = GenLocated SrcSpanAnnP
type LocatedC = GenLocated SrcSpanAnnC

type SrcSpanAnnA = SrcAnn AnnListItem
type SrcSpanAnnN = SrcAnn NameAnn

type SrcSpanAnnL = SrcAnn AnnList
type SrcSpanAnnP = SrcAnn AnnPragma
type SrcSpanAnnC = SrcAnn AnnContext

-- | General representation of a 'GenLocated' type carrying a
-- parameterised annotation type.
type LocatedAn an = GenLocated (SrcAnn an)
```

**RAE** I'm completely lost here, even after reading Note [XRec and Anno in the AST]. I need some examples
of concrete user-written syntax that would necessitate this design. **End RAE**
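
A small sketch may help frame the question above. The location slot of `GenLocated` is a type parameter, so instantiating it with `SrcSpanAnn' a` lets each kind of node carry its own annotation type `a` beside its plain span; that is the pattern behind `LocatedA`, `LocatedN`, and the related synonyms. The code mirrors this for the element `a` in the source `[a, b]` on line 1. `GenLocated` and `SrcSpanAnn'` are local mirrors, `SrcSpan` is simplified to four numbers, and `ItemAnn` and `dropAnn` are hypothetical; none of these are GHC's own definitions.

```haskell
-- Local mirror of GenLocated: a value paired with a location of type l.
data GenLocated l e = L l e
  deriving Show

-- Simplified span: start line, start column, end line, end column.
data SrcSpan = SrcSpan Int Int Int Int
  deriving (Eq, Show)

-- Mirror of SrcSpanAnn': an annotation of type a next to a plain span.
data SrcSpanAnn' a = SrcSpanAnn { ann :: a, locA :: SrcSpan }
  deriving Show

-- Hypothetical list-item annotation: columns of trailing commas.
newtype ItemAnn = ItemAnn { trailingCommaCols :: [Int] }
  deriving Show

-- Same GenLocated type, two choices for the location slot.
type Located e  = GenLocated SrcSpan e
type LocatedA e = GenLocated (SrcSpanAnn' ItemAnn) e

-- Forget the annotation and keep only the span (hypothetical helper).
dropAnn :: GenLocated (SrcSpanAnn' a) e -> Located e
dropAnn (L (SrcSpanAnn _ s) e) = L s e

-- The element `a` of `[a, b]`: it occupies column 2, and the comma
-- that follows it sits at column 3.
itemA :: LocatedA String
itemA = L (SrcSpanAnn (ItemAnn [3]) (SrcSpan 1 2 1 3)) "a"
```

Code that only needs spans can call `dropAnn`, while an exact printer can read the comma position from `ann` without any change to the element type itself.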

---------------------------------------------------------

```hs
-- | Captures the location of punctuation occurring between items,
-- normally in a list. It is captured as a trailing annotation.
data TrailingAnn
  = AddSemiAnn AnnAnchor    -- ^ Trailing ';'
  | AddCommaAnn AnnAnchor   -- ^ Trailing ','
  | AddVbarAnn AnnAnchor    -- ^ Trailing '|'
  | AddRarrowAnn AnnAnchor  -- ^ Trailing '->'
  | AddRarrowAnnU AnnAnchor -- ^ Trailing '->', unicode variant

-- | Annotation for items appearing in a list. They can have one or
-- more trailing punctuation items, such as commas or semicolons.
data AnnListItem
  = AnnListItem {
      lann_trailing :: [TrailingAnn]
      }
```

**RAE** What is "trailing" about this bit? **End RAE**
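
One concrete reading of "trailing", consistent with the comments on `TrailingAnn` and `AnnListItem`: in `[a, b, c]`, the comma after `a` is recorded in `a`'s annotation, the comma after `b` in `b`'s, and `c` has none. The sketch below computes exactly that for a one-line list literal. `TrailingComma`, the local mirror of `AnnListItem`, and `annotateList` are hypothetical; a bare column number stands in for `AnnAnchor`, items are assumed to be single tokens, and nested brackets and string literals are not handled.

```haskell
-- Hypothetical mirror of the comma case of TrailingAnn; a bare column
-- number stands in for AnnAnchor.
newtype TrailingComma = AddCommaAnn Int
  deriving (Eq, Show)

-- Hypothetical mirror of AnnListItem.
newtype AnnListItem = AnnListItem { lann_trailing :: [TrailingComma] }
  deriving (Eq, Show)

-- Annotate the items of a one-line list literal such as "[a, b, c]".
-- Each item records the 1-based column of the comma that follows it;
-- the last item gets no trailing annotation.
annotateList :: String -> [(String, AnnListItem)]
annotateList src = go (drop 1 (zip [1 ..] src)) []  -- drop 1 skips '['
  where
    -- `cur` holds the characters of the current item, in reverse.
    go [] _ = []
    go ((col, ch) : rest) cur
      | ch == ','  = (item cur, AnnListItem [AddCommaAnn col]) : go rest []
      | ch == ']'  = [(item cur, AnnListItem []) | not (null (item cur))]
      | otherwise  = go rest (ch : cur)
    -- Restore character order and drop spaces around the token.
    item = filter (/= ' ') . reverse
```

One consequence, at least in this sketch, is that deleting an item together with its annotation also deletes the comma that followed it, so the remaining punctuation stays consistent unless the deleted item was the last one.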

---------------------------------

**RAE** Several of the data structures that follow look like they are engineered for specific use cases (e.g. lists, contexts). Why are they in Annotation.hs instead of closer to their usage sites? **End RAE**

---------------------------------

```hs
-- | API Annotations for a 'RdrName'. There are many kinds of
-- adornment that can be attached to a given 'RdrName'. This type
-- captures them, as detailed on the individual constructors.
data NameAnn
  -- | Used for a name with an adornment, so '`foo`', '(bar)'
  = NameAnn {
      nann_adornment :: NameAdornment,
      nann_open      :: AnnAnchor,
      nann_name      :: AnnAnchor,
      nann_close     :: AnnAnchor,
      nann_trailing  :: [TrailingAnn]
      }
  -- | Used for @(,,,)@, or @(#,,,#)@
  | NameAnnCommas {
      nann_adornment :: NameAdornment,
      nann_open      :: AnnAnchor,
      nann_commas    :: [AnnAnchor],
      nann_close     :: AnnAnchor,
      nann_trailing  :: [TrailingAnn]
      }
  -- | Used for @()@, @(##)@, @[]@
  | NameAnnOnly {
      nann_adornment :: NameAdornment,
      nann_open      :: AnnAnchor,
      nann_close     :: AnnAnchor,
      nann_trailing  :: [TrailingAnn]
      }
  -- | Used for @->@, as an identifier
  | NameAnnRArrow {
      nann_name      :: AnnAnchor,
      nann_trailing  :: [TrailingAnn]
      }
  -- | Used for an item with a leading @'@. The annotation for the
  -- unquoted item is stored in 'nann_quoted'.
  | NameAnnQuote {
      nann_quote     :: AnnAnchor,
      nann_quoted    :: SrcSpanAnnN,
      nann_trailing  :: [TrailingAnn]
      }
  -- | Used when adding a 'TrailingAnn' to an existing 'LocatedN'
  -- which has no Api Annotation (via the 'ApiAnnNotUsed' constructor).
  | NameAnnTrailing {
      nann_trailing  :: [TrailingAnn]
      }

-- | A 'NameAnn' can capture the locations of surrounding adornments,
-- such as parens or backquotes. This data type identifies what
-- particular pair are being used.
data NameAdornment
  = NameParens     -- ^ '(' ')'
  | NameParensHash -- ^ '(#' '#)'
  | NameBackquotes -- ^ '`'
  | NameSquare     -- ^ '[' ']'
```

**RAE**

* What is `(bar)` in the example of `NameAnn`? Did you mean `(+)`?
* What if a name doesn't have an open/close component? What do those fields of `NameAnn` contain?
* What is `nann_trailing` doing here?
* Why is `->` special? And isn't it spelled `(->)` when used as an identifier?
* `NameAdornment` always seems too big. That is, I don't think there's an occurrence that could use all four of its constructors.

**End RAE**
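
The constructor comments on `NameAnn` can be summarized as a classifier over the surface spelling of a name occurrence. This is a sketch only: `NameAnnKind`, `unwrap`, and `classifyName` are hypothetical, locations are omitted, and `NameAdornment` is re-declared locally. Returning `Nothing` for an unadorned name such as `foo` is an assumption (loosely suggested by the `NameAnnTrailing` comment, which mentions a `LocatedN` with no annotation), not a description of what GHC does.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- Local re-declaration of NameAdornment.
data NameAdornment = NameParens | NameParensHash | NameBackquotes | NameSquare
  deriving (Eq, Show)

-- Which NameAnn constructor the comments suggest, without locations.
data NameAnnKind
  = KAnn NameAdornment          -- NameAnn:       `foo`, (+)
  | KCommas NameAdornment Int   -- NameAnnCommas: (,,,), (#,,#); Int = comma count
  | KOnly NameAdornment         -- NameAnnOnly:   (), (##), []
  | KRArrow                     -- NameAnnRArrow: -> as an identifier
  | KQuote (Maybe NameAnnKind)  -- NameAnnQuote:  leading ', plus the inner name's kind
  deriving (Eq, Show)

-- Strip one pair of adornment brackets, returning the adornment and the
-- text inside. "(#" is tried before "(" so unboxed forms are recognised.
unwrap :: String -> Maybe (NameAdornment, String)
unwrap s = go [ ("(#", "#)", NameParensHash), ("(", ")", NameParens)
              , ("[", "]", NameSquare), ("`", "`", NameBackquotes) ]
  where
    go [] = Nothing
    go ((o, c, a) : more)
      | o `isPrefixOf` s && c `isSuffixOf` s && length s >= length o + length c
          = Just (a, take (length s - length o - length c) (drop (length o) s))
      | otherwise = go more

classifyName :: String -> Maybe NameAnnKind
classifyName "->"          = Just KRArrow
classifyName ('\'' : rest) = Just (KQuote (classifyName rest))
classifyName s = case unwrap s of
  Nothing -> Nothing  -- unadorned name: no annotation (assumption)
  Just (a, inner)
    | null inner         -> Just (KOnly a)
    | all (== ',') inner -> Just (KCommas a (length inner))
    | otherwise          -> Just (KAnn a)
```

For instance, `(+)` and `` `div` `` both map to the `NameAnn` shape, `(#,,#)` to `NameAnnCommas` with two commas, and `'[]` to a quote wrapping a `NameAnnOnly` for `[]`.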

# This is a description of the API Annotations introduced with GHC 7.10 RC2