... | ... | @@ -85,26 +85,29 @@ An **Anchor** is in fact used in two different ways |
|
|
* Secondly, it serves as a reference point for the parts *inside* `x =
|
|
|
1`. So the distance to the `=` will be based on it.
|
|
|
|
|
|
The **DeltaPos** is what is actually used for the final exact printing
|
|
|
step. It captures the spacing from the current print position on the
|
|
|
page, to the position required for the thing about to be printed.
|
|
|
This is either on the same line in which case is is simply the number
|
|
|
of spaces to emit, or it is some number of lines down, with a given
|
|
|
column offset. The exact printing algorithm keeps track of the column
|
|
|
offset pertaining to the anchor position, so the `deltaColumn` is the
|
|
|
additional spaces to add in this case. The details are presented
|
|
|
below in **TBD**.
|
|
|
When performing exact-printing, the spacing between all elements is
|
|
|
first converted to a series of **DeltaPos**, and printing occurs based
|
|
|
on these delta positions.
|
|
|
|
|
|
The **DeltaPos** captures the spacing from the current print position
|
|
|
on the page to the position required for the thing about to be
|
|
|
printed. This is either on the same line in which case is is simply
|
|
|
the number of spaces to emit, or it is some number of lines down, with
|
|
|
a given column offset. The exact printing algorithm keeps track of
|
|
|
the column offset pertaining to the *current* anchor position, so the
|
|
|
`deltaColumn` is the additional spaces to add in this case. The
|
|
|
details are presented below in **TBD**.
|
|
|
|
|
|
The **anchor_op** is used to facilitate moving an entire AST subtree
|
|
|
into a new location. If it is not moved, the normal case, it will be
|
|
|
`UnchangedAnchor`.
|
|
|
|
|
|
If it has been moved, the actual span captured in the `anchor` field
|
|
|
will no longer be relevant for spacing relative to its context, and
|
|
|
the spacing is provided directly by the `DeltaPos` in the
|
|
|
`MovedAnchor` variant. But its original `anchor` location
|
|
|
remains unchanged, as it is used as a reference for the elements
|
|
|
inside the local definition when printing them.
|
|
|
will no longer be relevant for spacing relative to its parent context,
|
|
|
and the spacing is provided directly by the `DeltaPos` in the
|
|
|
`MovedAnchor` variant. But its original `anchor` location remains
|
|
|
unchanged, as it is used as a reference for the elements inside the
|
|
|
local definition when printing them.
|
|
|
|
|
|
This allows us to painlessly build up new ASTs based on fragments from
|
|
|
anywhere, and we only need to worry about spacing where we actually
|
... | ... | @@ -127,6 +130,10 @@ the `GhcPs` occurs with its own `SrcSpan`, which can be used to tie it |
|
|
up to its matching version in the `GhcRn` AST, and hence ascertain the
|
|
|
corresponding `Name`.
|
|
|
|
|
|
Note: it may be possible to retain some of the original ordering of
|
|
|
declarations by using `AnnSortKey` fields in the `GhcRn` and `GhcTc`
|
|
|
ASTs.
|
|
|
|
|
|
## General approach
|
|
|
|
|
|
The general approach is to store extra information in the
|
... | ... | @@ -193,7 +200,6 @@ Here you can see |
|
|
declaration. Details of the `EpAnnComments` structure and usage are
|
|
|
provided below.
|
|
|
|
|
|
***AZ stopping now 2021-04-20*** will carry on tomorrow.
|
|
|
|
|
|
## Data structures
|
|
|
|
... | ... | @@ -201,145 +207,56 @@ Here you can see |
|
|
keywords (such as `let` or `data`), alphanumeric pseudo-keywords (such as `family` or `qualified`),
|
|
|
symbolic keywords (such as `->` or `;`), and symbolic pseudo-keywords (such as `-<` and `!`).
|
|
|
|
|
|
---------------------------
|
|
|
|
|
|
```haskell
|
|
|
data EpaComment = EpaComment { ac_tok :: EpaCommentTok
|
|
|
, ac_prior_tok :: RealSrcSpan
|
|
|
}
|
|
|
|
|
|
data EpaCommentTok =
|
|
|
-- Documentation annotations
|
|
|
EpaDocCommentNext String -- ^ something beginning '-- |'
|
|
|
| EpaDocCommentPrev String -- ^ something beginning '-- ^'
|
|
|
| EpaDocCommentNamed String -- ^ something beginning '-- $'
|
|
|
| EpaDocSection Int String -- ^ a section heading
|
|
|
| EpaDocOptions String -- ^ doc options (prune, ignore-exports, etc)
|
|
|
| EpaLineComment String -- ^ comment starting by "--"
|
|
|
| EpaBlockComment String -- ^ comment in {- -}
|
|
|
| EpaEofComment -- ^ empty comment, capturing
|
|
|
-- location of EOF
|
|
|
|
|
|
-- | When we are parsing we add comments that belong a particular AST
|
|
|
-- element, and print them together with the element, interleaving
|
|
|
-- them into the output stream. But when editin the AST, to move
|
|
|
-- fragments around, it is useful to be able to first separate the
|
|
|
-- comments into those occuring before the AST element and those
|
|
|
-- following it. The 'EpaCommentsBalanced' constructor is used to do
|
|
|
-- this. The GHC parser will only insert the 'EpaComments' form.
|
|
|
data EpAnnComments = EpaComments
|
|
|
{ priorComments :: ![LEpaComment] }
|
|
|
| EpaCommentsBalanced
|
|
|
{ priorComments :: ![LEpaComment]
|
|
|
, followingComments :: ![LEpaComment] }
|
|
|
deriving (Data, Eq)
|
|
|
|
|
|
type LEpaComment = GenLocated Anchor EpaComment
|
|
|
```
|
|
|
`EpaComment` : Keeps an original comment together with the RealSrcSpan
|
|
|
of the token preceding it, for calculating the spacing when exact printing it.
|
|
|
|
|
|
This stores a comment, differentiating between the different comment styles.
|
|
|
**RAE** what's up with `AnnEofComment`? **End RAE**
|
|
|
**AZ** : `AnnEofComment` is used to keep track of the actual end of the file, so that if there are blank lines at the end we can reproduce them when printing.
|
|
|
`EpAnnComments` : Keeps a list of comments associated with a specific
|
|
|
AST element. Initially this just keeps all comments, but functions
|
|
|
exist in the exact printing library to split this into ones that occur
|
|
|
before and after the AST element, and to move them between elements,
|
|
|
prior to modifying the AST. This facilitates keeping comments
|
|
|
attached to the corrent AST element if the element is moved. See
|
|
|
`balanceComments` and `balanceCommentsList` in the check-exact
|
|
|
Transform module.
|
|
|
|
|
|
**RAE** Why do we need `ac_prior_tok`? The comment is not helpful: *everything* here is about exact-printing. **End RAE**
|
|
|
**AZ**: We need to calculate a `DeltaPos` between every piece of output when printing. It is not always clear what the spacing is before a comment, so the lexer now emits the prior token location as well with a comment, so we can calculate this.
|
|
|
`AddEpAnn` : a container structure tying together an `AnnKeywordId`
|
|
|
with its corresponding `EpaLocation`.
|
|
|
|
|
|
**RAE** Why does `LAnnotationComment` get an `Anchor` not a `SrcSpan`? **End RAE**
|
|
|
**AZ** Everything that can be moved (which is everything) gets an `Anchor`.
|
|
|
An `EpaLocation` is used by the parser to store the original
|
|
|
`RealSrcSpan` belonging to the keyword identified by the
|
|
|
`AnnKeywordId` in an `AddEpAnn`. If tools are used to modify an AST,
|
|
|
the `EpaLocation` can alternatively store a `DeltaPos` directly.
|
|
|
|
|
|
----------------------------------
|
|
|
## Anchor and EpaLocation
|
|
|
|
|
|
```hs
|
|
|
-- | Captures an annotation, storing the @'AnnKeywordId'@ and its
|
|
|
-- location. The parser only ever inserts @'EpaLocation'@ fields with a
|
|
|
-- RealSrcSpan being the original location of the annotation in the
|
|
|
-- source file.
|
|
|
-- The @'EpaLocation'@ can also store a delta position if the AST has been
|
|
|
-- modified and needs to be pretty printed again.
|
|
|
-- The usual way an 'AddApiAnn' is created is using the 'mj' ("make
|
|
|
-- jump") function, and then it can be inserted into the appropriate
|
|
|
-- annotation.
|
|
|
data AddApiAnn = AddApiAnn AnnKeywordId EpaLocation
|
|
|
```
|
|
|
At first blush there seems to be overlap between the `Anchor` and
|
|
|
`EpaLocation` types, and one of them could be redundant.
|
|
|
|
|
|
-------------------------------------
|
|
|
```haskell
|
|
|
data EpaLocation = EpaSpan RealSrcSpan
|
|
|
| EpaDelta DeltaPos
|
|
|
|
|
|
```hs
|
|
|
-- | The anchor for an @'AnnKeywordId'@. The Parser inserts the @'AR'@
|
|
|
-- variant, giving the exact location of the original item in the
|
|
|
-- parsed source. This can be replace by the @'AD'@ version, to
|
|
|
-- provide a position for the item relative to the end of the previous
|
|
|
-- item in the source. This is useful when editing an AST prior to
|
|
|
-- exact printing the changed one.
|
|
|
data EpaLocation = AR RealSrcSpan
|
|
|
| AD DeltaPos
|
|
|
|
|
|
-- | Relative position, line then column. If 'deltaLine' is zero then
|
|
|
-- 'deltaColumn' gives the number of spaces between the end of the
|
|
|
-- preceding output element and the start of the one this is attached
|
|
|
-- to, on the same line. If 'deltaLine' is > 0, then it is the number
|
|
|
-- of lines to advance, and 'deltaColumn' is the start column on the
|
|
|
-- new line.
|
|
|
data DeltaPos =
|
|
|
DP
|
|
|
{ deltaLine :: !Int,
|
|
|
deltaColumn :: !Int
|
|
|
} deriving (Show,Eq,Ord,Data)
|
|
|
|
|
|
-- | An 'Anchor' records the base location for the start of the
|
|
|
-- syntactic element holding the annotations, and is used as the point
|
|
|
-- of reference for calculating delta positions for contained
|
|
|
-- annotations. If an AST element is moved or deleted, the original
|
|
|
-- location is also tracked, for printing the source without gaps.
|
|
|
data Anchor = Anchor { anchor :: RealSrcSpan
|
|
|
-- ^ Base location for the start of
|
|
|
-- the syntactic element holding
|
|
|
-- the annotations.
|
|
|
, anchor_op :: AnchorOperation }
|
|
|
deriving (Data, Eq, Show)
|
|
|
|
|
|
-- | If tools modify the parsed source, the 'MovedAnchor' variant can
|
|
|
-- directly provide the spacing for this item relative to the previous
|
|
|
-- one when printing. This allows AST fragments with a particular
|
|
|
-- anchor to be freely moved, without worrying about recalculating the
|
|
|
-- appropriate anchor span.
|
|
|
data Anchor = Anchor { anchor :: RealSrcSpan, anchor_op :: AnchorOperation }
|
|
|
data AnchorOperation = UnchangedAnchor
|
|
|
| MovedAnchor DeltaPos
|
|
|
deriving (Data, Eq, Show)
|
|
|
```
|
|
|
|
|
|
**RAE** As I commented on the MR, I find `AR` and `AD` shorter than necessary, and I think `DeltaPos` would be better with two constructors. **End RAE**
|
|
|
**RAE** Why do we need both of these types? How are they different? **End RAE**
|
|
|
**AZ**: I am not *sure* that we do. I do know that as explained in **Mechanism** above we need the anchor to have a `RealSrcSpan` and sometimes a `DeltaPos`. An `EpaLocation` only needs to provide the one or the other. But perhaps we can come up with a way of harmonising this.
|
|
|
The difference is that `EpaLocation` is only used for calculating a
|
|
|
`DeltaPos` for a given item in an AST element. As such it needs only
|
|
|
provide the original `RealSrcSpan` for calculating it from the prior
|
|
|
position, or the `Deltapos` directly.
|
|
|
|
|
|
----------------------------------------
|
|
|
An `Anchor` is used both to calculate the `DeltaPos` of the AST
|
|
|
fragment when printing its containing element, and also as a basis for
|
|
|
calculating the elements within it. So the original `RealSrcSpan` is
|
|
|
always required for an `Anchor`, and it optional for the
|
|
|
`EpaLocation`.
|
|
|
|
|
|
```hs
|
|
|
data ApiAnn' ann
|
|
|
= ApiAnn { entry :: Anchor
|
|
|
-- ^ Base location for the start of the syntactic element
|
|
|
-- holding the annotations.
|
|
|
, anns :: ann -- ^ Annotations added by the Parser
|
|
|
, comments :: ApiAnnComments
|
|
|
-- ^ Comments enclosed in the SrcSpan of the element
|
|
|
-- this `ApiAnn'` is attached to
|
|
|
}
|
|
|
| ApiAnnNotUsed -- ^ No Annotation for generated code,
|
|
|
-- e.g. from TH, deriving, etc.
|
|
|
|
|
|
-- | This type is the most direct mapping of the previous API
|
|
|
-- Annotations model. It captures the containing `SrcSpan' in its
|
|
|
-- `entry` `Anchor`, has a list of `AddApiAnn` as before, and keeps
|
|
|
-- track of the comments associated with the anchor.
|
|
|
type ApiAnn = ApiAnn' [AddApiAnn]
|
|
|
```
|
|
|
|
|
|
This is the heart of this design: an `ApiAnn'` stores the `Anchor` for an AST node, along with any contained comments.
|
|
|
In addition, the `anns` field stores the locations for any keywords (like `let` and `in`) associated with an AST
|
|
|
node. **RAE** Some AST nodes (e.g. `HsLet`) get custom data structures for `ann`. Some (e.g. `LazyPat`) get `[AddApiAnn]`. Why
|
|
|
the difference? What's the guiding principle? **End RAE**
|
|
|
**AZ**: The intention is that each gets a custom structure, but this was a big task and I was focusing on making sure the overall approach actually works. Going forward, @int_index has proposed a slightly different mechanism that may make this moot in time.
|
|
|
**RAE** I don't think it's helpful having a reference to the previous model; that will get stale quickly. **End RAE**
|
|
|
**AZ** once the dust settles, I must check in the exact print
|
|
|
algorithm that this is in fact necessary. It is used this way at
|
|
|
present, but perhaps we can make things work with `EpaLocation` only.
|
|
|
|
|
|
**AZ** 2021-04-27 up to here.
|
|
|
|
|
|
-------------------------------------------------
|
|
|
|
... | ... | |