Reliable, monotonic source positions
The plan outlined in #17544 assumes that the source positions recorded in SrcSpan correspond to the positions in the StringBuffer and are monotonic. However, the prototype implementation revealed that this is not the case. Consider the test I added recently, haddockA041, which consists of two files:
- haddockA041.hs:

    {-# LANGUAGE CPP #-}

    -- | Module header documentation
    module Comments_and_CPP_include where

    #include "IncludeMe.hs"

- IncludeMe.hs:

    -- | Comment on T
    data T = MkT -- ^ Comment on MkT
Loading it in GHCi with -ddump-parsed-ast shows a very interesting result (note the source locations):
({ testsuite/tests/haddock/should_compile_flag_haddock/haddockA041.hs:1:1 }
 (HsModule
  (Just
   ({ testsuite/tests/haddock/should_compile_flag_haddock/haddockA041.hs:4:8-31 }
    {ModuleName: Comments_and_CPP_include}))
  (Nothing)
  []
  [({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:1-12 }
    (TyClD
     (NoExtField)
     (DataDecl
      (NoExtField)
      ({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:6 }
       (Unqual
        {OccName: T}))
      (HsQTvs
       (NoExtField)
       [])
      (Prefix)
      (HsDataDefn
       (NoExtField)
       (DataType)
       ({ <no location info> }
        [])
       (Nothing)
       (Nothing)
       [({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:10-12 }
         (ConDeclH98
          (NoExtField)
          ({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:10-12 }
           (Unqual
            {OccName: MkT}))
          ({ <no location info> }
           (False))
          []
          (Nothing)
          (PrefixCon
           [])
          (Nothing)))]
       ({ <no location info> }
        [])))))]
  (Nothing)
  (Nothing)))
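We have locations such as haddockA041.hs:4:8-31, but also IncludeMe.hs:2:1-12. That is, not only are those locations not monotonic (the data declaration comes after the module header in the StringBuffer, yet its line number is smaller), they even refer to different files!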
Of course, this is great for the user, who would rather see locations in the original files than positions in the text produced by CPP's #include, but it wreaks havoc on the algorithm implemented in !2377.
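The following minimal Haskell sketch makes the ordering problem concrete. It lists the start positions of the named entities from the dump in the order they occur in the AST, and checks whether comparing them by (line, column) yields a non-decreasing sequence. The Pos type, dumpOrder, isMonotonic and sameFile are hypothetical illustrations, not part of GHC's SrcLoc API.

```haskell
-- Hypothetical, simplified start position: file, line, column.
-- This is NOT GHC's RealSrcLoc; it only mirrors what -ddump-parsed-ast prints.
data Pos = Pos
  { posFile :: FilePath
  , posLine :: Int
  , posCol  :: Int
  } deriving (Show, Eq)

-- Start positions of the named entities, in the order they occur in the
-- parsed AST (and hence in the preprocessed StringBuffer).
-- File paths are shortened from the testsuite/... paths in the dump.
dumpOrder :: [Pos]
dumpOrder =
  [ Pos "haddockA041.hs" 4 8   -- module name Comments_and_CPP_include
  , Pos "IncludeMe.hs"   2 1   -- data T = MkT
  , Pos "IncludeMe.hs"   2 6   -- the type constructor T
  , Pos "IncludeMe.hs"   2 10  -- the data constructor MkT
  ]

-- True if each position is >= its predecessor by (line, column),
-- ignoring file names, as a naive ordering would.
isMonotonic :: [Pos] -> Bool
isMonotonic ps = and (zipWith le ps (drop 1 ps))
  where
    le a b = (posLine a, posCol a) <= (posLine b, posCol b)

-- True if every position claims to come from the same file.
sameFile :: [Pos] -> Bool
sameFile []       = True
sameFile (p : ps) = all ((== posFile p) . posFile) ps
```

Loaded into GHCi, isMonotonic dumpOrder is False, because the first step goes from line 4 of haddockA041.hs back to line 2 of IncludeMe.hs, while isMonotonic (drop 1 dumpOrder) is True: positions within a single file are still ordered among themselves.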
The culprit is, apparently, the support for #line directives and {-# LINE ... #-} pragmas, which allow the user (or CPP) to override the recorded SrcLoc arbitrarily. The relevant function is setLineAndFile in Lexer.x, which calls setSrcLoc; resetting the location this way breaks the assumptions that #17544 relies on.
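To illustrate why a line directive breaks monotonicity while the buffer position does not, here is a toy lexer state that tracks both a character offset and a user-visible location. This is a sketch under simplifying assumptions: LexState, advance, advanceStr and lineDirective are hypothetical; lineDirective only imitates the effect attributed to setLineAndFile and setSrcLoc above (overwrite the file and line, then keep consuming input); and the replayed input is a simplified version of the CPP output that omits the directive text itself.

```haskell
-- Hypothetical toy lexer state; not GHC's PState.
data LexState = LexState
  { offset  :: Int       -- characters consumed so far; never reset
  , locFile :: FilePath  -- user-visible file name
  , locLine :: Int       -- user-visible line (1-based)
  , locCol  :: Int       -- user-visible column (1-based)
  } deriving (Show, Eq)

initState :: FilePath -> LexState
initState f = LexState { offset = 0, locFile = f, locLine = 1, locCol = 1 }

-- Consume one character: the offset always grows, and the visible
-- location moves one column right or to the start of the next line.
advance :: LexState -> Char -> LexState
advance s c
  | c == '\n' = s { offset = offset s + 1, locLine = locLine s + 1, locCol = 1 }
  | otherwise = s { offset = offset s + 1, locCol = locCol s + 1 }

-- Consume a whole string, character by character.
advanceStr :: LexState -> String -> LexState
advanceStr = foldl advance

-- Model of a line directive: overwrite the file and line and reset the
-- column, but leave the offset untouched.
lineDirective :: FilePath -> Int -> LexState -> LexState
lineDirective f l s = s { locFile = f, locLine = l, locCol = 1 }

-- State after the first four lines of haddockA041.hs.
beforeInclude :: LexState
beforeInclude =
  advanceStr (initState "haddockA041.hs") $
    "{-# LANGUAGE CPP #-}\n\n"
      ++ "-- | Module header documentation\n"
      ++ "module Comments_and_CPP_include where\n"

-- State after a directive pointing at IncludeMe.hs line 1 and its first line.
afterInclude :: LexState
afterInclude =
  advanceStr (lineDirective "IncludeMe.hs" 1 beforeInclude) "-- | Comment on T\n"
```

In this model, locLine drops from 5 to 2 across the include boundary while offset grows from 93 to 111, so only the offset reflects the order of the StringBuffer.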
My plan to fix:
- Record the StringBuffer offset in RealSrcLoc, and use it instead of the line/column positions in !2377.
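The proposed fix can be pictured as a location record that carries the StringBuffer offset next to the user-visible file, line and column, with ordering decided by the offset alone. This is a minimal sketch assuming a hypothetical LocWithOffset type; it is not GHC's actual RealSrcLoc definition, and the offsets in the example values are made up for illustration.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical location type, not GHC's actual RealSrcLoc:
-- the user-visible position (which #line and {-# LINE #-} may rewrite)
-- plus the offset into the StringBuffer (which only grows).
data LocWithOffset = LocWithOffset
  { bufOffset :: Int
  , file      :: FilePath
  , line      :: Int
  , col       :: Int
  } deriving (Show, Eq)

-- Compare by buffer offset only; file/line/column are kept for display.
byOffset :: LocWithOffset -> LocWithOffset -> Ordering
byOffset = comparing bufOffset

-- The naive comparison that breaks under #include: by (line, column).
byLineCol :: LocWithOffset -> LocWithOffset -> Ordering
byLineCol = comparing (\l -> (line l, col l))

-- Put locations into the order in which they occur in the StringBuffer.
inBufferOrder :: [LocWithOffset] -> [LocWithOffset]
inBufferOrder = sortBy byOffset

-- Illustrative values: line/column numbers are taken from the dump,
-- but the offsets are hypothetical, chosen only so that the included
-- declaration lies later in the buffer than the module name.
moduleName, dataDecl :: LocWithOffset
moduleName = LocWithOffset { bufOffset = 60,  file = "haddockA041.hs", line = 4, col = 8 }
dataDecl   = LocWithOffset { bufOffset = 150, file = "IncludeMe.hs",   line = 2, col = 1 }
```

Sorting by offset puts moduleName before dataDecl, matching the buffer, whereas sorting by (line, column) puts dataDecl first because its line number is smaller. Keeping file, line and column in the same record means error messages can still report the locations the user expects.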