Reliable, monotonic source positions
The plan outlined in #17544 (closed) assumes that the source positions recorded in SrcSpan correspond to the positions in the StringBuffer, and are monotonic. However, the prototype implementation revealed that this is not the case. Consider the test I added recently, haddockA041, which consists of two files:
- haddockA041.hs:

      {-# LANGUAGE CPP #-}

      -- | Module header documentation
      module Comments_and_CPP_include where

      #include "IncludeMe.hs"

- IncludeMe.hs:

      -- | Comment on T
      data T = MkT -- ^ Comment on MkT
Loading it in GHCi with -ddump-parsed-ast shows a very interesting result (note the source locations):
    ({ testsuite/tests/haddock/should_compile_flag_haddock/haddockA041.hs:1:1 }
     (HsModule
      (Just
       ({ testsuite/tests/haddock/should_compile_flag_haddock/haddockA041.hs:4:8-31 }
        {ModuleName: Comments_and_CPP_include}))
      (Nothing)
      []
      [({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:1-12 }
        (TyClD
         (NoExtField)
         (DataDecl
          (NoExtField)
          ({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:6 }
           (Unqual
            {OccName: T}))
          (HsQTvs
           (NoExtField)
           [])
          (Prefix)
          (HsDataDefn
           (NoExtField)
           (DataType)
           ({ <no location info> }
            [])
           (Nothing)
           (Nothing)
           [({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:10-12 }
             (ConDeclH98
              (NoExtField)
              ({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:10-12 }
               (Unqual
                {OccName: MkT}))
              ({ <no location info> }
               (False))
              []
              (Nothing)
              (PrefixCon
               [])
              (Nothing)))]
           ({ <no location info> }
            [])))))]
      (Nothing)
      (Nothing)))
We have locations such as haddockA041.hs:4:8-31, but also IncludeMe.hs:2:1-12. That is, not only are those locations not monotonic, they even refer to different files!
Of course, this is great for the user, who would rather see the locations before CPP's #include did its job, but it wreaks havoc on the algorithm implemented in !2377 (closed).
The culprit is, apparently, the support for #line and {-# LINE ... #-} pragmas, which allow the user (or CPP) to arbitrarily override the recorded SrcLoc. The relevant function is setLineAndFile in Lexer.x: it calls setSrcLoc, which breaks the assumptions that #17544 (closed) relies on.
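To make the failure mode concrete, here is a toy model of location tracking, not GHC's lexer. It assumes a simplified form of preprocessor output in which a line `# <n> "<file>"` resets the current location; the names `Loc`, `recordLocs`, `parseLineDirective`, `isMonotonic` and the `example` input are all hypothetical.

```haskell
-- A toy source location (hypothetical; not GHC's SrcLoc).
data Loc = Loc { locFile :: FilePath, locLine :: Int }
  deriving (Eq, Show)

-- Record a location for every ordinary input line. A directive line
-- overrides the current location instead of being recorded, loosely
-- mimicking the effect of a #line directive or LINE pragma.
recordLocs :: FilePath -> [String] -> [Loc]
recordLocs file0 = go (Loc file0 1)
  where
    go _ [] = []
    go loc (l : ls)
      | Just loc' <- parseLineDirective l = go loc' ls
      | otherwise = loc : go loc { locLine = locLine loc + 1 } ls

-- Recognise the simplified directive form: # <n> "<file>"
parseLineDirective :: String -> Maybe Loc
parseLineDirective l = case words l of
  ["#", n, f]
    | [(k, "")] <- reads n
    , [(name, "")] <- reads f -> Just (Loc name k)
  _ -> Nothing

-- True when each element is no smaller than its predecessor.
isMonotonic :: Ord a => [a] -> Bool
isMonotonic xs = and (zipWith (<=) xs (drop 1 xs))

-- Assumed, simplified preprocessor output for the test case above.
example :: [String]
example =
  [ "# 1 \"haddockA041.hs\""
  , "{-# LANGUAGE CPP #-}"
  , ""
  , "-- | Module header documentation"
  , "module Comments_and_CPP_include where"
  , ""
  , "# 1 \"IncludeMe.hs\""
  , "-- | Comment on T"
  , "data T = MkT -- ^ Comment on MkT"
  ]
```

In this model, `map locLine (recordLocs "haddockA041.hs" example)` climbs to 5 and then drops back to 1 once the second directive is seen, so `isMonotonic` returns `False` even though the input was consumed strictly front to back.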
My plan to fix:
- Record the StringBuffer offset in RealSrcLoc, and use it instead of source locations in !2377 (closed)
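A minimal sketch of this idea, under assumptions: the record below is a hypothetical stand-in for RealSrcLoc with an extra offset field, and `advance`, `overrideLineAndFile` and `compareByOffset` are illustrative names rather than GHC functions. The point is only that an override rewrites the user-facing file and line while leaving the buffer offset alone, so ordering by offset stays monotonic.

```haskell
import Data.List (foldl')
import Data.Ord (comparing)

-- Hypothetical location: user-facing file/line/column plus the raw
-- offset into the input buffer. Not GHC's actual RealSrcLoc.
data Loc = Loc
  { locFile   :: FilePath
  , locLine   :: Int
  , locCol    :: Int
  , locOffset :: Int  -- characters consumed so far; never overridden
  } deriving (Eq, Show)

-- Move past one character, always bumping the offset.
advance :: Loc -> Char -> Loc
advance l '\n' = l { locLine = locLine l + 1, locCol = 1, locOffset = locOffset l + 1 }
advance l _    = l { locCol = locCol l + 1, locOffset = locOffset l + 1 }

-- What a line override would do in this model: replace file and line,
-- reset the column, and keep the offset untouched.
overrideLineAndFile :: Int -> FilePath -> Loc -> Loc
overrideLineAndFile n f l = l { locFile = f, locLine = n, locCol = 1 }

-- Order locations by buffer position rather than by file/line/column.
compareByOffset :: Loc -> Loc -> Ordering
compareByOffset = comparing locOffset

-- Location just before the included text: three short lines consumed.
beforeInclude :: Loc
beforeInclude = foldl' advance (Loc "haddockA041.hs" 1 1 0) "a\nb\nc\n"

-- Location after the override and a few characters of the included file.
afterDecl :: Loc
afterDecl = foldl' advance (overrideLineAndFile 1 "IncludeMe.hs" beforeInclude) "data T"
```

Here `locLine afterDecl` is smaller than `locLine beforeInclude`, yet `compareByOffset beforeInclude afterDecl` is `LT`, which is the property the algorithm needs.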