
Reliable, monotonic source positions

The plan outlined in #17544 (closed) assumes that the source positions recorded in SrcSpan correspond to the positions in the StringBuffer, and are monotonic. However, the prototype implementation revealed that this is not the case. Consider the test I added recently, haddockA041, which consists of two files:

  1. haddockA041.hs:

    {-# LANGUAGE CPP #-}
    
    -- | Module header documentation
    module Comments_and_CPP_include where
    
    #include "IncludeMe.hs"

  2. IncludeMe.hs:

    -- | Comment on T
    data T = MkT -- ^ Comment on MkT

Loading it in GHCi with -ddump-parsed-ast shows a very interesting result (note the source locations):

({ testsuite/tests/haddock/should_compile_flag_haddock/haddockA041.hs:1:1
    }
 (HsModule
  (Just
   ({ testsuite/tests/haddock/should_compile_flag_haddock/haddockA041.hs:4:8-31
       }
    {ModuleName: Comments_and_CPP_include}))
  (Nothing)
  []
  [({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:1-12
       }
    (TyClD
     (NoExtField)
     (DataDecl
      (NoExtField)
      ({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:6
          }
       (Unqual
        {OccName: T}))
      (HsQTvs
       (NoExtField)
       [])
      (Prefix)
      (HsDataDefn
       (NoExtField)
       (DataType)
       ({ <no location info> }
        [])
       (Nothing)
       (Nothing)
       [({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:10-12
            }
         (ConDeclH98
          (NoExtField)
          ({ testsuite/tests/haddock/should_compile_flag_haddock/IncludeMe.hs:2:10-12
              }
           (Unqual
            {OccName: MkT}))
          ({ <no location info> }
           (False))
          []
          (Nothing)
          (PrefixCon
           [])
          (Nothing)))]
       ({ <no location info> }
        [])))))]
  (Nothing)
  (Nothing)))

We have locations such as haddockA041.hs:4:8-31, but also IncludeMe.hs:2:1-12. That is, not only are those locations not monotonic, they even refer to different files!

Of course, this is great for the user, who would rather see the locations as they were before CPP's #include did its job, but it wreaks havoc on the algorithm implemented in !2377 (closed).

The culprit is, apparently, the support for #line and {-# LINE ... #-} pragmas, which allow the user (or CPP) to arbitrarily override the recorded SrcLoc. The relevant function is setLineAndFile in Lexer.x: it calls setSrcLoc, and that breaks the assumptions #17544 (closed) relies on.
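
To make the effect concrete, here is a toy model of it. This is not GHC's lexer: Loc, reportedLocs, parseMarker and the simplified line-marker syntax are all made up for illustration, and the example buffer only approximates what CPP would emit for haddockA041.hs. It walks a preprocessed buffer line by line and resets the reported location whenever it sees a marker.

```haskell
-- Toy model only; none of these names come from GHC.

-- | A user-facing location: file name and line number.
data Loc = Loc { locFile :: FilePath, locLine :: Int }
  deriving (Eq, Show)

-- | Recognise a simplified line marker of the form:  # N "file"
-- (Assumes the file name contains no spaces.)
parseMarker :: String -> Maybe Loc
parseMarker l = case words l of
  ["#", n, f]
    | [(k, "")] <- reads n
    , [(name, "")] <- reads f -> Just (Loc name k)
  _ -> Nothing

-- | Pair every ordinary line of the buffer with the location that would be
-- reported for it. A marker line overrides the current location and is not
-- itself reported.
reportedLocs :: FilePath -> [String] -> [(Loc, String)]
reportedLocs file0 = go (Loc file0 1)
  where
    go _ [] = []
    go loc (l : ls)
      | Just loc' <- parseMarker l = go loc' ls
      | otherwise = (loc, l) : go loc { locLine = locLine loc + 1 } ls

-- | True when the list never decreases.
isMonotonic :: Ord a => [a] -> Bool
isMonotonic xs = and (zipWith (<=) xs (drop 1 xs))

-- | A simplified stand-in for the buffer after CPP has expanded the #include.
example :: [String]
example =
  [ "{-# LANGUAGE CPP #-}"
  , ""
  , "-- | Module header documentation"
  , "module Comments_and_CPP_include where"
  , ""
  , "# 1 \"IncludeMe.hs\""
  , "-- | Comment on T"
  , "data T = MkT -- ^ Comment on MkT"
  ]
```

In this model, map (locLine . fst) (reportedLocs "haddockA041.hs" example) yields line numbers that restart after the marker, so isMonotonic returns False for them even though the buffer itself was read strictly front to back.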

My plan to fix:

  • Record the StringBuffer offset in RealSrcLoc, and use it instead of source locations in !2377 (closed)
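
A minimal sketch of the idea, assuming a location record that carries a buffer offset alongside the user-facing fields. OffsetLoc, advance, overrideLoc and precedes are hypothetical names for illustration and do not reflect how RealSrcLoc is actually defined. The point is only that a line pragma may rewrite file and line, while the offset keeps counting and so stays usable for ordering.

```haskell
-- Hypothetical stand-in for a source location extended with a buffer offset.
data OffsetLoc = OffsetLoc
  { olFile   :: FilePath  -- reported file; a line pragma may change it
  , olLine   :: Int       -- reported line; a line pragma may change it
  , olCol    :: Int       -- reported column
  , olOffset :: Int       -- characters consumed from the buffer; never reset
  } deriving (Eq, Show)

-- | Move past one character, updating both the reported position and the
-- buffer offset.
advance :: Char -> OffsetLoc -> OffsetLoc
advance '\n' l = l { olLine = olLine l + 1, olCol = 1, olOffset = olOffset l + 1 }
advance _    l = l { olCol = olCol l + 1, olOffset = olOffset l + 1 }

-- | Model of a line pragma: override the reported file and line, but leave
-- the buffer offset untouched.
overrideLoc :: FilePath -> Int -> OffsetLoc -> OffsetLoc
overrideLoc file line l = l { olFile = file, olLine = line, olCol = 1 }

-- | Order two locations by where they occur in the buffer, ignoring the
-- reported file, line and column.
precedes :: OffsetLoc -> OffsetLoc -> Bool
precedes a b = olOffset a < olOffset b
```

With this shape, error messages can keep using the reported fields, while an algorithm that needs "does this come before that in the input" would consult precedes instead.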