unlit does not follow H98 spec

The Haskell 98 spec has this to say about the LaTeX \begin{code} \end{code} style:

An alternative style of literate programming is particularly suitable for use with the LaTeX text processing system. In this convention, only those parts of the literate program that are entirely enclosed between \begin{code}...\end{code} delimiters are treated as program text; all other lines are comment. More precisely:

Program code begins on the first line following a line that begins \begin{code}.
Program code ends just before a subsequent line that begins \end{code} (ignoring string literals, of course).

The key phrases are "a line that begins \begin{code}" and "line that begins \end{code}". This means the semantics is something like:

classifyLine s
  | "\\begin{code}" `isPrefixOf` s = BeginCode
  | "\\end{code}"   `isPrefixOf` s = EndCode

GHC's unlit C program uses:

    if (strcmp(buf, "\\begin{code}") == 0)
        return BEGIN;

The equivalent semantics in the style above would be:

classifyLine s
  | "\\begin{code}" == s = BeginCode
  | "\\end{code}"   == s = EndCode

It seems fairly clear from the spec that GHC's unlit program is wrong in this respect.
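For comparison, here is a minimal sketch of the two behaviours side by side in C. It is not taken from GHC's unlit source: the names line_kind, starts_with, classify_prefix and classify_exact are hypothetical, and the sketch assumes each line is passed in without its trailing newline. It also ignores the spec's caveat about string literals.

```c
#include <string.h>

/* Hypothetical names for illustration only; not from GHC's unlit.c. */
enum line_kind { LINE_OTHER, LINE_BEGIN, LINE_END };

/* Returns nonzero if `line` starts with `prefix`. */
int starts_with(const char *line, const char *prefix)
{
    return strncmp(line, prefix, strlen(prefix)) == 0;
}

/* Prefix-based classification, following the spec's
 * "a line that begins \begin{code}" wording. */
enum line_kind classify_prefix(const char *line)
{
    if (starts_with(line, "\\begin{code}")) return LINE_BEGIN;
    if (starts_with(line, "\\end{code}"))   return LINE_END;
    return LINE_OTHER;
}

/* Exact-match classification, mirroring the strcmp-style test:
 * the whole line must equal the delimiter. */
enum line_kind classify_exact(const char *line)
{
    if (strcmp(line, "\\begin{code}") == 0) return LINE_BEGIN;
    if (strcmp(line, "\\end{code}") == 0)   return LINE_END;
    return LINE_OTHER;
}
```

With these definitions, a line such as `\begin{code} % setup` is classified as LINE_BEGIN by classify_prefix but as LINE_OTHER by classify_exact, which is the kind of mismatch described in this ticket.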

The practical consequence is that Cabal's unlit and GHC's unlit disagree on which lines delimit code, and this catches people out. There is even explicit advice on the Haskell wiki recommending that people take advantage of this GHC bug.

Trac metadata

Trac field	Value
Version	6.10.4
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
