In a Template Haskell quasiquote, what would you expect this to do?
[myparser|blah\1blah|]
That is, what String would you expect to be passed to the parser myparser? Currently you get exactly the 10-character string
blah\1blah
The backslash is passed on literally. The principle is to do as little meddling as possible, leaving it up to myparser to do what it wants. However, backslash does initiate some limited escape behaviour
The motivation here is that you might want the sequence "|]" in your quoted string, and to do so you have to escape it.
However:
Currently "\]" is an escape sequence, but it does not need to be.
Currently "\|" is an escape sequence, whereas it could be "\|]". Then [myparser|blah\|blah|] would yield the (10-char) string blah\|blah.
There is no way to have a quote that ends in a backslash, thus [myparser|\blahblah\|]. This could be fixed by adding "\\" as an escape sequence for backslash.
An alterntive would be to support no escapes at all, but instead ask myparser itself to determine where the quotation ended. Instead of String -> Q Exp the parser would have type String -> Q (String,Exp). This seems to me to be a bridge too far. Better to get GHC to determine where the quote ends.
Another alternative: support no escapes, without asking
myparser to determine where the quotation ends. Just use the
current closing oxford bracket, without a hard-wired way to
escape it.
If someone needs to be able to allow literal oxford brackets
in quasi-quotes, they can define their own syntax for that.
There is no need for GHC to provide it.
Of course there is then a limitation on what syntax can
be used for this purpose - it cannot contain
an entire closing oxford bracket, because that will be intercepted
by GHC to end the quasi-quote. So, for example, it can be
"|\]", or "'|']", but not "\|]".
I think that limitation is well worth it. GHC must do its utmost
to avoid interfering with the syntax that the quasi-quoter
is trying to define. Frankly, I was shocked to learn of this
backslash meddling.
Whereas the need to allow literal oxford brackets in a quasi-quote
seems to be somewhat of a corner case. None of the popular
quasi-quoters I know about have any particular need for that.
Though I can imagine it coming up sometimes.
Perhaps, as a convenience, the library should provide a combinator
for backslash quoting. Users can optionally compose it with
their quasi-quoter. For example, something like this:
backslashQuoter :: String -> StringbackslashQuoter s@('|':'\\':s') = case span (== '\\') s' of (bs, cs@(']':_)) -> '|' : bs ++ cs _ -> sbackslashQuoter s = s
I.e., recognize backslash quoting only for a sequence
of backslashes directly sandwiched between
'|' and ']'. Otherwise stay out of the way.
Or we could define the combinator to be more like what
Simon suggested.
If we must have hard-wired backslash quoting, I would much
prefer something more polite, like the above, over what Simon
suggested.
To summarize, I am making two alternative proposals:
No hard-wired quoting; provide a combinator for convenience. This is what I prefer. Or,
Hard-wired quoting that only affects oxford-bracket-like syntax.
As Yitz points out, third alterntive would be to let quasi-quotes end with |]. A downside is that then different quasi-quotation authors will pick different encodings. I like the idea of having a convenience function like Yitz proposed, because it'll encourage using a standard encoding.
Btw I totally agree that letting the quasi-quoters determine where the quote ends goes too far. It's not just GHC, it affects other development tools as well, starting with syntax highlighting. Those tools would then have to learn about the intricacies of every quasi-quoter in existence to do a decent job, which is clearly impossible. (And finally, don't forget the poor human reading the code.)
Another alternative: support no escapes, without asking
myparser to determine where the quotation ends. Just use the
current closing oxford bracket, without a hard-wired way to
escape it.
I don't think this is feasible. Suppose we want to support random language X inside quasiquotes, and X includes string literals. Then we might want to say
[xlang| f x = "abc|]def" |]
With my proposal you'd say
[xlang| f x = "abc\|]def" |]
With yours you can't say this at all. You may say that you simply have to adjust the xlang parser so that X never requires you to have the character sequence "|]" in the code. So you'd have to say something like
[xlang| f x = "abc\Cdef" |]
(Notice that we have to avoid the character sequence "|]") which seems a little strange. On the other hand, if "|]" isn't relevant in language X, my proposal has no effect. (Oh, except for backslash itself... that's true... Maybe "\\" should not be an escape sequence... instead "\\|]" should produce a single trailing backslash and close the quote. That'd do it.
Just wanted to mention that I actually ran into this issue in the past. The Yesod scaffolder produces a number of Haskell files, some of which include quasiquotes. And in an older version, I specified the code within a quasiquote itself, so I needed a way to allow |] to show up within the content of the QQ.
The work-around I implemented was to hardcode a value (I think ~~, I'm not certain at this point) that would be completely ignored by the quasiquoter. This worked out very well and was simple to implement. Since then, I've moved from QQ to an external file (via qRunIO and readFile) for the same effect and have been able to avoid this hack.
I'm probably one of the biggest (ab)users of QQ, and this hasn't been an issue that's bitten me too much. If it stays "broken" as is, I won't shed any tears. But I would prefer not implementing some kind of arbitrary escape mechanism within QQ. We already use a bit of backslash escaping within Hamlet, and if the QQ mechanism itself *also* did backslash escaping, code would get ugly (not to mention breakage between GHC versions).
I posted this suggestion on the GHC users list, should probably just have posted it here instead:
We introduce an alternative start tag i.e. a quote can start with [q| or [q\| (or something similar). If [q| is used then the quoted text is used verbatim and if the alternative start-tag is used then the escape sequences suggested (\\ and \|]) are enabled.
Examples:
These are all as in the current implementation:[[BR]]
[q|nothing special|][[BR]]
[q|a \ in the text|] (uses "a \\ in the text")[[BR]]
These is an incompatible change:[[BR]]
[q|a \|] |] will be a parse error, is currently "a |] "
These are all parse errors (I believe) that will work:[[BR]]
[q|a \|] will use "a \\"[[BR]]
[q\|nothing special|] will use "nothing special"[[BR]]
[q\|a \\ in the text|] will use "a \\ in the text"[[BR]]
[q\|both \|] and \\|] will use "both |] and \\"[[BR]]
Also [q\|a \ in the text|] should be a parse error, i.e. all backslashes must be escape sequences when using the alternative tag.
Arguably the incompatible change could be removed, but then the alternative tag must be used to end a quote with backslash and it seems cleaner to have one fully verbatim and one fully escapable tag.
One advantage of this approach is that we can add more escape sequences in the future without breaking code (since all backslash sequences are reserved in the alternative tag).
I don't understand why you feel "|\]"
is so much worse than "\|]".
You may say that you simply have to adjust
the xlang parser so that X never requires
you to have the character sequence "|]" in the code.
From the perspective of a user, it is obviously the
case that the usual syntax for xlang will need to be
modified for use within the quasi-quote to avoid "|]".
The question is whether we force this change upon the
user in a particular way, or whether we leave it up to
the implementer of the quasi-quoter.
(Notice that we have to avoid the character sequence "|]")
which seems a little strange.
True. But your proposal at least requires quoting that sequence.
There is no avoiding doing something to it.
(Oh, except for backslash itself... that's
true... Maybe "\\" should not be an escape
sequence... instead "\\|]" should produce a
single trailing backslash and close the
quote. That'd do it.
Not quite. What if you want a literal closing
oxford preceded by one or more backslashes? Shall
the interpretation of the bracket as literal or
closing depend on whether the number of preceding
backslashes is odd or even?
Michael Snoyman once used "|~]" when he
needed literal brackets in a quasi-quoter. It
seems simpler to work out all the cases when the
quoting characters are placed inside the sandwich,
rather than as a topping.
In any case, I still don't understand why we can't
leave as much as possible up to the author of the
quasi-quoter.
I just know that I really don't like the idea of "\\" being an escape sequence. Its already too common an escape sequence elsewhere in the PL world. So if I'm writing, say, inline javascript code, that includes string literals with escaped backslashes, then I have to double-escape those escaped backslashes, which can get ugly quickly. Also, \\ is the list difference operator in Haskell, so if I'm writing an inline haskell-like language, then I've lost a fairly standard bit of syntax.
Moving to just "\|]" to escape the end of a quote sounds like a good idea. I don't think its a great loss to not have the ability to end a quasiquote with a backslash. Any quasiquoter I could imagine would/should be able to handle trailing whitespace, in which case one could always end the quote with "\ |]".
I agree with Yitz. Adding an escaping mechanism adds no expressivity, just complexity and surprise. I spent a little time implementing the |\] escape mechanism, and it's only while testing it that I noticed that \|] existed. I really think the former is more natural and doesn't require any "magic" to support. I don't think lack of uniformity is a sensible argument, since nothing else about quasiquoters is uniform in any case, and if we added library support as suggested then that would at least encourage a common pattern.
I see your point about double escaping. Getting rid of the escape mechanism altogether is fine with me -- it's easy to implement!
But can you outline how Kathleen's PadsHaskell will work? She has quasiquotes that involve fragments of Haskell:
[pads| blah blah { f x = "wubble" } blah blah |]
where the "blah blah" is in the PADS DSL, but then there's a bit of proper Haskell inside. Now that proper Haskell is parsed with, well, a Haskell parser. So if she want the sequence "|]" in that literal string, what should she do?
where the "blah blah" is in the PADS DSL, but then there's a bit of proper Haskell inside. Now that proper Haskell is parsed with, well, a Haskell parser. So if she want the sequence "|]" in that literal string, what should she do?
I think what benmachine means by library support is a function like
escaped :: QuasiQuoter -> QuasiQuoter
That takes any quoter and produces a new one which preprocesses the input string, replacing [\| with [|. Ideally we could then write [escaped q|blah |\] blah|], but this requires a relaxed syntax for quoter identifiers. Currently we must use q' = escaped q and [q'|...]. And i believe q' must be defined in a separate module.
I think this solution is great, text is used verbatim unless the user decides to go with a modified quoter (and this should only be done if the verbatim quoter gives a parse error). This way the user only needs to worry about escaping in the very rare occasions when a |] is in the quoted text. Of course if you need both |\] and |] in your text you can use a preprocessor with a more advanced escaping mechanism, with the disadvantage that you need to be more careful with the rest of your text but only in this particular quote!
But can you outline how Kathleen's !PadsHaskell will work? She has quasiquotes that involve fragments of Haskell:
[pads| blah blah { f x = "wubble" } blah blah |]
With Yitz' proposal, she could anticipate this problem and use the predefined convenience unescape function in the definition of the quasi-quoter, or define a custom one if |\], |\\]... clash with the Pads syntax for some reason.
I really like the simplicity of this approach, and it preserves a reasonable amount of flexibility (in particular the choice of escape character can be made by the quasi-quoter author).
Simon asked me to comment on this thread. I'm afraid I don't fully understand the proposal. Could you explain it in a bit more detail, perhaps working out a small example that shows a bit of Haskell containing a |] contained within the pads/haskell quasi-quoter?
Practically our quoted Haskell is very simple and so is unlikely to contain an '|]' symbol. However, it would be nice to get the corner cases right. I don't want to have to define multiple quasi quoters in multiple files to get this corner case right, which is what it sounds like I might have to do under this proposal. I certainly don't want my end user to see that kind of complexity.
Could you explain it in a bit more detail, perhaps
working out a small example that shows a bit of
Haskell containing a |] contained within the pads/haskell
quasi-quoter?
Hmm, I wasn't immediately able to find much information
about the pads quasiquoter. But I'll do my best to work
out a simple hypothetical example below.
I don't want to have to define multiple quasi quoters
in multiple files to get this corner case right
You certainly would not need to do that.
Let's say that you currently define the pads
quasiquoter for top-level declarations. So your
definition of the quasiquoter looks something like
this:
The PADS expression parser will see "|]" where the
user wrote "|\]".
If the users want a literal "|\]" to be part of the
PADS expression, they can write "|\\]". And so on.
The character '\' only has this special pre-processing when
it is directly sandwiched between '|' and ']'.
In any other context, the '\' is passed literally
to the base PADS quasiquoter as before.
Perhaps this syntax for quoting the closing oxford bracket
does not suit you for PADS, e.g., due to interaction with
a special meaning of '\' in PADS itself.
Then instead of the escapeBracket library function used
above, you can define your own syntax for a quoted closing
oxford bracket:
For example, one quasiquoter author mentioned above that
he used "|~]" to represent "|]".
See my earlier comment in this ticket for an example of
how to write a bracket escape function.
With this proposal, you have complete flexibility to decide
what syntax users use to embed a literal "|]".
You can easily use the built-in syntax as above, or you can define
your own syntax without too much more effort, or you can decide
not to allow them at all.