jmtd → log → embedding Haskell in AsciiDoc
I'm a fan of the concept of Literate Programming (I explored it a little in my undergraduate dissertation a long time ago), which can be briefly (if inadequately) summarised as follows. The normal convention for computer code is that, by default, the text within a source file is considered to be code; comments (human-oriented documentation) are the exception and must be demarcated in some way, such as with a special symbol. Literate Programming (amongst other things) inverts this: by default, the text in a source file is treated as commentary and ignored by the compiler, and code must be specially delimited.
Haskell has built-in support for this scheme. If you give your source files the extension .lhs, you can use one of two conventions for demarcating source code: either prefix each line of code with a chevron (>), called Bird-style after Richard Bird, or wrap sections of code between the delimiters \begin{code} and \end{code}, called TeX-style because it facilitates embedding Haskell into a TeX-formatted document.
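To make the two conventions concrete, here is a minimal Haskell sketch of the "unliteration" step that turns a .lhs file into plain Haskell. The names unlitBird and unlitTeX are hypothetical, and the rules are simplified assumptions rather than GHC's actual implementation: commentary lines become blank lines so that line numbers stay aligned, and in the Bird-style case the leading > is replaced with a space.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical, simplified models of the "unlit" step that turns a literate
-- Haskell (.lhs) file into plain Haskell. Commentary lines are replaced by
-- blank lines so that line numbers in compiler errors still match the file.

-- Bird style: a line starting with '>' is code; the '>' is replaced by a
-- space so that the code keeps its original columns.
unlitBird :: String -> String
unlitBird = unlines . map keep . lines
  where
    keep ('>' : rest) = ' ' : rest
    keep _            = ""

-- TeX style: lines between \begin{code} and \end{code} are code; the
-- delimiter lines themselves and everything outside them are commentary.
unlitTeX :: String -> String
unlitTeX = unlines . go False . lines
  where
    -- The Bool tracks whether we are currently inside a code section.
    go :: Bool -> [String] -> [String]
    go _ [] = []
    go inCode (l : ls)
      | "\\begin{code}" `isPrefixOf` l = "" : go True ls
      | "\\end{code}" `isPrefixOf` l   = "" : go False ls
      | inCode                         = l : go True ls
      | otherwise                      = "" : go False ls
```

For example, applying unlitTeX to "Prose.\n\\begin{code}\nx = 1\n\\end{code}\n" leaves x = 1 on the third line and blanks the other three.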
For various convoluted reasons I wanted to embed Haskell into an AsciiDoc-formatted document, and I couldn't use Bird-style literate Haskell, which would have been my preference. The AsciiDoc delimiter for a listing block of code is a line of dashes, which can be interleaved with the TeX-style delimiters:
------------
\begin{code}
next a = if a == maxBound then minBound else succ a
\end{code}
------------
Unfortunately, the TeX-style delimiters show up in the output once the AsciiDoc is processed. Luckily, we can swap the order of the AsciiDoc and literate-Haskell delimiters: once they sit inside the code block, the AsciiDoc ones are treated by Haskell as source-code comments (a line consisting only of dashes is a Haskell line comment) and ignored. This moves the visible TeX-style delimiters out of the code block, which is a minor improvement:
\begin{code}
------------
next a = if a == maxBound then minBound else succ a
------------
\end{code}
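Once the TeX-style delimiters are stripped, the swapped layout leaves GHC with something like the following ordinary Haskell, in which the dash lines survive but are just line comments. The type signature is my own addition for illustration; the snippet in the post relies on type inference.

```haskell
-- What GHC sees once the \begin{code} and \end{code} lines are stripped:
-- the AsciiDoc delimiter lines are still there, but a line made up only of
-- dashes is a Haskell line comment, so the compiler ignores them.
------------
next :: (Eq a, Bounded a, Enum a) => a -> a  -- signature added for illustration
next a = if a == maxBound then minBound else succ a
------------
```

In GHCi, take 4 (iterate next LT) evaluates to [LT,EQ,GT,LT], showing the wrap-around from maxBound back to minBound.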
We can disguise the delimiters outside of the code block further by defining an empty AsciiDoc attribute called "code". Attribute references are marked up with surrounding braces, so {code} expands to nothing, leaving just stray \begin and \end tokens in the text. Towards the top of the AsciiDoc file, in the document header:
= Document title
Document author
:code:
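The :code: entry defines an attribute named code with an empty value, so every {code} reference is substituted away. Here is a toy model of that substitution, as a hedged sketch: expandAttrs is a hypothetical helper, attributes are held in a simple association list, the set of characters allowed in a name is an assumption, and references to undefined attributes are left untouched (which I believe matches Asciidoctor's default, though its real substitution rules are considerably richer).

```haskell
import Data.Char (isAlphaNum)

-- A toy model of AsciiDoc attribute-reference substitution. expandAttrs is a
-- hypothetical helper and the rules are simplified: a reference such as
-- {code} is replaced by the attribute's value, and a reference to an
-- attribute that is not defined is left in the text unchanged.
expandAttrs :: [(String, String)] -> String -> String
expandAttrs attrs = go
  where
    go [] = []
    go ('{' : rest) =
      case span isNameChar rest of
        (name, '}' : rest')
          | not (null name)
          , Just value <- lookup name attrs
          -> value ++ go rest'
        _ -> '{' : go rest
    go (c : cs) = c : go cs

    -- Assumed set of characters allowed in an attribute name.
    isNameChar c = isAlphaNum c || c == '-' || c == '_'
```

With the attribute defined, expandAttrs [("code", "")] applied to the text \begin{code} yields just \begin; with no attributes defined, the text comes back unchanged.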
This could probably be further improved by some AsciiDoc markup to change the
style of the text outside of the code block immediately prior to the \begin
token (perhaps make the font 0pt or the text colour the same as the background
colour) but this is legible enough for me, for now.
The resulting file can be fed both to an AsciiDoc processor (like asciidoctor, or GitHub's built-in AsciiDoc formatter) and to a Haskell compiler. Unfortunately, GitHub insists on a .adoc extension to interpret the file as AsciiDoc, while GHC insists on a .lhs extension to interpret it as literate Haskell (who said extensions were semantically meaningless these days…). So I commit the file as .adoc for GitHub's benefit and maintain a local symlink with a .lhs extension for my own.
Finally, some of the Haskell code that the file needs in order to work as Haskell source isn't something I want to show in the document. This can be achieved by replacing the code delimiters on the outside with AsciiDoc comment delimiters:
////////////
\begin{code}
utilityFunction = "necessary but not interesting for the document"
\end{code}
////////////
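One way to see why this works is to look at the same file from both sides. The sketch below is a simplified model rather than either tool's real algorithm: haskellView keeps only the lines between \begin{code} and \end{code}, asciidocView drops everything inside a comment block (treating any line of four or more slashes as a delimiter, which is an assumption), and exampleFile is a hypothetical input. All three names are made up for illustration.

```haskell
-- Two simplified views of one hypothetical combined AsciiDoc/literate
-- Haskell file. Neither function is the real GHC or Asciidoctor algorithm.

-- haskellView: keep only the lines between \begin{code} and \end{code}.
haskellView :: [String] -> [String]
haskellView = go False
  where
    go _ [] = []
    go inCode (l : ls)
      | l == "\\begin{code}" = go True ls
      | l == "\\end{code}"   = go False ls
      | inCode               = l : go True ls
      | otherwise            = go False ls

-- asciidocView: drop comment blocks. As a simplifying assumption, any line
-- of four or more slashes toggles whether we are inside a comment block.
asciidocView :: [String] -> [String]
asciidocView = go False
  where
    isCommentDelimiter l = length l >= 4 && all (== '/') l
    go _ [] = []
    go inComment (l : ls)
      | isCommentDelimiter l = go (not inComment) ls
      | inComment            = go True ls
      | otherwise            = l : go False ls

-- A hypothetical input laid out like the hidden-code example above.
exampleFile :: [String]
exampleFile =
  [ "Some prose for the document."
  , "////////////"
  , "\\begin{code}"
  , "utilityFunction = \"necessary but not interesting for the document\""
  , "\\end{code}"
  , "////////////"
  , "More prose."
  ]
```

Here haskellView exampleFile keeps just the utilityFunction definition, while asciidocView exampleFile keeps only the two lines of prose.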
You can see an example of a combined AsciiDoc-Haskell file here (which is otherwise a work in progress):
https://github.com/jmtd/striot/blob/0f40d110f366ccfe8c4f07b76338ce215984113b/writeup.adoc
Comments
It looks like pandoc supports asciidoc, in which case you might be interested in a couple of pandoc filters I've written called panpipe and panhandle.
They enable "active code", which is related to literate code: code blocks are executed as part of the document rendering process, and can affect the contents of the result.
In particular, panpipe allows any code block to be sent through the stdio of any shell command (e.g. a block of Haskell code can be sent through runhaskell, or a bash script through bash, or Python code through python, etc. all in the same document); the results replace the content of that code block. Panhandle allows markup written in a code block to be spliced into the document.
I think of them a bit like fmap and join for a monad: panpipe allows shell commands to process code that's embedded in markup (similar to how fmap lets functions process arguments that are embedded in a functor); panhandle allows markup embedded in markup to be "unwrapped" (similar to how join unwraps m (m a) into m a).
These are described in more detail on my blog.
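The fmap/join analogy can be made concrete with a toy document type. In this sketch, Block, Doc and their instances are hypothetical and much simpler than the Pandoc documents that panpipe and panhandle actually operate on: fmap rewrites every embedded code payload while leaving the markup alone (loosely the panpipe side), and join from Control.Monad splices a document nested inside a code block into the surrounding one (loosely the panhandle side).

```haskell
import Control.Monad (ap, join)

-- A toy document: a sequence of blocks, each either markup text or an
-- embedded "code" payload of type a. Hypothetical types for illustration.
data Block a = Text String | Code a
  deriving (Eq, Show)

newtype Doc a = Doc [Block a]
  deriving (Eq, Show)

-- fmap transforms every embedded payload and leaves the markup untouched.
instance Functor Doc where
  fmap f (Doc bs) = Doc (map step bs)
    where
      step (Text s) = Text s
      step (Code x) = Code (f x)

instance Applicative Doc where
  pure x = Doc [Code x]
  (<*>) = ap

-- Binding replaces each payload with a whole document spliced in its place,
-- so join :: Doc (Doc a) -> Doc a flattens nested documents.
instance Monad Doc where
  Doc bs >>= f = Doc (concatMap step bs)
    where
      step (Text s) = [Text s]
      step (Code x) = let Doc inner = f x in inner
```

For instance, join (Doc [Text "a", Code (Doc [Text "b", Code 'x']), Text "c"]) flattens to Doc [Text "a",Text "b",Code 'x',Text "c"].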