jmtd → log → embedding Haskell in AsciiDoc

I'm a fan of the concept of Literate Programming (I explored it a little in my Undergraduate Dissertation a long time ago) which can be briefly (if inadequately) summarised as follows: the normal convention for computer code is by default the text within a source file is considered to be code; comments (or, human-oriented documentation) are exceptional and must be demarked in some way (such as via a special symbol). Literate Programming (amongst other things) inverts this. By default, the text in a source file is treated as comments and ignored by the compiler, code must be specially delimited.

Haskell has built-in support for this scheme: by naming your source code files .lhs, you can make use of one of two conventions for demarking source code: either prefix each source code line with a chevron (called Bird-style, after Richard Bird), or wrap code sections in a pair of delimiters \begin{code} and \end{code} (TeX-style, because it facilitates embedding Haskell into a TeX-formatted document).

For various convoluted reasons I wanted to embed Haskell into an AsciiDoc-formatted document and I couldn't use Bird-style literate Haskell, which would be my preference. The AsciiDoc delimiter for a section of code is a line of dash symbols, which can be interleaved with the TeX-style delimiters:

------------
\begin{code}
next a = if a == maxBound then minBound else succ a
\end{code}
------------

Unfortunately the Tex-style delimiters show up in the output once the AsciiDoc is processed. Luckily, we can swap the order of the AsciiDoc and Literate-Haskell delimiters, because the AsciiDoc ones are treated as a source-code comment by Haskell and ignored. This moves the visible TeX-style delimiters out of the code block, which is a minor improvement:

\begin{code}
------------
next a = if a == maxBound then minBound else succ a
------------
\end{code}

We can disguise the delimiters outside of the code block further by defining an empty AsciiDoc macro called "code". Macros are marked up with surrounding braces, leaving just stray \begin and \end tokens in the text. Towards the top of the AsciiDoc file, in the pre-amble:

= Document title
Document author
:code:

This could probably be further improved by some AsciiDoc markup to change the style of the text outside of the code block immediately prior to the \begin token (perhaps make the font 0pt or the text colour the same as the background colour) but this is legible enough for me, for now.

The resulting file can be fed to an AsciiDoc processor (like asciidoctor, or intepreted by GitHub's built-in AsciiDoc formatter) and to a Haskell compiler. Unfortunately GitHub insists on a .adoc extension to interpret the file as AsciiDoc; GHC insists on a .lhs extension to interpret it as Literate Haskell (who said extensions were semantically meaningless these days…). So I commit the file as .adoc for GitHub's benefit and maintain a local symlink with a .lhs extension for my own.

Finally, I am not interested in including some of the Haskell code in my document that I need to include in the file in order for it to work as Haskell source. This can be achieved by changing from the code delimiter to AsciiDoc comment delimeters on the outside:

////////////
\begin{code}
utilityFunction = "necessary but not interesting for the document"
\end{code}
////////////

You can see an example of a combined AsciiDoc-Haskell file here (which is otherwise a work in progress):

https://github.com/jmtd/striot/blob/0f40d110f366ccfe8c4f07b76338ce215984113b/writeup.adoc

Comments

Atom

comment 1

It looks like pandoc supports asciidoc, in which case you might be interested in a couple of pandoc filters I've written called panpipe and panhandle.

They enable "active code", which is related to literate code: code blocks are executed as part of the document rendering process, and can affect the contents of the result.

In particular, panpipe allows any code block to be sent through the stdio of any shell command (e.g. a block of Haskell code can be sent through runhaskell, or a bash script through bash, or Python code through python, etc. all in the same document); the results replace the content of that code block. Panhandle allows markup written in a code block to be spliced into the document.

I think of them a bit like fmap and join for a monad: panpipe allows shell commands to process code that's embedded in markup (similar to how fmap lets functions process arguments that are embedded in a functor); panhandle allows markup embedded in markup to be "unwrapped" (similar to how join unwraps m (m a) into m a).

These are described in more detail on my blog

Comment by Chris Warburton, 2019-02-19

✏ comment on this page (Help on commenting)