I've been exploring typesetting and formatting code within text documents such as papers, or my thesis. Up until now, I've been using the listings package without thinking much about it. By default, some sample Haskell code processed by listings looks like this (click any of the images to see larger, non-blurry versions):

default output of listings on a Haskell code sample

It's formatted with a monospaced font, with some keywords highlighted, but not syntactic symbols.

There are several other options for typesetting and formatting code in LaTeX documents. For Haskell in particular, there is the preprocessor lhs2tex, The default output of which looks like this:

default output of lhs2tex on a Haskell code sample

A proportional font, but it's taken pains to preserve vertical alignment, which is syntactically significant for Haskell. It looks a little cluttered to me, and I'm not a fan of nearly everything being italic. Again, symbols aren't differentiated, but it has substituted them for more typographically pleasing alternatives: -> has become , and \ is now λ.

Another option is perhaps the newest, the LaTeX package minted, which leverages the Python Pygments program. Here's the same code again. It defaults to monospace (the choice of font seems a lot clearer to me than the default for listings), no symbolic substitution, and liberal use of colour:

default output of minted on a Haskell code sample

An informal survey of the samples so far showed that the minted output was the most popular.

All of these packages can be configured to varying degrees. Here are some examples of what I've achieved with a bit of tweaking

_listings_ adjusted with colour and some symbols substituted (but sadly not the two together)

listings adjusted with colour and some symbols substituted (but sadly not the two together)

_lhs2tex_ adjusted to be less italic, sans-serif and use some colour

lhs2tex adjusted to be less italic, sans-serif and use some colour

All of this has got me wondering whether there are straightforward empirical answers to some of these questions of style.

Firstly, I'm pretty convinced that symbolic substitution is valuable. When writing Haskell, we write ->, \, /= etc. not because it's most legible, but because it's most practical to type those symbols on the most widely available keyboards and popular keyboard layouts.1 Of the three options listed here, symbolic substitution is possible with listings and lhs2tex, but I haven't figured out if minted can do it (which is really the question: can pygments do it?)

I'm unsure about proportional versus monospaced fonts. We typically use monospaced fonts for editing computer code, but that's at least partly for historical reasons. Vertical alignment is often very important in source code, and it can be easily achieved with monospaced text; it's also sometimes important to have individual characters (., etc.) not be de-emphasised by being smaller than any other character.

lhs2tex, at least, addresses vertical alignment whilst using proportional fonts. I guess the importance of identifying individual significant characters is just as true in a code sample within a larger document as it is within plain source code.

From a (brief) scan of research on this topic, it seems that proportional fonts result in marginally quicker reading times for regular prose. It's not clear whether those results carry over into reading computer code in particular, and the margin is slim in any case. The drawbacks of monospaced text mostly apply when the volume of text is large, which is not the case for the short code snippets I am working with.

I still have a few open questions:

  • Is colour useful for formatting code in a PDF document?
    • does this open up a can of accessibility worms?
  • What should be emphasised (or de-emphasised)
  • Why is the minted output most popular: Could the choice of font be key? Aspects of the font other than proportionality (serifs? Size of serifs? etc)

  1. The Haskell package Data.List.Unicode lets the programmer use a range of unicode symbols in place of ASCII approximations, such as instead of elem, instead of /=. Sadly, it's not possible to replace the denotation for an anonymous function, \, with λ this way.

Comments