Below are the five most recent posts in my weblog.

You can also see a reverse-chronological list of all posts, dating back to 1999.

For my PhD, my colleagues/collaborators and I built a distributed stream-processing system using Haskell. There are several other Haskell stream-processing systems. How do they compare?

First, let's briefly discuss and define streaming in this context.

Structure and Interpretation of Computer Programs introduces Streams as an analogue of lists, to support delayed evaluation. In brief, the inductive list type (a list is either an empty list or a head element prepended to another list) is replaced by a structure with a head element and a promise which, when evaluated, generates the tail (which in turn may have a head element and a promise to generate another tail, culminating in the equivalent of an empty list). Later on, SICP also covers lazy evaluation.
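Since Haskell lists are already lazy, a direct Haskell rendering of SICP's construction has to make the promise explicit. Here's a rough, illustrative sketch (SICP itself uses Scheme):

```haskell
-- A stream is either empty, or a head element together with a promise
-- (here, a thunk) which, when forced, produces the rest of the stream.
data Stream a = Empty | Cons a (() -> Stream a)

-- An infinite stream counting up from n.
from :: Integer -> Stream Integer
from n = Cons n (\() -> from (n + 1))

-- Force only as much of the stream as we need.
takeS :: Int -> Stream a -> [a]
takeS 0 _             = []
takeS _ Empty         = []
takeS n (Cons x rest) = x : takeS (n - 1) (rest ())
```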

However, the streaming we're talking about originates in the relational community, rather than the functional one, and is subtly different. It's about building a pipeline of processing that receives and emits data but doesn't need to (indeed, cannot) reference the whole stream (which may be infinite) at once.

Haskell streaming systems

Now let's go over some Haskell streaming systems.

conduit (2011-)

Conduit is the oldest of the ones I am reviewing here, but I doubt it's the first in the Haskell ecosystem. If I've made any obvious omissions, please let me know!

Conduit provides a new set of types to model streaming data, and a completely new set of functions which are analogues of standard Prelude functions, e.g. sumC in place of sum. It provides its own combinator(s) such as .| (aka fuse), which is like composition but reads left-to-right.

The motivation for this is to enable (near?) constant memory usage for processing large streams of data -- presumably versus using a list-based approach -- and to provide some determinism: the README gives the example of "promptly closing file handles". I think this is another way of saying that it uses strict evaluation, or at least avoids lazy evaluation for some things.

Conduit offers interleaved effects: that is, IO can be performed mid-stream.
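Here's a rough sketch of what that looks like, using combinators I believe the conduit package exports from its Conduit module (yieldMany, iterMC, sumC); treat it as illustrative rather than canonical:

```haskell
import Conduit

-- Sum 1..100 in (near) constant memory, performing an IO effect
-- (printing each element) in the middle of the pipeline.
main :: IO ()
main = do
  total <- runConduit $ yieldMany [1 .. 100 :: Int]
                     .| iterMC print   -- an effect interleaved per element
                     .| sumC
  print total
```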

Conduit supports distributed operation via Data.Conduit.Network in the conduit-extra package. Michael Snoyman, principal Conduit author, wrote up how to use it (https://www.yesodweb.com/blog/2014/03/network-conduit-async). To write a distributed Conduit application, the application programmer must manually decide where the boundaries between clients and servers lie, and write specific code to connect them.
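As a rough sketch of what that carving-up looks like, here's one node written against the Data.Conduit.Network interface described in that post (runTCPServer, appSource, appSink); the matching client would be a separate program:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.Conduit.Network
import qualified Data.ByteString.Char8 as BC
import Data.Char (toUpper)

-- A single "server" node: receive ByteStrings, upper-case them, send them
-- back. Splitting the overall pipeline into client and server programs, and
-- serialising anything richer than a ByteString, is the programmer's job.
main :: IO ()
main = runTCPServer (serverSettings 4000 "*") $ \app ->
  runConduit $ appSource app .| mapC (BC.map toUpper) .| appSink app
```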

pipes (2012-)

The Pipes Tutorial contrasts itself with "Conventional Haskell stream programming": whether that means Conduit or something else, I don't know.

Paraphrasing their pitch: Effects, Streaming, Composability: pick two. That's the situation they describe for stream programming prior to Pipes. They argue Pipes offers all three.

Pipes provides its own combinators (which read left-to-right) and offers interleaved effects.
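For flavour, here's a small sketch in the style of the Pipes Tutorial, using functions from Pipes.Prelude (stdinLn, take, map, stdoutLn); again, illustrative rather than canonical:

```haskell
import Pipes
import qualified Pipes.Prelude as P
import Data.Char (toUpper)

-- Effects, streaming and composability in one left-to-right pipeline:
-- echo up to three lines of stdin to stdout, upper-cased.
main :: IO ()
main = runEffect $ P.stdinLn >-> P.take 3 >-> P.map (map toUpper) >-> P.stdoutLn
```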

At this point I can't really see what fundamentally distinguishes Pipes from Conduit.

Pipes has some support for distributed operation via the sister library pipes-network. It looks like you must send and receive ByteStrings, which means rolling your own serialisation for other types. As with Conduit, to send or receive over a network, the application programmer must divide their program up into the sub-programs for each node, and add the necessary ingress/egress code.

io-streams (2013-)

io-streams emphasises simple primitives. Reading and writing are done in the IO monad, i.e. in an effectful (non-pure) context. The presence or absence of further stream data is signalled using the Maybe type (Just more data, or Nothing: the producer has finished).

It provides a library of functions that shadow the standard Prelude, such as S.fromList, S.mapM, etc.
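A small sketch using functions I believe io-streams exports from System.IO.Streams (fromList, map, read, toList); note the Maybe signalling when reading:

```haskell
import qualified System.IO.Streams as S

-- Build an InputStream from a list, double each element, then consume it.
-- S.read yields Just an element, or Nothing once the producer has finished.
main :: IO ()
main = do
  input   <- S.fromList [1 .. 5 :: Int]
  doubled <- S.map (* 2) input
  S.read doubled >>= print    -- Just 2
  S.toList doubled >>= print  -- [4,6,8,10], the rest of the stream
```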

It's not clear to me what the motivation for io-streams is, beyond providing a simple interface. There's no declaration of intent that I can find about (e.g.) constant-memory operation.

There's no mention of or support (that I can find) for distributed operation.

streaming (2015-)

Similar to io-streams, Streaming emphasises providing a simple interface that gels well with traditional Haskell methods. Streaming provides effectful streams (via a Monad -- any Monad?) and a collection of functions for manipulating streams which are designed to closely mimic standard Prelude (and Data.List) functions.

Streaming doesn't push its own combinators: the examples provided use $ and read right-to-left.

The motivation for Streaming seems to be to avoid the memory leaks caused by extracting pure lists from IO with traditional functions like mapM, which require every list constructor to be evaluated, the whole list to be deconstructed, and a new list to be constructed.
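A minimal sketch using Streaming.Prelude functions (each, take, map, print), showing the ordinary $ composition reading right-to-left; illustrative only:

```haskell
import qualified Streaming.Prelude as S

-- Stream 1..1000000 through an effectful pipeline without ever
-- materialising the whole list; only five elements are demanded.
main :: IO ()
main = S.print $ S.map (* 2) $ S.take 5 $ S.each [1 .. 1000000 :: Int]
```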

Like io-streams, the focus of the library is providing a low-level streaming abstraction, and there is no support for distributed operation.

streamly (2017-)

Streamly appears to have the grand goal of providing a unified programming tool, equally suited to quick-and-dirty programming tasks (normally the domain of scripting languages) and to high-performance work (C, Java, Rust, etc.). Their intended audience appears to be everyone, or at least not just existing Haskell programmers. See their rationale.

Streamly offers an interface for composing concurrent (note: not distributed) programs via combinators. It relies upon fusing the streaming pipeline to remove the allocation and de-allocation of intermediate list structures (i.e. deforestation, similar to GHC rewrite rules).

The examples I've seen use standard combinators (e.g. Data.Function.&, which reads left-to-right, and Applicative).
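A sketch in that style, assuming the pre-0.9 Streamly.Prelude interface (the API has changed across releases, so take the exact module and function names with a pinch of salt):

```haskell
import Data.Function ((&))
import qualified Streamly.Prelude as S

-- Double each element (performing IO along the way), then sum;
-- & lets the pipeline read left-to-right.
main :: IO ()
main = do
  total <- S.fromList [1 .. 10 :: Int]
             & S.mapM (\x -> print x >> pure (x * 2))
             & S.sum
  print total
```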

Streamly provides benchmarks against Haskell pure lists, Streaming, Pipes and Conduit: these generally show Streamly several orders of magnitude faster.

I'm finding it hard to evaluate Streamly. It's big, and its focus is wide. It provides shadows of Prelude functions, as many of these libraries do.

wrap-up

It seems almost like it must be a rite of passage to write a streaming system in Haskell. Stones and glass houses: I'm guilty of that too.

The focus of the surveyed libraries is mostly on providing a streaming abstraction, normally with an interface analogous to standard Haskell lists. They differ on various philosophical points (whether to abstract away the mechanics behind type synonyms, how much to leverage existing Haskell idioms, etc.). A few of the libraries have some rudimentary support for distributed operation, but this is limited to connecting separate nodes together: in some cases serialising data remains the application programmer's job, and in all cases the application programmer must manually carve up their processing according to a fixed idea of what nodes they are deploying to. They all define a fixed-function pipeline.

I wanted to follow new content posted to Printables.com with a feed reader, but Printables.com doesn't provide one. Neither do the other obvious 3d model catalogues. So, I started building one.

I have something that spits out an Atom feed and a couple of beta testers gave me some valuable feedback. I had planned to make it public, with the ultimate goal being to convince Printables.com to implement feeds themselves.

Meanwhile, I stumbled across someone else who has done basically the same thing. Here are third-party feeds for

The format of their feeds is JSON Feed, which is new to me. FreshRSS and NetNewsWire both seem happy with it. (I went with Atom.) I may still release my take, if I find time to make one improvement that my beta testers suggested.

I've just passed my 10th anniversary of starting at Red Hat! As a personal milestone, this is the longest I've stayed in a job: I managed 10 years at Newcastle University, although not in one continuous role.

I haven't exactly worked in one continuous role at Red Hat either, but it feels like what I do today is a logical evolution of what I started doing, whereas at Newcastle I jumped around a bit.

I've seen some changes: in my time here, we changed the logo from Shadow Man; we transitioned to using Google Workspace for lots of stuff, instead of in-house IT; we got bought by IBM; we changed President and CEO, twice. And millions of smaller things.

I won't reach an 11th: my organisation in Red Hat is moving to IBM. I think this is sad news for Red Hat: they're losing some great people. But I'm optimistic about the future of my organisation.

Happy 20th birthday, dsafilter!

dsafilter is a mail filter I wrote two decades ago to solve a problem I had: I was dutifully subscribed to debian-security-announce to learn of new security package updates, but most were not relevant to me.

The filter creates a new, summarising mail, reporting on whether the DSA is applicable to any package installed on the system running the filter, and attaches the original DSA mail for reference. Users can then choose to drop mails for packages that aren't relevant.

In 2005 I'd been a Debian user for about six years, had met a few Debian developers in person, and was interested in getting involved. I started my journey to Developer later that same year. I published dsafilter and I think I sent an announcement to debian-devel, but I didn't do a great deal to socialise it, so I suspect nobody else is using it.

That said, I have been using it for the whole two decades, and I still am! What's notable to me is that I haven't had to modify the script at all to keep up with software changes, in particular changes to the interpreter. I wrote it as a Ruby script. If I had chosen Perl, it would probably be the same story, but if I'd chosen Python, there's no chance at all that it would still be working today.

If it sounds interesting to you, please give it a try. I think it might be due some spring cleaning.

Older posts are available on the all posts page.