I've begun writing up my phd and, not for the first time, I'm pondering issues of how best to represent things. Specifically, rewrite rules.

Here's one way of representing an example rewrite rule:

streamFilter g . streamFilter f = streamFilter (g . f)

This is a fairly succinct representation. It's sort-of Haskell, but not quite. It's an equation. The left-hand side is a pattern: it's intended to describe not one expression but a family of expressions that match. The lower case individual letters g and f are free variables: labelled placeholders within the pattern that can be referred to on the right hand side. I haven't stipulated what defines a free variable and what is a more concrete part of the pattern. It's kind-of Haskell, and I've used the well-known operator . to represent the two stream operators (streamFilters) being connected together. (In practice, when we get to the system where rules are actually applied, the connecting operator is not going to be . at all, so this is also an approximation).

One thing I don't like about . here, despite its commonness, is having to read right-to-left. I adopted the habit of using the lesser-known >>> in a lot of my work (which is defined as (>>>) = flip (.)), which reads left-to-right. And then I have the reverse problem: people aren't familiar with >>>, and, just like ., it's a stand-in anyway.

Towards the beginning of my PhD, I spent some time inventing rewrite rules to operate on pairs of operators taken from a defined, known set. I began representing the rules much as in the example above. Later on, I wanted to encode them as real Haskell, in order to check them more thoroughly. The above rule, I first encoded like this

filterFilterPre     = streamFilter g . streamFilter f
filterFilterPost    = streamFilter (g . f)
prop_filterFilter s = filterFilterPre s == filterFilterPost s

This is real code: the operators were already implemented in StrIoT, and the final expression defined a property for QuickCheck. However, it's still not quite a rewrite rule. The left-hand side, which should be a pattern, is really a concrete expression. The names f and g are masquerading as free variables but are really concretely defined in a preamble I wrote to support running QuickCheck against these things: usually simple stuff like g = odd, etc.

Eventually, I had to figure out how to really implement rewrite rules in StrIoT. There were a few approaches I could take. One would be to express the rules in some abstract form like the first example (but with properly defined semantics) and write a parser for them: I really wanted to avoid doing that.

As a happy accident, the solution I landed on was enabled by the semantics of algebraic-graphs, a Graph library we adopted to support representing a stream-processing topology. I wrote more about that in data-types for representing stream-processing programs.

I was able to implement rewrite rules as ordinary Haskell functions. The left-hand side of the rewrite rule maps to the left-hand side (pattern) part of a function definition. The function body implements the right-hand side. The system that applies the rules attempts to apply each rewrite rule to every sub-graph of a stream-processing program. The rewrite functions therefore need to signal whether or not they're applicable at runtime. For that reason, the return type is wrapped in Maybe, and we provide a catch-all pattern for every rule which simply returns Nothing. The right-hand side implementation can be pretty thorny. On the left-hand side, the stream operator connector we've finally ended up with is Connect from algebraic-graphs.

Here's filter fuse, taken from the full ruleset:

filterFuse :: RewriteRule
filterFuse (Connect (Vertex a@(StreamVertex i (Filter sel1) (p:_) ty _ s1))
                    (Vertex b@(StreamVertex _ (Filter sel2) (q:_) _ _ s2))) =
    let c = a { operator    = Filter (sel1 * sel2)
              , parameters  = [\[| (\p q x -> p x && q x) $(p) $(q) |]]
              , serviceTime = sumTimes s1 sel1 s2
              }
    in Just (removeEdge c c . mergeVertices (`elem` [a,b]) c)

filterFuse _ = Nothing

That's perhaps the simplest rule in the set. (See e.g. hoistOp for one of the worst!)

The question that remains to me, is, which representation, or representations, to use in the thesis? I'm currently planning to skip the abstract example I started with and start with the concrete Haskell encoding using QuickCheck. I'm not sure if it seems weird to have two completely separate implementations of the rules, but the simpler QuickCheck-checked rules are much closer to the "core essence" of the rules than the final implementation in StrIoT. And the derivation of the rules comes earlier in the timeline than the design work that led to the final StrIoT implementation. The middle-option is still compromised, however, by having concrete expressions pretending to be patterns. So I'm not sure.


Comments