jmtd → log → template haskell → Template Haskell and Stream-processing programs
I've written about what Template Haskell is, and given an example of what it can be used for, it's time to explain why I was looking at it in the context of my PhD work.
Encoding stream-processing programs
StrIoT is an experimental distributed stream-processing system that myself and others are building in order to explore our research questions. A user of StrIoT writes a stream-processing program, using a set of 8 functional operators provided for the purpose. A simple example is
streamFn :: Stream Int -> Stream Int
streamFn = streamFilter (<15)
. streamFilter (>5)
. streamMap (*2)
Our system is distributed: we take a stream-processing program and partition it into sub-programs, which are distributed to and run on separate nodes (perhaps cloud instances, or embedded devices like Raspberry Pis etc.). In order to do that, we need to be able to manipulate the stream-processing program as data. We've initially opted for a graph data-structure, with the vertices in the graph defined as
data StreamVertex = StreamVertex
{ vertexId :: Int
, operator :: StreamOperator
, parameters :: [String]
, intype :: String
, outtype :: String
} deriving (Eq,Show)
A stream-processing program encoded this way, equivalent to the first example
path [ StreamVertex 0 Map ["(*2)"] "Int" "Int"
, StreamVertex 1 Filter ["(>5)"] "Int" "Int"
, StreamVertex 2 Filter ["(<15)"] "Int" "Int"
]
We can easily manipulate instances of such types, rewrite them, partition them and generate code from them. Unfortunately, this is quite a departure from the first simple code example from the perspective of a user writing their program.
Template Haskell gives us the ability to manipulate code as a data structure, and also to inspect names to gather information about them (their type, etc.). I started looking at TH to see if we could build something where the user-supplied program was as close to that first case as possible.
TH limitations
There are two reasons that we can't easily manipulate a stream-processing definition written as in the first example. The following expressions are equivalent, in some sense, but are not equal, and so yield completely different expression trees when quasi-quoted:
[| streamFilter (<15) . streamFilter (>5) . streamMap (*2) |]
| \s -> streamFilter (<15) (streamFilter (>5) (streamMap (*2) s)) |]
[| streamMap (*2) >>> streamFilter (>5) >>> streamFilter (<15) |]
[| \s -> s & streamMap (*2) & streamFilter (>5) & streamFilter (<15) |]
[| streamFn |] -- a named expression, defined outside the quasi-quotes
In theory, reify
can give you the definition of a function from its
name, but in practice it doesn't, because this was never implemented.
So at the very least we would need to insist that a user included the
entirety of a stream-processing program within quasi-quotes, and not
split it up into separate bits, with some bits defined outside the
quotes and references within (as in the last case above). We would
probably have to insist on a consistent approach for composing operators
together, such as always use (.)
and never >>>
, &
, etc. which is
limiting.
Incremental approach
After a while ruminating on this, and before moving onto something else,
I thought I'd try approaching it from the other side. Could I introduce
some TH into the existing approach, and improve it? The first thing I've
tried is to change the parameters
field to TH's ExpQ
, meaning the
map instance example above would be
StreamVertex 0 Map [ [| (*2) |] ] "Int" "Int"
I worked this through. It's an incremental improvement ease and clarity for the user writing a stream-processing program. It catches a class of programming bugs that would otherwise slip through: the expressions in the brackets have to be syntactically valid (although they aren't type checked). Some of the StrIoT internals are also much improved, particularly the logical operator. Here's an excerpt from a rewrite rule that involves composing code embedded in strings, dealing with all the escaping rules and hoping we've accounted for all possible incoming expression encodings:
let f' = "(let f = ("++f++"); p = ("++p++"); g = ("++g++") in\
\ \\ (a,b) v -> (f a v, if p v a then g b v else b))"
a' = "("++a++","++b++")"
q' = "(let p = ("++p++"); q = ("++q++") in \\v (y,z) -> p v y && q v z)"
And the same section after, manipulating ExpQ types:
let f' = [| \ (a,b) v -> ($(f) a v, if $(p) v a then $(g) b v else b) |]
a' = [| ($(a), $(b)) |]
q' = [| \v (y,z) -> $(p) v y && $(q) v z |]
I think the code-generation part of StrIoT could be radically refactored to take advantage of this change but I have not made huge inroads into that.
Next steps
This is, probably, where I am going to stop. This work is very interesting to me but not the main thrust of my research. But incrementally improving the representation gave me some ideas of what I could try next:
intype
andouttype
could be THType
s instead ofString
s. This would catch some simple problems like typos, etc., but we could possibly go further, and- remove the explicit in-and-out-types and infer their values from the parameters field, as its an expression with some type that should match
parameters
is a list, because the different stream operators have different arities.streamFilter
has one parameter (the filter predicate), so the list should have one element in that case, butstreamExpand
has none, so it should be empty. We could collapse this to a singleExpQ
, which encoded however many parameters are necessary, either in an internal list, or…- the operator field could be merged in too, so that the
parameters
expression was actually a call to the relevant operator with its parameters supplied.
The type would have collapsed down to
data StreamVertex = StreamVertex
{ vertexId :: Int
, opAndParams :: ExpQ
} deriving (Eq,Show)
Example instances might be
StreamVertex 0 [| streamMap (*2) |]
StreamVertex 1 [| streamExpand |]
StreamVertex 2 [| streamScan (\c _ -> c+1) 0 |]
The vertexId
field is a bit of wart, but we require that due to
the graph data structure that we are using. A change there could
eliminate it, too. By this point we are not that far away from where
we started, and certainly much closer to the "pure" function application
in the very first example.
Comments