Type design
I wanted to share a type design issue I hit recently with StrIoT.
Within StrIoT you define a stream-processing program, which is a series of inter-connected operators, in terms of a trio of graph types:
The outer-most type is a higher-order type provided by the Graph library we use: Graph a. This layer deals with all the topology concerns: what is connected to what.
The next type we define in StrIoT: StreamVertex, which is used to replace the type variable a in the above and make the concrete type Graph StreamVertex. Here we define all the properties of the operators. For example: the parameters supplied to the operator, and a unique vertexId integer that is unfortunately necessary.
We also define which operator type each node represents, with an instance of the third type, StreamOperator, a simple enumeration-style type: StreamOperator = Map | Filter | Scan…
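To make the layering concrete, here is a minimal sketch of the three types. The Graph type below is a simplified stand-in written for this example (the real library type and its API will differ), the operator list is abbreviated, and most StreamVertex fields are left out.

```haskell
-- Stand-in for the library's higher-order graph type (an assumption for
-- illustration only): a list of vertices plus a list of directed edges.
data Graph a = Graph
    { vertices :: [a]
    , edges    :: [(a, a)]
    } deriving (Eq, Show)

-- Third layer: which kind of operator a node is (abbreviated).
data StreamOperator = Map | Filter | Scan | Source
    deriving (Eq, Show)

-- Second layer: per-operator properties (remaining fields omitted here).
data StreamVertex = StreamVertex
    { vertexId :: Int
    , operator :: StreamOperator
    } deriving (Eq, Show)

-- A three-node program: a Source feeding a Map feeding a Filter.
example :: Graph StreamVertex
example = Graph [s, m, f] [(s, m), (m, f)]
  where
    s = StreamVertex 1 Source
    m = StreamVertex 2 Map
    f = StreamVertex 3 Filter
```

With a plain enumeration like this, the derived Eq instance is all that a check such as Source == operator v needs.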
For some recent work I needed to define some additional properties for the
operators: properties that would be used in an M/M/1 model (Jackson network) to
represent the program and do some cost modelling with. Initially we supplied this
additional information in completely separate instances of types: e.g. lists
of tuples, the first of a pair representing a vertexId, etc. This was mostly
fine for totally novel code, but where I had existing code paths that operated
in terms of Graph StreamVertex and now needed access to these parameters, it
would have meant refactoring a lot of code. So instead, I added these properties
directly to the types above.
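The separate-table approach described above can be sketched roughly as follows. The names ServiceTimes, serviceTimeFor and exampleTimes are hypothetical, made up for this illustration, and are not taken from StrIoT.

```haskell
-- Hypothetical side table: (vertexId, mean service time) pairs kept
-- separately from the graph.
type ServiceTimes = [(Int, Double)]

-- Look up the service time for a vertex; Nothing if the table has no entry.
serviceTimeFor :: ServiceTimes -> Int -> Maybe Double
serviceTimeFor table vid = lookup vid table

-- Example table with entries for vertices 1 and 2.
exampleTimes :: ServiceTimes
exampleTimes = [(1, 0.5), (2, 1.25)]
```

Any existing function that only receives a Graph StreamVertex would need such a table threaded through as an extra argument, which is where the refactoring cost comes from.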
Some properties are appropriate for all node types, e.g. mean average service time.
In that case, I added the parameter to the StreamVertex type:
    data StreamVertex = StreamVertex
        { vertexId    :: Int
        …
        , serviceTime :: Double
        }
Other parameters were only applicable to certain node types. Mean average
arrival rate, for example, is only valid for Source node types;
selectivity is appropriate only for filter types. So, I added these to the
StreamOperator type:
    data StreamOperator = Map
                        | Filter Double -- selectivity
                        …
                        | Source Double -- arrival rate
                        …
This works pretty well, and most of the code paths that already exist did not need to be updated in order for the model parameters to pass through to where they are needed. But it was not a perfect solution, because I now had to modify some other, unrelated code to account for the type changes.
Mostly this was test code: where I'd defined instances of Graph StreamVertex
to test something unrelated to the modelling work, I now had to add filter
selectivities and source arrival rates. This was tedious but mostly solved
automatically with some editor macros.
One area that was a problem, though, was equality checks and pattern matching. Before this change, I had a few areas of code like this:
    if Source == operator (head (vertexList sg))
    …
    if a /= b then… -- where a and b are instances of StreamOperator
I had to replace them with little helper routines like:
    cmpOps :: StreamOperator -> StreamOperator -> Bool
    cmpOps (Filter _) (Filter _) = True
    cmpOps (FilterAcc _) (FilterAcc _) = True
    cmpOps x y = x == y
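One generic alternative to writing a clause per constructor is to compare constructors through Data.Data. This is only a sketch: sameOp is a hypothetical name, the operator type is abbreviated, and the Double argument to FilterAcc is an assumption (the helper above only shows that it takes one argument).

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, toConstr)

-- Abbreviated copy of the operator type, with a derived Data instance.
data StreamOperator
    = Map
    | Filter Double      -- selectivity
    | FilterAcc Double   -- selectivity (argument type assumed)
    | Source Double      -- arrival rate
    deriving (Eq, Show, Data)

-- True when both values were built with the same constructor,
-- whatever their arguments.
sameOp :: StreamOperator -> StreamOperator -> Bool
sameOp x y = toConstr x == toConstr y
```

Note that this is not a drop-in replacement for cmpOps: cmpOps falls back to full equality for every other constructor, so two Source values with different arrival rates are unequal there, whereas sameOp treats them as the same.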
A similar problem was where I needed to synthesize a Filter and didn't care
about the selectivity; indeed, it was meaningless for the way I was using the type.
I have a higher-level function that handles "hoisting" an operator through a Merge:
before, you have some operator occurring after a merge operation, and afterwards,
you have several instances of the operator on all of the input streams prior to the
Merge. Invoking it now looks like this:
    filterMerge = pushOp (Filter 0)
It works, and the "0" is completely ignored, but having to provide a value that is unneeded, and for which there is no sensible choice, is a bit annoying.
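One possible way around the placeholder value, sketched here purely as a thought experiment and not something StrIoT does, is to parameterise the operator type over its model annotation. Everything below (the type parameter, ModelledOp, PlainOp, plain, sameKind) is hypothetical, and the constructor list is abbreviated.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.Functor (void)

-- Hypothetical variant of the operator type, parameterised over the
-- model annotation p.
data StreamOperator p
    = Map
    | Filter p   -- selectivity when p is Double
    | Source p   -- arrival rate when p is Double
    deriving (Eq, Show, Functor)

-- Operators annotated for cost modelling.
type ModelledOp = StreamOperator Double

-- Operators where only the kind matters.
type PlainOp = StreamOperator ()

-- Forget the annotations; void replaces every p with ().
plain :: StreamOperator p -> PlainOp
plain = void

-- Compare operator kinds, ignoring any annotations.
sameKind :: StreamOperator p -> StreamOperator q -> Bool
sameKind x y = plain x == plain y
```

Under this design a rewrite that only cares about the operator kind could be handed Filter () rather than Filter 0. The cost is that every signature mentioning StreamOperator gains a type parameter, which is exactly the kind of wide refactoring the direct approach avoided.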
I think there are some interesting things to consider here about type design, especially when you have some aspects of a "thing" which are relevant only in some contexts and not others.