I wanted to share a type-design issue I hit recently with StrIoT.

Within StrIoT, you define a stream-processing program (a series of interconnected operators) in terms of a trio of nested types:

  • The outermost type is a polymorphic type provided by the graph library we use: Graph a. This layer deals with all the topology concerns: what is connected to what.

  • The next type, which we define in StrIoT, is StreamVertex. It fills in the type variable a above, giving the concrete type Graph StreamVertex. Here we define all the properties of the operators: for example, the parameters supplied to the operator, and a unique vertexID integer (which is, unfortunately, necessary). We also record which operator type each node represents, using an instance of the third type:

  • StreamOperator, a simple enumeration-style type: StreamOperator = Map | Filter | Scan…

For some recent work I needed to define some additional properties for the operators: parameters for an M/M/1 queueing model of the program (a Jackson network), which we use for cost modelling. Initially we supplied this additional information in completely separate structures: for example, lists of pairs, where the first element of each pair is a vertexID. This was mostly fine for brand-new code, but existing code paths that operated on Graph StreamVertex and now needed access to these parameters would have required a lot of refactoring. So instead, I added these properties directly to the types above.
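For comparison, the "separate structures" approach might look something like the minimal sketch below. The names ArrivalRates and arrivalRateFor are hypothetical, and StreamVertex is cut down to just its vertexId field. It shows why existing code suffers: any function that needs a model parameter must now be handed the side table as well as the vertex.

```haskell
module Main where

-- Cut-down vertex type: only the field the side table is keyed on.
data StreamVertex = StreamVertex { vertexId :: Int }
  deriving (Show)

-- Hypothetical side table: (vertexID, arrival rate) pairs held outside the graph.
type ArrivalRates = [(Int, Double)]

-- Every consumer needs both the vertex *and* the side table,
-- so existing functions over vertices must grow an extra argument.
arrivalRateFor :: ArrivalRates -> StreamVertex -> Maybe Double
arrivalRateFor rates v = lookup (vertexId v) rates

main :: IO ()
main = do
  let rates = [(0, 5.0)]
  print (arrivalRateFor rates (StreamVertex 0))
  print (arrivalRateFor rates (StreamVertex 1))
```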

Some properties are appropriate for all node types, e.g. mean service time. In those cases, I added the parameter to the StreamVertex type:

data StreamVertex = StreamVertex
    { vertexId    :: Int
    , serviceTime :: Double  -- mean service time
    -- … other fields (operator, parameters, …) elided
    }

Other parameters were only applicable to certain node types. Mean arrival rate, for example, is only valid for Source nodes; selectivity is appropriate only for Filter nodes. So I added these to the StreamOperator type:

data StreamOperator = Map
                    | Filter Double -- selectivity
                    | Source Double -- arrival rate
                    -- … other constructors (Scan, FilterAcc, …) elided
                    deriving (Eq)
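To illustrate how these parameters might feed the cost model, here is a minimal sketch over a *linear* pipeline (a simplification; a real program is a graph). It assumes a filter passes on a fraction selectivity of its input, and it uses the standard M/M/1 results: utilisation ρ = λ·S for arrival rate λ and mean service time S, and mean time in system S / (1 − ρ), valid only when ρ < 1. The functions arrivalRates and responseTime are hypothetical, not part of StrIoT.

```haskell
module Main where

data StreamOperator = Map
                    | Filter Double -- selectivity
                    | Source Double -- arrival rate
  deriving (Eq, Show)

-- Cut-down StreamVertex: just the operator and its mean service time.
data StreamVertex = StreamVertex
  { operator    :: StreamOperator
  , serviceTime :: Double
  } deriving (Show)

-- Arrival rate into each vertex of a linear pipeline.
-- A Source receives external arrivals at its own rate; any other vertex
-- receives the output rate of the previous stage. A Filter scales its
-- output by its selectivity; everything else passes its rate through.
arrivalRates :: [StreamVertex] -> [Double]
arrivalRates = go 0
  where
    go _ [] = []
    go upstream (v : vs) =
      let lambda = case operator v of
                     Source rate -> rate
                     _           -> upstream
          out    = case operator v of
                     Filter sel -> sel * lambda
                     _          -> lambda
      in lambda : go out vs

-- M/M/1 mean time in system for one vertex, if its queue is stable (rho < 1).
responseTime :: Double -> StreamVertex -> Maybe Double
responseTime lambda v
  | rho < 1   = Just (s / (1 - rho))
  | otherwise = Nothing
  where
    s   = serviceTime v
    rho = lambda * s

main :: IO ()
main = do
  let pipeline = [ StreamVertex (Source 10) 0.01
                 , StreamVertex (Filter 0.5) 0.05
                 , StreamVertex Map 0.1 ]
      lambdas  = arrivalRates pipeline
  print lambdas
  print (zipWith responseTime lambdas pipeline)
```

Note how arrivalRates needs nothing but the vertices themselves: once the parameters live in the types, code that already walks Graph StreamVertex can reach them directly.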

Putting the parameters in the types like this works pretty well: most of the existing code paths did not need to change for the model parameters to reach the places where they are needed. But it was not a perfect solution, because I now had to modify some other, unrelated code to account for the type changes.

Mostly this was test code: where I'd defined instances of Graph StreamVertex to test something unrelated to the modelling work, I now had to add filter selectivities and source arrival rates. This was tedious, but I mostly automated it with some editor macros.

One area that was a problem, though, was equality checks and pattern matching. Before this change, I had a few areas of code like this:

if Source == operator (head (vertexList sg))
if a /= b then… -- where a and b are instances of StreamOperator

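After the change these no longer work. Source on its own is now a function of type Double -> StreamOperator, so Source == … does not type-check; and == now also compares the parameters, so two Filters with different selectivities are no longer equal. I had to replace them with little helper routines like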

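-- Compare operators by constructor, ignoring any model parameters.
cmpOps :: StreamOperator -> StreamOperator -> Bool
cmpOps (Filter _)    (Filter _)    = True
cmpOps (FilterAcc _) (FilterAcc _) = True
cmpOps (Source _)    (Source _)    = True
cmpOps x y = x == y  -- parameterless constructors, or mismatched ones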
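One possible alternative to writing a clause per parameterised constructor is to compare constructors generically with toConstr from base's Data.Data, which requires deriving Data. The sketch below assumes FilterAcc carries a single Double (the post doesn't show its payload), and the names sameOp and isSource are hypothetical. For the Source == … check specifically, a plain pattern match such as isSource is probably the simplest replacement.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
module Main where

import Data.Data (Data, toConstr)

-- FilterAcc's payload is an assumption; the other constructors match the post.
data StreamOperator = Map
                    | Filter Double    -- selectivity
                    | FilterAcc Double -- assumed payload
                    | Source Double    -- arrival rate
  deriving (Eq, Show, Data)

-- True when both values use the same constructor, ignoring all fields.
sameOp :: StreamOperator -> StreamOperator -> Bool
sameOp x y = toConstr x == toConstr y

-- Replacement for `Source == op`: match the constructor, ignore the rate.
isSource :: StreamOperator -> Bool
isSource (Source _) = True
isSource _          = False

main :: IO ()
main = do
  print (sameOp (Filter 0.1) (Filter 0.9))
  print (sameOp (Source 5) (Source 7))
  print (sameOp Map (Filter 0.5))
  print (isSource (Source 1))
```

The trade-off is that sameOp ignores fields unconditionally: if a constructor is later added whose parameters *should* affect equality, sameOp will silently treat all its values as equal, whereas cmpOps makes that choice explicit per constructor.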

A similar problem arose where I needed to synthesize a Filter but didn't care about the selectivity; indeed, it was meaningless for the way I was using the type. I have a higher-level function, pushOp, that "hoists" an operator through a Merge: before, some operator occurs after a merge; afterwards, there is an instance of that operator on each of the input streams prior to the Merge. Invoking it now looks like this:

filterMerge = pushOp (Filter 0)

It works, and the 0 is completely ignored, but it is a bit annoying that I have to provide a value that is unneeded and for which there is no sensible choice.
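One hypothetical way to make "parameter only relevant in some contexts" visible in the types is to turn the model annotation into a type parameter. Modelling code would use Op Double, while code that only cares about the operator kind would use Op (), where the unit value () carries no information. The names Op, ModelOp, PlainOp, opKind and sameKind are all made up for this sketch; it is not StrIoT's design.

```haskell
{-# LANGUAGE DeriveFunctor #-}
module Main where

-- Hypothetical redesign: the model annotation is a type parameter p.
data Op p = Map
          | Filter p  -- p = selectivity when modelling
          | Source p  -- p = arrival rate when modelling
  deriving (Eq, Show, Functor)

type ModelOp = Op Double  -- used by the cost-modelling code
type PlainOp = Op ()      -- used where only the operator kind matters

-- Forget the annotation, keeping only the shape of the operator.
opKind :: Op p -> PlainOp
opKind = fmap (const ())

-- Kind comparison falls out of the derived Eq on Op (); no cmpOps needed.
sameKind :: Op p -> Op q -> Bool
sameKind x y = opKind x == opKind y

main :: IO ()
main = do
  print (sameKind (Filter 0.1 :: ModelOp) (Filter 0.9 :: ModelOp))
  print (sameKind (Source 5.0 :: ModelOp) (Map :: ModelOp))
  print (Filter () :: PlainOp)
```

Under this design a function like pushOp could, in principle, accept a PlainOp such as Filter (), so no dummy selectivity is needed. The cost is that the parameter would ripple out into StreamVertex and every Graph StreamVertex signature, which is exactly the kind of refactoring the original change tried to avoid.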

I think there are some interesting things to consider here about type design, especially when some aspects of a "thing" are relevant only in some contexts and not others.