My PhD topic

I'm long overdue writing about what I'm doing for my PhD, so here goes. To stop this getting too long, I haven't defined a lot of concepts, so it might not make sense to folks without a Computer Science background. I'm happy to answer any questions in the comments.

I'm investigating whether there are advantages to building a distributed stream processing system using pure functional programming: specifically, whether the reasoning abilities one has about purely functional systems allow us to build efficient stream processing systems.

We have a proof-of-concept of a stream processing system built using Haskell called STRIoT (Stream Processing for IoT). Via STRIoT, a user can define a graph of stream processing operations from a set of 8 purely functional operators. The chosen operators have well-understood semantics, so we can apply strong reasoning to the user-defined stream graph. STRIoT supports partitioning a stream graph into separate sub-graphs which are distributed to separate nodes, interconnected via the Internet. The examples provided with STRIoT use Docker and Docker Compose for the distribution.
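To make the idea of partitioning a graph of pure operators concrete, here is a minimal sketch. It does not use STRIoT's real API: streams are modelled as plain lists, and the operator names (`keepWarm`, `toFahrenheit`, `runningMax`) and the sensor/server split are hypothetical, chosen only for illustration.

```haskell
-- Simplification: a stream is modelled as a lazy list.
-- This is an assumption for the sketch, not how STRIoT represents streams.
type Stream a = [a]

-- Three hypothetical operators, each a pure function from stream to stream.
keepWarm :: Stream Double -> Stream Double
keepWarm = filter (> 20.0)

toFahrenheit :: Stream Double -> Stream Double
toFahrenheit = map (\c -> c * 9 / 5 + 32)

runningMax :: Stream Double -> Stream Double
runningMax = scanl1 max

-- The whole (linear) graph is just the composition of its operators.
wholeGraph :: Stream Double -> Stream Double
wholeGraph = runningMax . toFahrenheit . keepWarm

-- One possible partition: the filter runs on a (hypothetical) sensor node,
-- the remaining operators on a (hypothetical) server node.
onSensor :: Stream Double -> Stream Double
onSensor = keepWarm

onServer :: Stream Double -> Stream Double
onServer = runningMax . toFahrenheit
```

Because every operator is pure, `onServer . onSensor` denotes the same function as `wholeGraph`, wherever the cut is made. In a real deployment the composition across the cut would be carried by a network connection rather than by `(.)`.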

The area I am currently focussing on is whether and how STRIoT could rewrite the stream processing graph, preserving its functional behaviour, but improving its performance against one or more non-functional requirements: for example, making it run faster or use less memory, or meeting a more complex requirement such as maximising battery life for a battery-operated component.

Pure FP gives us the ability to safely rewrite chunks of programs by applying equational reasoning. For example, we can always replace the left-hand side of this equation by the right-hand side, which is functionally equivalent, but more efficient in both time and space terms:

map f . map g = map (f . g)
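The law can be written out as two ordinary Haskell definitions and spot-checked on sample data. This is only a sketch over plain lists: the names `unfused`, `fused` and `agreeOn` are made up for the example, and a check on sample inputs is evidence, not a proof.

```haskell
-- Left-hand side of the law: two traversals, with an intermediate
-- list built between them (before any compiler optimisation).
unfused :: (b -> c) -> (a -> b) -> [a] -> [c]
unfused f g = map f . map g

-- Right-hand side of the law: one traversal applying the composed function.
fused :: (b -> c) -> (a -> b) -> [a] -> [c]
fused f g = map (f . g)

-- Spot check that both sides agree on a given input,
-- using two arbitrary sample functions.
agreeOn :: [Int] -> Bool
agreeOn xs = unfused (+ 1) (* 2) xs == fused (+ 1) (* 2) xs
```

In GHCi, `agreeOn [1 .. 100]` evaluates to `True`, and `fused (+ 1) (* 2) [1, 2, 3]` evaluates to `[3,5,7]`. The rewrite is safe for any `f` and `g` precisely because neither can have side effects whose order or interleaving would change.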

However, we need to reason about potentially conflicting requirements. We might sometimes accept higher network latency or overall processing time in order to reduce the power usage of nodes, such as smart watches or battery-operated sensors deployed in difficult-to-reach locations. This has implications for the design of the Optimizer, which I am exploring.
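One simple way to picture the trade-off is a weighted score over estimated costs. The sketch below is an assumption for illustration, not a description of STRIoT's Optimizer: the `Cost` and `Weights` types, the linear scoring function, the plan names and all the numbers are made up.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical estimated costs of one candidate rewrite of the graph.
data Cost = Cost
  { latencyMs :: Double  -- estimated end-to-end latency
  , powerMw   :: Double  -- estimated power draw on the constrained node
  } deriving (Show, Eq)

-- Hypothetical relative importance of each requirement.
data Weights = Weights
  { wLatency :: Double
  , wPower   :: Double
  }

-- Collapse a cost into a single number; lower is better.
score :: Weights -> Cost -> Double
score w c = wLatency w * latencyMs c + wPower w * powerMw c

-- Pick the lowest-scoring candidate. The list must be non-empty,
-- because minimumBy raises an error on an empty list.
choose :: Weights -> [(String, Cost)] -> (String, Cost)
choose w = minimumBy (comparing (score w . snd))

-- Made-up candidates: planA is slower but frugal, planB faster but hungrier.
candidates :: [(String, Cost)]
candidates =
  [ ("planA", Cost { latencyMs = 120, powerMw = 40 })
  , ("planB", Cost { latencyMs = 80, powerMw = 95 })
  ]
```

With `Weights 1 0` (only latency matters), `choose` picks `planB`; with `Weights 0 1` (only power matters), it picks `planA`. A single weighted sum is the crudest possible model; it is used here only to show that the "best" rewrite depends on which requirement is prioritised.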