Doctor of Philosophy (PhD)
In the past decade, the world has seen the rise of big data, which calls for a paradigm shift in data processing. Streaming processing, in which data are processed in their spatial or temporal order, is increasingly common. Meanwhile, parallel computing has become commonplace in the computing world. The combination of the two, streaming computing, now plays an important role in data processing.
A streaming computing system is a network of nodes connected by unidirectional first-in first-out (FIFO) data channels. When a node has multiple input channels, it must synchronize those channels as it consumes data so that the behavior of the whole system stays deterministic. After finishing a computation, a node may choose not to produce output on some of its output channels. This behavior, known as filtering, is data-dependent and unpredictable. When filtered data streams are later synchronized, applications can deadlock: a node waits on an empty input channel whose upstream producer has filtered out its data, while the node's other input channels fill up and block their producers on full buffers.
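To make the empty/full-buffer deadlock concrete, the sketch below simulates a toy split/join under assumed semantics: node A sends every item to both B and C, B forwards an item to C only if a predicate `keep` holds (filtering), and C synchronizes its two inputs by sequence number, so it cannot finish item i until the B->C channel shows index i or a later one. The node names, the function `run_split_join`, and the end-of-stream marker are hypothetical illustrations, not the dissertation's implementation.

```python
from collections import deque

def run_split_join(n_items, capacity, keep):
    """Toy split/join: A -> B -> C and A -> C over bounded FIFO channels.

    B filters: it forwards item i to C only if keep(i) is true.
    C joins by sequence number: to finish item i from A, it must see
    index i, or some later index, at the head of the B->C channel.
    Returns "completed" or "deadlock".
    """
    ab, ac, bc = deque(), deque(), deque()
    next_a = 0        # next index A will emit
    b_done = False    # B has sent its end-of-stream marker
    consumed = 0      # items C has finished

    while consumed < n_items:
        progress = False
        # A: emit item next_a on both outputs, only if both have room.
        if next_a < n_items and len(ab) < capacity and len(ac) < capacity:
            ab.append(next_a)
            ac.append(next_a)
            next_a += 1
            progress = True
        # B: consume one input; forward it only if keep() says so.
        if ab and (not keep(ab[0]) or len(bc) < capacity):
            i = ab.popleft()
            if keep(i):
                bc.append(i)
            progress = True
        elif not ab and next_a == n_items and not b_done and len(bc) < capacity:
            bc.append(n_items)  # end-of-stream: an index past the last item
            b_done = True
            progress = True
        # C: synchronize the two inputs on sequence number.
        if ac and bc:
            i, j = ac[0], bc[0]
            ac.popleft()
            if j == i:
                bc.popleft()    # matched data from both paths
            # if j > i, B produced nothing for i: A's copy is consumed alone
            consumed += 1
            progress = True
        if not progress:
            return "deadlock"   # every node is blocked
    return "completed"
```

With `capacity=2` and `keep = lambda i: i % 10 == 0`, C waits on the empty B->C channel while A is blocked on the full A->C channel, so `run_split_join(20, 2, keep)` returns `"deadlock"`; with no filtering, or with buffers large enough to absorb the longest filtered run, the same call completes.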
To avoid deadlocks and ensure bounded-memory execution, we turn to model-based approaches. In this dissertation, we propose the synchronized filtering dataflow (SFDF) model to capture synchronization and filtering behaviors. We avoid deadlocks in SFDF applications by augmenting data streams with dummy messages. We design decentralized algorithms that compute a dummy interval for each channel at compile time and schedule dummy messages according to those intervals at runtime.
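The sketch below illustrates the runtime side of dummy-message scheduling on a toy split/join (A feeds B and C, B feeds C, B filters, C joins by sequence number). It assumes a simple hypothetical rule: after B has filtered `dummy_interval` consecutive inputs without writing to B->C, it writes a dummy message that carries only the current sequence number, telling C that no data with that index or earlier is coming. The interval is chosen by hand here, not by the dissertation's compile-time algorithms, and all names are illustrative.

```python
from collections import deque

def run_with_dummies(n_items, capacity, keep, dummy_interval):
    """Toy split/join in which B emits dummy messages.

    Hypothetical rule: after B has filtered `dummy_interval` consecutive
    inputs without writing to B->C, it writes a dummy carrying only the
    current index. Messages on B->C are (index, is_dummy) pairs.
    Returns (outcome, number_of_dummies_sent).
    """
    ab, ac, bc = deque(), deque(), deque()
    next_a, consumed, b_done = 0, 0, False
    since_last_send = 0   # filtered inputs since B last wrote to B->C
    dummies_sent = 0

    while consumed < n_items:
        progress = False
        # A: emit on both outputs when both have room.
        if next_a < n_items and len(ab) < capacity and len(ac) < capacity:
            ab.append(next_a)
            ac.append(next_a)
            next_a += 1
            progress = True
        # B: needs room on B->C, since it may have to write a dummy.
        if ab and len(bc) < capacity:
            i = ab.popleft()
            if keep(i):
                bc.append((i, False))        # real data
                since_last_send = 0
            else:
                since_last_send += 1
                if since_last_send == dummy_interval:
                    bc.append((i, True))     # dummy: "nothing up to i"
                    dummies_sent += 1
                    since_last_send = 0
            progress = True
        elif not ab and next_a == n_items and not b_done and len(bc) < capacity:
            bc.append((n_items, True))       # end-of-stream as a final dummy
            b_done = True
            progress = True
        # C: join on sequence number; dummies and data look alike here.
        if ac and bc:
            i, (j, _is_dummy) = ac[0], bc[0]
            ac.popleft()
            if j == i:
                bc.popleft()   # data or dummy for exactly index i
            consumed += 1
            progress = True
        if not progress:
            return "deadlock", dummies_sent
    return "completed", dummies_sent
```

With buffers of capacity 2 and `keep = lambda i: i % 10 == 0`, an interval of 2 lets the run complete, while an interval too large to ever trigger leaves the application deadlocked. Smaller intervals cost more control traffic; larger ones risk deadlock, which is why the intervals must be computed carefully.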
The runtime parts of our algorithms are very efficient, adding little overhead to computing nodes, but computing the dummy intervals can be very time-consuming on general dataflow graphs. We therefore design efficient algorithms to compute dummy intervals for streaming applications with special topologies. In particular, we focus on series-parallel directed acyclic graphs (SP-DAGs) and CS4 DAGs, in which every undirected cycle has a single source and a single sink.
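As background on the SP-DAG topology, the sketch below uses the classic reduction-based characterization of two-terminal series-parallel graphs: repeatedly merge parallel edges and contract internal vertices with exactly one incoming and one outgoing edge; the graph is series-parallel if and only if it reduces to a single source-to-sink edge. The function name and edge-list format are assumptions, the input is assumed to be a DAG with one source and one sink in which every vertex lies on a source-to-sink path, and this quadratic-time version is not the dissertation's algorithm.

```python
from collections import Counter

def is_series_parallel(edges, source, sink):
    """Reduction test for two-terminal series-parallel DAGs."""
    multi = Counter(edges)  # (u, v) -> number of parallel copies
    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse duplicate edges to one copy.
        for e, k in multi.items():
            if k > 1:
                multi[e] = 1
                changed = True
        # Series reduction: contract one internal vertex with in = out = 1.
        indeg, outdeg = Counter(), Counter()
        for u, v in multi:
            outdeg[u] += 1
            indeg[v] += 1
        for x in set(indeg) | set(outdeg):
            if x not in (source, sink) and indeg[x] == 1 and outdeg[x] == 1:
                u = next(a for a, b in multi if b == x)
                w = next(b for a, b in multi if a == x)
                del multi[(u, x)]
                del multi[(x, w)]
                multi[(u, w)] += 1   # may create a parallel edge
                changed = True
                break
    return list(multi) == [(source, sink)]
```

A diamond (`s->a, s->b, a->t, b->t`) reduces to one edge, while the "bridge" graph that adds `a->b` does not, so it is not series-parallel.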
We further extend our work to describe a set of polyhedral constraints that define all sets of safe dummy intervals for any dataflow graph, which gives us more flexibility in choosing dummy intervals. We also provide a polynomial-time algorithm to verify the safety of given dummy intervals for SP-DAGs.
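Describing safe dummy intervals as a polyhedron means that checking a candidate assignment reduces to testing a system of linear inequalities, A x <= b. The sketch below shows only that mechanic; the actual SFDF constraints are not given in this abstract, so the two-channel constraint list `CONSTRAINTS` is invented purely for illustration, and the names are hypothetical.

```python
def satisfies_constraints(intervals, constraints):
    """Return True iff the interval vector lies in {x : A x <= b}.

    `intervals` maps channel name -> dummy interval.
    `constraints` is a list of (coeffs, bound) pairs, where coeffs maps
    channel name -> coefficient (absent channels have coefficient 0).
    """
    for coeffs, bound in constraints:
        lhs = sum(c * intervals[ch] for ch, c in coeffs.items())
        if lhs > bound:
            return False
    return True

# Hypothetical constraints for illustration only, NOT derived from SFDF:
# both intervals must be at least 1, and their sum may not exceed 8.
CONSTRAINTS = [
    ({"ab": -1}, -1),          # ab >= 1
    ({"bc": -1}, -1),          # bc >= 1
    ({"ab": 1, "bc": 1}, 8),   # ab + bc <= 8
]
```

Because the feasible set is convex, any weighted average of two feasible interval vectors also satisfies the inequalities; since intervals are integers, a rounded average must be rechecked before use.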
Dummy messages are only one type of control message used by streaming applications. We extend our SFDF model to support more types of control messages, which are precisely synchronized with data streams. We use two types of control messages, dummy messages and credit messages, to guarantee bounded-memory execution. We demonstrate that the extended model can improve the performance of some applications by adding filtering behavior to non-filtering applications.
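The abstract does not define how credit messages behave inside SFDF. As background on why credits bound memory, the sketch below shows conventional credit-based flow control on a single channel: the receiver grants `capacity` credits up front, the sender spends one per data message and stalls at zero, and the receiver returns a credit message for each item it consumes. The class, its methods, and `demo` are hypothetical names, not the dissertation's protocol.

```python
from collections import deque

class CreditChannel:
    """Conventional credit-based flow control over one FIFO channel.

    Invariant: credits + len(data) + len(credit_msgs) == capacity,
    so at most `capacity` data items are ever buffered.
    """

    def __init__(self, capacity):
        self.credits = capacity       # sender-side credit counter
        self.data = deque()           # sender -> receiver data messages
        self.credit_msgs = deque()    # receiver -> sender credit messages
        self.max_occupancy = 0

    def try_send(self, item):
        # Absorb credit messages that have arrived from the receiver.
        while self.credit_msgs:
            self.credits += self.credit_msgs.popleft()
        if self.credits == 0:
            return False              # sender must wait for credit
        self.credits -= 1
        self.data.append(item)
        self.max_occupancy = max(self.max_occupancy, len(self.data))
        return True

    def receive(self):
        item = self.data.popleft()
        self.credit_msgs.append(1)    # return one credit to the sender
        return item

def demo(capacity=3, n=10):
    """Fast sender, slow receiver that runs only every third step."""
    ch = CreditChannel(capacity)
    sent, got, step = 0, [], 0
    while len(got) < n:
        if sent < n and ch.try_send(sent):
            sent += 1
        if step % 3 == 2 and ch.data:
            got.append(ch.receive())
        step += 1
    return got, ch.max_occupancy
```

Even though the sender is three times faster than the receiver, `demo()` delivers all ten items in order and the buffer never holds more than three of them.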
Roger D Chamberlain, Joseph A O'Sullivan
Permanent URL: https://doi.org/10.7936/K7CF9N7J