2010

Authors

  • Roozbeh Derakhshan

We are living in a time when a massive 2.5 quintillion bytes of data are generated every day. To realise the value of this data, stream data processing offers a new processing paradigm that aggregates and analyses large volumes of data quickly. While several commercial Stream Processing Engines (SPEs) are available, it remains difficult to develop stream-based applications. Over the last decade, our research has identified and addressed two dominant reasons for this difficulty: Heterogeneity and the Stored-Streaming Divide. Heterogeneity highlights the lack of standards in SPEs as well as the wide and changing variety of application requirements. The Stored-Streaming Divide, which is the focus of this thesis, emerges from the fact that commercial SPEs treat streaming data and relational data as separate entities even though applications increasingly demand integrated access to both. This integration manifests itself as the join between a stream of fast-arriving data and relational data sources, which we call the Stream-Relation Join (SRJ) problem. Two solutions are provided to address the SRJ problem in this thesis.

Some commercial SPEs and research projects take a radical approach to addressing the SRJ problem by building an SPE on top of a database from scratch; we call this the stream-relational approach. This approach is cumbersome because it requires extensive alteration to the database kernel to process streaming queries. Alternatively, our approach provides a lean layer that sits between the application and a commercial SPE, which we call the federation layer. This layer extends the database only to the point that it provides just enough functionality to interact with the application and the SPE. In doing so, the federation approach not only efficiently addresses the SRJ problem but also ensures that the application is portable across a range of commercial SPEs. This thesis details how to build a federation layer.
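
To make the Stream-Relation Join concrete, the following is a minimal illustrative sketch, not the federation layer described in the thesis. It assumes a hypothetical `customers` relation held in an in-memory SQLite database and a hypothetical stream of order events represented as Python dictionaries; each arriving event is enriched with the matching relational row, and events without a match are dropped (inner-join semantics).

```python
import sqlite3
from typing import Dict, Iterable, Iterator


def make_customer_table() -> sqlite3.Connection:
    """Build a small, hypothetical relation to join the stream against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "EU"), (2, "APAC")],
    )
    conn.commit()
    return conn


def stream_relation_join(
    stream: Iterable[Dict], conn: sqlite3.Connection
) -> Iterator[Dict]:
    """Join each stream event with its row in the stored relation.

    One lookup is issued per event; events with no matching row are
    discarded, which corresponds to an inner join.
    """
    for event in stream:
        row = conn.execute(
            "SELECT region FROM customers WHERE id = ?",
            (event["customer_id"],),
        ).fetchone()
        if row is not None:
            # Emit the event extended with the relational attribute.
            yield {**event, "region": row[0]}


# Hypothetical stream of order events; customer 3 has no stored row.
orders = [
    {"customer_id": 1, "amount": 30},
    {"customer_id": 3, "amount": 5},
    {"customer_id": 2, "amount": 12},
]

conn = make_customer_table()
joined = list(stream_relation_join(iter(orders), conn))
```

Issuing one database lookup per event keeps the sketch short, but it also hints at why the problem is hard in practice: the stream may arrive far faster than the relational source can answer individual queries.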