It’s been a while since the last journal update – mostly because some issues kept me occupied for a long time. Instead of trying to gather all those fragments, I want to spend the time writing a more structured overview of what happened – a tutorial.
For those who have not followed the journal:
This project is about processing real-time traffic data gathered by cyclists via smartphone sensors (GPS, accelerometer, gyroscope, magnetometer, ...) with several different stream processing engines. The goal is to compare the different approaches and to find the best solution for this use case.
For more information, start here.
Solution Strategy and Tutorial Target
When we started this project, we fixed a few constants to keep the results comparable:
Every project uses the same dataset. The same producer pushes this dataset into Apache Kafka. Within each stream processing engine, we use the same algorithms to interpolate and to calibrate the data.
Aside from those, we’re free to use any means necessary to process and persist the data.
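To give a rough idea of what a shared interpolation step could look like, here is a minimal sketch in Python. It is not the algorithm used in this project (that one is covered in a later part). It assumes plain linear interpolation between timestamped sensor samples, and the names `Sample` and `interpolate_at` are made up for illustration.

```python
from bisect import bisect_left
from typing import NamedTuple, Sequence


class Sample(NamedTuple):
    """One timestamped sensor reading (hypothetical structure)."""
    timestamp_ms: int
    value: float


def interpolate_at(samples: Sequence[Sample], timestamp_ms: int) -> float:
    """Linearly interpolate the sensor value at the given timestamp.

    Assumes `samples` is sorted by timestamp without duplicates.
    Timestamps outside the covered range are clamped to the nearest sample.
    """
    if not samples:
        raise ValueError("need at least one sample")

    times = [s.timestamp_ms for s in samples]
    i = bisect_left(times, timestamp_ms)

    if i == 0:
        # At or before the first sample
        return samples[0].value
    if i == len(samples):
        # After the last sample
        return samples[-1].value

    before, after = samples[i - 1], samples[i]
    span = after.timestamp_ms - before.timestamp_ms
    fraction = (timestamp_ms - before.timestamp_ms) / span
    return before.value + fraction * (after.value - before.value)
```

A typical use would be to align accelerometer readings with GPS fixes by calling `interpolate_at(accelerometer_samples, gps_timestamp_ms)` for each fix. Whichever algorithm is actually chosen, keeping it identical across engines is what makes the comparison fair.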
This tutorial is meant to give you an example of how to approach your own use case. I will explain in general terms why and how I’m using the different components and will provide examples where I deem them necessary.
A basic understanding of your OS, Docker, Kafka, and Ignite is recommended, but most of it can be picked up along the way while playing around with the setup.
Since the dataset is provided by Cyface, I can’t give you the entire project to play around with – but I’m sure you’ll have your own running in no time.
This tutorial series will be split into several parts:
- Setting up the (Docker) environment: Docker, Kafka, ZooKeeper, Ignite
- Streaming data into Ignite Cache with Kafka Connect
- Processing incoming data in Apache Ignite with Continuous Queries
- Example algorithms used in this project: Calibration, Interpolation
- Expiration policies and data persistence with Apache Ignite, plus monitoring with Prometheus/Grafana
- Integrating Apache Flume and Project Conclusion
Each of those parts will contain general instructions on how to get things done. It won’t be much ‘click here, type that’; the focus is on how to approach the issue. Each part also comes with a small collection of issues I’ve encountered while working on those tasks, plus fixes and explanations where I can provide them.
~ Sven Goly