[Tutorial] Announcing Apache Kafka + Apache Ignite stream processing tutorial

 

It’s been a while since the last journal update, mostly because a few issues kept me occupied for a long time. Instead of trying to piece all those fragments together, I want to spend the time writing a more structured overview of what happened: a tutorial.

 


About


 

For those who have not followed the journal:

This project is about processing real-time traffic data gathered by cyclists via smartphone sensors (GPS, accelerometer, gyroscope, magnetometer, etc.) with several different stream processing engines. The goal is to compare the different approaches and to find the best solution for this use case.

For more information, start here.

 


Solution Strategy and Tutorial Target


 

From the start of this project, we have kept a few things constant so that the results stay comparable:

For each project, we use the same dataset. The same producer puts this dataset into an Apache Kafka queue. Within each stream processing engine, we use the same algorithms to interpolate and to calibrate the data.

Aside from those, we’re free to use any means necessary to process and persist the data.

This tutorial is meant to give you an example of how to approach your own use case. I will give some general explanations of why and how I’m using the different components, and I will provide examples where I deem them necessary.

A basic understanding of your OS, Docker, Kafka, and Ignite is recommended, but most of it can easily be researched while playing around with it.

Since the dataset is provided by Cyface, I can’t give you the entire project to play around with – but I’m sure you’ll have your own running in no time.

 


Structure


 

This tutorial series will be split into several parts:

  1. Setting up the (Docker) environment: Docker, Kafka, Zookeeper, Ignite
  2. Streaming data into an Ignite cache with Kafka Connect
  3. Processing incoming data in Apache Ignite with Continuous Queries
  4. Example algorithms used in this project: Calibration, Interpolation
  5. Expiration policies and data persistence with Apache Ignite + Monitoring with Prometheus/Grafana
  6. Integrating Apache Flume and Project Conclusion

Each of those parts will contain general instructions on how to do things (not too much ‘click here, type that’; the focus is on how to approach the issue). Each will also include a small collection of issues I’ve encountered while working on those tasks, plus some fixes and explanations where I can provide them.

Stay tuned!

~ Sven Goly
