Automating savepoints in Apache Flink

In a stateful streaming application, the state is one of the most important parts. Apache Flink lets you back up that state with its so-called savepoint mechanism. From these savepoints you, as a developer or operations manager, can stop and resume, fork, or update your Flink jobs (see the docs for more on savepoints). So it’s a great way to try out different implementations in a fork of a Flink application....

October 18, 2023 · 3 min · Akhlaq Malik

Kafka, where are all my messages 😱?

Recently I got the chance to help a customer move from a Helm-based deployment of Apache Kafka in Kubernetes to Strimzi, which manages the deployment of Kafka in Kubernetes with an operator. In the final phase, I had to transfer the data to the newly deployed Apache Kafka. Setting the scope: The customer uses compacted topics (cleanup.policy=compact and log.retention.ms=-1) as the default configuration for all topics in Apache Kafka, so they use it somewhat like a database, where messages are produced with a unique ID and are set to null afterwards when a message has to be deleted....

January 26, 2022 · 5 min · Akhlaq Malik

Apache Flink Continuous Deployment

Coming from Kafka-Streams, continuous delivery (CD) is quite an easy task there and takes almost no effort compared to Apache Flink. Because the state of a Kafka-Streams application is stored in Kafka and can be rebuilt after a redeployment from so-called changelog topics, Kafka-Streams is also bound to Apache Kafka as its source and sink. Apache Flink, on the other hand, is free to choose from a variety of source systems, e....

March 30, 2021 · 5 min · Akhlaq Malik

Tired of repeated gitlab-ci files? Includes to the rescue!

Building pipelines, aka Continuous Integration and Continuous Delivery (CI/CD), is not really a new buzzword in the tech industry, nor as sexy as bitcoin and friends, but I’m still quite excited about GitLab CE 11.7, which was released on 22nd January 2019. In this post we will have a look at a newly added feature in GitLab CI with which we will MAKE THE GITLAB-CI.YML DRY AGAIN...

February 4, 2019 · 5 min · Akhlaq Malik

ETL with Kafka

Originally published at codecentric’s blog. “ETL with Kafka” is a catchy phrase that I purposely chose for this post instead of a more precise title like “Building a data pipeline with Kafka Connect”. TL;DR: You don’t need to write any code for pushing data into Kafka; instead, just choose your connector and start the job with the necessary configuration. And it’s absolutely open source! Kafka Connect. Kafka: Before getting into the Kafka Connect framework, let us briefly sum up what Apache Kafka is in a couple of lines....

February 2, 2018 · 5 min · Akhlaq Malik