Kafka, where are all my messages 😱?

Recently I got the chance to help a customer move from a Helm-based deployment of Apache Kafka in Kubernetes to Strimzi, which manages the deployment of Kafka in Kubernetes with an operator. In the final phase, I had to transfer the data to the newly deployed Apache Kafka. Setting the scope: The customer uses compacted topics (cleanup.policy=compact and log.retention.ms=-1) as the default configuration for all topics in Apache Kafka, so they use it somewhat like a database: messages are produced with a unique ID and are later overwritten with a null value (a tombstone) when a message has to be deleted. ...

January 26, 2022 · 5 min · Akhlaq Malik

ETL with Kafka

Originally published at the codecentric blog. “ETL with Kafka” is a catchy phrase that I purposely chose for this post instead of a more precise title like “Building a data pipeline with Kafka Connect”. TLDR: You don’t need to write any code to push data into Kafka; just choose your connector and start the job with the necessary configuration. And it’s completely open source! Kafka Connect: Kafka. Before getting into the Kafka Connect framework, let us briefly sum up what Apache Kafka is in a couple of lines. Apache Kafka was built at LinkedIn to meet requirements that the message brokers already on the market did not meet: scalability, distribution, and resilience, combined with low latency and high throughput. As of 2018, LinkedIn processes about 1.8 petabytes of data per day through Kafka. Kafka offers a programmable interface (API) for many languages to produce and consume data. ...

February 2, 2018 · 5 min · Akhlaq Malik