Many of our customers are moving their traditional ETL jobs to real-time stream processing.
The following article is a good read on why Kafka is a strong choice for unifying batch processing and stream processing.
https://www.infoq.com/articles/batch-etl-streams-kafka
Snippets from the article:
- Several recent data trends are driving a dramatic change in the old-world batch Extract-Transform-Load (ETL) architecture: data platforms operate at company-wide scale, there are many more types of data sources, and stream data is increasingly ubiquitous.
- Enterprise Application Integration (EAI) was an early take on real-time ETL, but the technologies used were often not scalable. This led to a difficult choice with data integration in the old world: real-time but not scalable, or scalable but batch.
- Apache Kafka is an open-source streaming platform that was developed seven years ago within LinkedIn.
- Kafka enables the building of streaming data pipelines from “source” to “sink” through the Kafka Connect API and the Kafka Streams API.
- Logs unify batch and stream processing. A log can be consumed via batched “windows”, or in real time by examining each element as it arrives.
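
To make the last point concrete, here is a minimal sketch in plain Python of one log being read in both styles. It does not use Kafka or any Kafka client library: the `LOG` list, the `consume_stream` and `consume_windows` functions, and the window size are all hypothetical names chosen for illustration, and the in-memory list only stands in for a Kafka topic partition.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, int]  # (offset, value); a stand-in for a log record

# Hypothetical append-only log, ordered by offset.
LOG: List[Record] = [(0, 5), (1, 3), (2, 8), (3, 1), (4, 4)]


def consume_stream(log: Iterable[Record]) -> Iterator[int]:
    """Real-time style: update a running total as each record arrives."""
    total = 0
    for _offset, value in log:
        total += value
        yield total


def consume_windows(log: Iterable[Record], window_size: int) -> Iterator[List[int]]:
    """Batch style: group the same records into fixed-size windows."""
    window: List[int] = []
    for _offset, value in log:
        window.append(value)
        if len(window) == window_size:
            yield window
            window = []
    if window:  # emit the final, partially filled window
        yield window


running_totals = list(consume_stream(LOG))
window_sums = [sum(w) for w in consume_windows(LOG, window_size=2)]

print(running_totals)  # [5, 8, 16, 17, 21]
print(window_sums)     # [8, 9, 4]
```

Both consumers read the same ordered records and arrive at the same overall total. The only difference is when results are emitted: per record, or per window.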