Apache Kafka and Big Data
Kafka implements the publish-subscribe communication model with particular emphasis on maximum throughput and minimum delivery latency. Delivery to subscribers is guaranteed by a message log mechanism: each node appends incoming messages to a durable, append-only log on disk. Because the data is stored persistently on Kafka nodes, it can also be processed in "batch" mode, analogously to ETL tools.
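To make the log mechanism concrete, here is a minimal sketch (an illustration, not Kafka's actual implementation): an append-only log in which each record gets a sequential offset, and readers track their own position. Re-reading from offset 0 is what makes batch-style reprocessing possible on the same data that streaming consumers see. All names (`Log`, `append`, `read`) are hypothetical.

```python
class Log:
    """Illustrative append-only message log with offset-based reads."""

    def __init__(self):
        self._records = []  # durable, append-only storage, one entry per message

    def append(self, message):
        self._records.append(message)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # Readers pull from an offset they track themselves; nothing is
        # deleted on read, so any reader can replay from the beginning.
        return self._records[offset:offset + max_records]


log = Log()
for msg in ["click:home", "click:cart", "purchase:42"]:
    log.append(msg)

# A "streaming" consumer keeps its own offset and advances it as it reads.
offset = 0
batch = log.read(offset)
offset += len(batch)

# A "batch" job can replay the entire log from offset 0 at any time.
replay = log.read(0, max_records=len(batch))
```

The key design point this illustrates is that reading does not consume: the log is the source of truth, and each consumer's progress is just an offset into it.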
Kafka has found its way into many organizations that need to process terabyte-scale streams of customer data with maximum reliability; examples include Spotify, Uber, and PayPal.
Some potential technical applications of Kafka include:
- Gathering data for real-time analysis (-> Apache Spark)
- Data stream processing (-> Apache Storm)
- Serving as message-oriented middleware (MOM)
- Aggregating application logs
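The MOM role above rests on Kafka's consumer-group semantics: every group receives the full message stream independently (pub-sub fan-out), while consumers within one group share it (queue-style load balancing). A minimal sketch of the group side of that idea, with hypothetical names (`Topic`, `publish`, `poll`) rather than any real client API:

```python
from collections import defaultdict


class Topic:
    """Illustrative topic where each consumer group tracks its own offset."""

    def __init__(self):
        self._log = []                           # shared, append-only message log
        self._group_offsets = defaultdict(int)   # one read position per group

    def publish(self, message):
        self._log.append(message)

    def poll(self, group):
        # Each group advances only its own offset, so groups are fully
        # independent: "analytics" and "billing" both see every message.
        offset = self._group_offsets[group]
        records = self._log[offset:]
        self._group_offsets[group] = len(self._log)
        return records


topic = Topic()
topic.publish("order-1")
topic.publish("order-2")

analytics = topic.poll("analytics")  # both records
billing = topic.poll("billing")      # both records again, independently

topic.publish("order-3")
late = topic.poll("analytics")       # only the record published since last poll
```

Because delivery state lives entirely in per-group offsets rather than in the messages themselves, adding a new subscriber system is just adding a new group name; the existing stream is untouched.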