Flink is a younger sibling of Spark and Hadoop. Like them, it is scalable and processes large amounts of data quickly. It connects to a variety of data sources (Kafka, HDFS, Cassandra, HBase, SQL databases, etc.). It also fits into diverse IT environments, because it runs in the same ones as Spark and Hadoop: in standalone mode, on a cluster, and of course in the cloud. Moreover, it offers both batch and stream processing.
Flink is available under an open-source license, which means anyone can freely inspect what is going on inside the system. It also has a thriving community, which makes it much easier to find solutions to problems. Despite its young age, it is a mature technology with numerous production deployments at companies such as ING, Ericsson, Alibaba, Uber, Zalando, Netflix, and Telefonica.
What is it for?
What sets Flink apart from older-generation systems is its excellent support for data stream processing. Among open-source solutions, it offers the best support for reliably running data operations that must react quickly, with very low latency (e.g. detecting financial fraud on the basis of transactions). It also excels at more complicated operations thanks to its innovative handling of computation state. Many of these can be implemented in older systems (e.g. Spark), but only Flink makes them possible while maintaining very high performance. Flink is also faster at iterative computations, such as machine learning.
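To make "handling of computation state" more concrete: a stateful stream operator keeps a small piece of state per key (here, per account) and updates it as each event arrives. The sketch below is plain Python, not Flink's API, and only illustrates the idea. The `Transaction` type, the `detect_fraud` function, and all thresholds are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Transaction:
    account_id: str
    amount: float
    timestamp_ms: int


# Hypothetical thresholds, chosen only for illustration.
SMALL_AMOUNT = 1.00
LARGE_AMOUNT = 500.00
ONE_MINUTE_MS = 60_000


def detect_fraud(transactions: Iterable[Transaction]) -> Iterator[Transaction]:
    """Yield a transaction when a large amount directly follows a small one
    on the same account within one minute."""
    # Per-key state: account_id -> timestamp of a small transaction
    # that was the most recent transaction seen for that account.
    last_small_seen: Dict[str, int] = {}

    for tx in transactions:
        # Read and clear the state for this key; any new transaction resets it.
        small_ts = last_small_seen.pop(tx.account_id, None)

        if (
            small_ts is not None
            and tx.amount > LARGE_AMOUNT
            and tx.timestamp_ms - small_ts <= ONE_MINUTE_MS
        ):
            yield tx

        # Remember small transactions so the next event on this key can check them.
        if tx.amount < SMALL_AMOUNT:
            last_small_seen[tx.account_id] = tx.timestamp_ms
```

In a distributed stream processor, this per-key dictionary would be partitioned across machines and made fault tolerant by the framework; the sketch keeps it in one process to show only the logic.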
Examples of use
- monitoring network infrastructure, e.g. in order to detect anomalies
- detecting financial fraud with latency as low as a few milliseconds
- real-time analysis of user behaviour, such as clicks on an e-commerce website or activity in an online game or on social media (e.g. sentiment analysis)
- telemetry, e.g. monitoring the use of a service for billing purposes
- reacting to financial market changes
- pattern detection through complex event processing, e.g. a user who starts filling in a form, stops after reaching the third page, and does not come back
- calculating the moving average of a quantity over the past 30 minutes
- IoT: monitoring sensors in vehicles, industrial and agricultural machines, using machine learning for fast fault detection
- as a tool for a data science team to perform SQL queries on a data stream
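
As an illustration of one use case from the list, a moving average over the past 30 minutes can be sketched in plain Python as follows. This is not Flink code: a stream processor would typically express it with a sliding window and would also deal with late or out-of-order events. The `MovingAverage` class is a hypothetical name, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque

WINDOW_MS = 30 * 60 * 1000  # 30 minutes in milliseconds


class MovingAverage:
    """Average of the values seen within the last `window_ms` milliseconds.

    Assumes events are added in non-decreasing timestamp order.
    """

    def __init__(self, window_ms: int = WINDOW_MS) -> None:
        self.window_ms = window_ms
        self.events = deque()  # (timestamp_ms, value) pairs, oldest first
        self.total = 0.0       # running sum of the values currently in the window

    def add(self, timestamp_ms: int, value: float) -> float:
        """Record a new event and return the current moving average."""
        self.events.append((timestamp_ms, value))
        self.total += value

        # Evict events that are a full window or more older than the newest one.
        cutoff = timestamp_ms - self.window_ms
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value

        # The newest event is always inside the window, so the deque is not empty.
        return self.total / len(self.events)
```

Keeping a running sum means each event is added once and removed once, so the cost per event stays constant on average regardless of how many events fall inside the window.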