June 17, 2022 -
Lijie Wang
Zhu Zhu
Introduction
Choosing a proper parallelism for each operator is not easy for many users. For batch jobs, a parallelism that is too small may result in long execution times and a large failover regression, while an unnecessarily large parallelism may waste resources and add overhead in task deployment and network shuffling.
To decide on a proper parallelism, one needs to know how much data each operator has to process. However, the data volume a job will process can be hard to predict because it can differ from day to day.
...
Continue reading »
June 5, 2022 -
Gyula Fora
(@GyulaFora)
Yang Wang
In the two months since our initial preview release, the community has been hard at work stabilizing and improving the core Flink Kubernetes Operator logic. We are now proud to announce the first production-ready release of the operator project.
Release Highlights
The Flink Kubernetes Operator 1.0.0 version brings numerous improvements and new features to almost every aspect of the operator.
New v1beta1 API version & compatibility guarantees
Session Job Management support
Support for Flink 1.
...
Continue reading »
May 30, 2022 -
Roman Khachatryan
Yuan Mei
Introduction
One of the most important characteristics of stream processing systems is end-to-end latency, i.e. the time it takes for the results of processing an input record to reach the outputs. In the case of Flink, end-to-end latency mostly depends on the checkpointing mechanism, because processing results should only become visible after the state of the stream is persisted to non-volatile storage (this is assuming exactly-once mode; in other modes, results can be published immediately).
...
Continue reading »
May 23, 2022 -
Jun Qin
Nico Kruber
This series of blog posts presents a collection of low-latency techniques in Flink. In part one, we discussed the types of latency in Flink and the way we measure end-to-end latency, and presented a few techniques that optimize latency directly. In this post, we will continue with a few more direct latency optimization techniques. Just like in part one, for each optimization technique, we will clarify what it is, when to use it, and what to keep in mind when using it.
...
Continue reading »
May 18, 2022 -
Jun Qin
Nico Kruber
Apache Flink is a stream processing framework well known for its low-latency processing capabilities. It is generic and suitable for a wide range of use cases. As a Flink application developer or a cluster administrator, you need to find the right gear for your application. In other words, you don’t want to be driving a luxury sports car while only using the first gear.
In this multi-part series, we will present a collection of low-latency techniques in Flink.
...
Continue reading »
May 11, 2022 -
Jingsong Lee
Jiangjie (Becket) Qin
The Apache Flink community is pleased to announce the preview release of the Apache Flink Table Store (0.1.0).
Please check out the full documentation for detailed information and user guides.
Note: Flink Table Store is still in beta status and undergoing rapid development. We do not recommend that you use it directly in a production environment.
What is Flink Table Store
Over the past years, thanks to our numerous contributors and users, Apache Flink has established itself as one of the best distributed computing engines, especially for stateful stream processing at large scale.
...
Continue reading »
May 6, 2022 -
Xingbo Huang
Dian Fu
PyFlink was introduced in Flink 1.9 to bring the power of Flink to Python users and allow them to develop Flink jobs in Python. Its functionality has become more and more mature over the past releases.
Before Flink 1.15, Python user-defined functions were executed in separate Python processes (based on the Apache Beam Portability Framework). This brings additional serialization/deserialization overhead as well as communication overhead.
...
Continue reading »
May 6, 2022 -
Dawid Wysakowicz
(@dwysakowicz)
Daisy Tsang
Flink has become a well-established data streaming engine, and a mature project requires shifting priorities from thinking purely about new features towards improving stability and operational simplicity. In the last couple of releases, the Flink community has tried to address some known friction points, including improvements to the snapshotting process. Snapshotting takes a global, consistent image of the state of a Flink job and is integral to fault tolerance and exactly-once processing.
...
Continue reading »
May 5, 2022 -
Joe Moser
(@JoemoeAT)
Yun Gao
(@YunGao16)
Thanks to our well-organized and open community, Apache Flink continues to grow as a technology and remains one of the most active projects in the Apache community. With the release of Flink 1.15, we are proud to announce a number of exciting changes.
One of the main concepts that makes Apache Flink stand out is the unification of batch (aka bounded) and stream (aka unbounded) data processing, which helps reduce the complexity of development.
...
Continue reading »
April 3, 2022 -
Gyula Fora
(@GyulaFora)
The Apache Flink community is pleased to announce the preview release of the Apache Flink Kubernetes Operator (0.1.0).
The Flink Kubernetes Operator allows users to easily manage their Flink deployment lifecycle using native Kubernetes tooling.
The operator takes care of submitting, savepointing, upgrading and generally managing Flink jobs using the built-in Flink Kubernetes integration. This way, users do not have to use the Flink clients (e.g. the CLI) or interact with the Flink jobs manually; they only have to declare the desired deployment specification, and the operator takes care of the rest.
...
Continue reading »