Using RocksDB State Backend in Apache Flink: When and How

January 18, 2021 - Jun Qin

Stream processing applications are often stateful, “remembering” information from processed events and using it to influence further event processing. In Flink, the remembered information, i.e., state, is stored locally in the configured state backend. To prevent data loss in case of failures, the state backend periodically persists a snapshot of its contents to a pre-configured durable storage. The RocksDB state backend (i.e., RocksDBStateBackend) is one of the three built-in state backends in Flink. ...

Exploring fine-grained recovery of bounded data sets on Flink

January 11, 2021 - Robert Metzger (@rmetzger_)

Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. For unbounded (or streaming) workloads, Flink uses periodic checkpoints to allow for reliable and correct recovery. ...

What's New in the Pulsar Flink Connector 2.7.0

January 7, 2021 - Jianyun Zhao (@yihy8023) Jennifer Huang (@Jennife06125739)

About the Pulsar Flink Connector: In order for companies to access real-time data insights, they need unified batch and streaming capabilities. Apache Flink unifies batch and stream processing into one single computing engine with “streams” as the unified data representation. Although developers have done extensive work at the computing and API layers, very little work has been done at the data messaging and storage layers. In reality, data is segregated into data silos, created by various storage and messaging technologies. ...

Stateful Functions 2.2.2 Release Announcement

January 2, 2021 - Tzu-Li (Gordon) Tai (@tzulitai)

The Apache Flink community released the second bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.2. The most important change of this bugfix release is upgrading Apache Flink to version 1.11.3. In addition to many stability fixes to the Flink runtime itself, this also allows StateFun applications to safely use savepoints to upgrade from versions earlier than StateFun 2.2.1. Previously, restoring from savepoints could have failed under certain conditions. ...

Apache Flink 1.11.3 Released

December 18, 2020 - Xintong Song

The Apache Flink community released the third bugfix version of the Apache Flink 1.11 series. This release includes 151 fixes and minor improvements for Flink 1.11.2. The list below details all fixes and improvements. We highly recommend that all users upgrade to Flink 1.11.3.

Updated Maven dependencies:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>1.11.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>1.11.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.11</artifactId>
      <version>1.11.3</version>
    </dependency>

You can find the binaries on the updated Downloads page. ...

Apache Flink 1.12.0 Release Announcement

December 10, 2020 - Marta Paes (@morsapaes) Aljoscha Krettek (@aljoscha)

The Apache Flink community is excited to announce the release of Flink 1.12.0! Close to 300 contributors worked on over 1k threads to bring significant improvements to usability as well as new features that simplify (and unify) Flink handling across the API stack. Release Highlights: The community has added support for efficient batch execution in the DataStream API. This is the next major milestone towards achieving a truly unified runtime for both batch and stream processing. ...

Improvements in task scheduling for batch workloads in Apache Flink 1.12

December 2, 2020 - Andrey Zagrebin

The Flink community has been working for some time on making Flink a truly unified batch and stream processing system. Achieving this involves touching a lot of different components of the Flink stack, from the user-facing APIs all the way to low-level operator processes such as task scheduling. In this blog post, we’ll take a closer look at how far the community has come in improving scheduling for batch workloads, why this matters, and what you can expect in the Flink 1. ...

Stateful Functions 2.2.1 Release Announcement

November 11, 2020 - Tzu-Li (Gordon) Tai (@tzulitai)

The Apache Flink community released the first bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.1. This release fixes a critical bug that causes restoring the Stateful Functions cluster from snapshots (checkpoints or savepoints) to fail under certain conditions. Starting with this release, StateFun creates snapshots in a more robust format that allows them to be restored safely going forward. We strongly recommend that all users upgrade to 2. ...

From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, Alignment, and Backpressure

October 15, 2020 - Arvid Heise Stephan Ewen

Apache Flink’s checkpoint-based fault tolerance mechanism is one of its defining features. Because of that design, Flink unifies batch and stream processing, can easily scale to both very small and extremely large scenarios, and provides support for many operational features like stateful upgrades with state evolution or rollbacks and time-travel. Despite all these great properties, Flink’s checkpointing method has an Achilles heel: the speed of a completed checkpoint is determined by the speed at which data flows through the application. ...

Stateful Functions Internals: Behind the scenes of Stateful Serverless

October 13, 2020 - Tzu-Li (Gordon) Tai (@tzulitai)

Stateful Functions (StateFun) simplifies the building of distributed stateful applications by combining the best of two worlds: the strong messaging and state consistency guarantees of stateful stream processing, and the elasticity and serverless experience of today’s cloud-native architectures and popular event-driven FaaS platforms. Typical StateFun applications consist of functions deployed behind simple services using these modern platforms, with a separate StateFun cluster playing the role of an “event-driven database” that provides consistency and fault-tolerance for the functions’ state and messaging. ...
