Apache Flink 1.3.0 Release Announcement

June 1, 2017 -

The Apache Flink community is pleased to announce the 1.3.0 release. Over the past 4 months, the Flink community has been working hard to resolve more than 680 issues. See the complete changelog for more detail.

This is the fourth major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.

Users can expect Flink releases now in a 4 month cycle. At the beginning of the 1.3 release cycle, the community decided to follow a strict time-based release model.

We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, gladly encouraged!

You can find the binaries on the updated Downloads page. Some highlights of the release are listed below.

Large State Handling/Recovery #

Incremental Checkpointing for RocksDB: It is now possible to checkpoint only the difference from the previous successful checkpoint, rather than checkpointing the entire application state. This speeds up checkpointing and saves disk space, because the individual checkpoints are smaller. (FLINK-5053).
Asynchronous snapshots for heap-based state backends: The filesystem and memory statebackends now also support asynchronous snapshots using a copy-on-write HashMap implementation. Asynchronous snapshotting makes Flink more resilient to slow storage systems and expensive serialization. The time an operator blocks on a snapshot is reduced to a minimum (FLINK-6048, FLINK-5715).
Allow upgrades to state serializers: Users can now upgrade serializers, while keeping their application state. One use case of this is upgrading custom serializers used for managed operator state/keyed state. Also, registration order for POJO types/Kryo types is now no longer fixed (Documentation, FLINK-6178).
Recover job state at the granularity of operator: Before Flink 1.3, operator state was bound to Flink’s internal “Task” representation. This made it hard to change a job’s topology while keeping its state around. With this change, users are allowed to do more topology changes (un-chain operators) by restoring state into logical operators instead of “Tasks” (FLINK-5892).
Fine-grained recovery (beta): Instead of restarting the complete ExecutionGraph in case of a task failure, Flink is now able to restart only the affected subgraph and thereby significantly decrease recovery time (FLINK-4256).

DataStream API #

Side Outputs: This change allows users to have more than one output stream for an operator. Operator metadata, internal system information (debugging, performance etc.) or rejected/late elements are potential use-cases for this new API feature. The Window operator is now using this new feature for late window elements (Side Outputs Documentation, FLINK-4460).
Union Operator State: Flink 1.2.0 introduced broadcast state functionality, but this had not yet been exposed via a public API. Flink 1.3.0 provides the Union Operator State API for exposing broadcast operator state. The union state will send the entire state across all parallel instances to each instance on restore, giving each operator a full view of the state (FLINK-5991).
Per-Window State: Previously, the state that a WindowFunction or ProcessWindowFunction could access was scoped to the key of the window but not the window itself. With this new feature, users can keep window state independent of the key (FLINK-5929).

Deployment and Tooling #

Flink HistoryServer: Flink’s HistoryServer now allows you to query the status and statistics of completed jobs that have been archived by a JobManager (FLINK-1579).
Watermark Monitoring in Web Front-end: For easier diagnosis of watermark issues, the Flink JobManager front-end now provides a new tab to track the watermark of each operator (FLINK-3427).
Datadog HTTP Metrics Reporter: Datadog is a widely-used metrics system, and Flink now offers a Datadog reporter that contacts the Datadog http endpoint directly (FLINK-6013).
Network Buffer Configuration: We finally got rid of the tedious network buffer configuration and replaced it with a more generic approach. First of all, you may now follow the idiom “more is better” without any penalty on the latency which could previously occur due to excessive buffering in incoming and outgoing channels. Secondly, instead of defining an absolute number of network buffers, we now use fractions of the available JVM memory (10% by default). This should cover more use cases by default and may also be tweaked by defining a minimum and maximum size.

→ See Configuring the Network Buffers in the Flink documentation.

Table API / SQL #

Support for Retractions in Table API / SQL: As part of our endeavor to support continuous queries on Dynamic Tables, Retraction is an important building block that will enable a whole range of new applications which require updating previously-emitted results. Examples for such use cases are computation of early results for long-running windows, updates due to late arriving data, or maintaining constantly changing results similar to materialized views in relational database systems. Flink 1.3.0 supports retraction for non-windowed aggregates. Results with updates can be either converted into a DataStream or materialized to external data stores using TableSinks with upsert or retraction support.
Extended support for aggregations in Table API / SQL: With Flink 1.3.0, the Table API and SQL support many more types of aggregations, including
- GROUP BY window aggregations in SQL (via the window functions TUMBLE, HOP, and SESSION windows) for both batch and streaming.
- SQL OVER window aggregations (only for streaming)
- Non-windowed aggregations (in streaming with retractions).
- User-defined aggregation functions for custom aggregation logic.
External catalog support: The Table API & SQL allows to register external catalogs. Table API and SQL queries can then have access to table sources and their schema from the external catalogs without register those tables one by one.

→ See the Flink documentation for details about these features.

The Table API / SQL documentation is currently being reworked. The community plans to publish the updated docs in the week of June 5th.

Connectors #

ElasticSearch 5.x support: The ElasticSearch connectors have been restructured to have a common base module and specific modules for ES 1, 2 and 5, similar to how the Kafka connectors are organized. This will make fixes and future improvements available across all ES versions (FLINK-4988).
Allow rescaling the Kinesis Consumer: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the Kinesis Consumer also makes use of that engine feature (FLINK-4821).
Transparent shard discovery for Kinesis Consumer: The Kinesis consumer can now discover new shards without failing / restarting jobs when a resharding is happening (FLINK-4577).
Allow setting custom start positions for the Kafka consumer: With this change, you can instruct Flink’s Kafka consumer to start reading messages from a specific offset (FLINK-3123) or earliest / latest offset (FLINK-4280) without respecting committed offsets in Kafka.
Allow out-opt from offset committing for the Kafka consumer: By default, Kafka commits the offsets to the Kafka broker once a checkpoint has been completed. This change allows users to disable this mechanism (FLINK-3398).

CEP Library #

The CEP library has been greatly enhanced and is now able to accommodate more use-cases out-of-the-box (expressivity enhancements), make more efficient use of the available resources, adjust to changing runtime conditions–all without breaking backwards compatibility of operator state.

Please note that the API of the CEP library has been updated with this release.

Below are some of the main features of the revamped CEP library:

Make CEP operators rescalable: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the CEP library also makes use of that engine feature (FLINK-5420).
New operators for the CEP library:
- Quantifiers (*,+,?) for the pattern API (FLINK-3318)
- Support for different continuity requirements (FLINK-6208)
- Support for iterative conditions (FLINK-6197)

Gelly Library #

Unified driver for running Gelly examples FLINK-4949).
PageRank algorithm for directed graphs (FLINK-4896).
Add Circulant and Echo graph generators (FLINK-6393).

Known Issues #

There are two known issues in Flink 1.3.0. Both will be addressed in the 1.3.1 release.

FLINK-6783: Wrongly extracted TypeInformations for WindowedStream::aggregate
FLINK-6775: StateDescriptor cannot be shared by multiple subtasks

List of Contributors #

According to git shortlog, the following 103 people contributed to the 1.3.0 release. Thank you to all contributors!

Addison Higham, Alexey Diomin, Aljoscha Krettek, Andrea Sella, Andrey Melentyev, Anton Mushin, barcahead, biao.liub, Bowen Li, Chen Qin, Chico Sokol, David Anderson, Dawid Wysakowicz, DmytroShkvyra, Fabian Hueske, Fabian Wollert, fengyelei, Flavio Pompermaier, FlorianFan, Fokko Driesprong, Geoffrey Mon, godfreyhe, gosubpl, Greg Hogan, guowei.mgw, hamstah, Haohui Mai, Hequn Cheng, hequn.chq, heytitle, hongyuhong, Jamie Grier, Jark Wu, jingzhang, Jinkui Shi, Jin Mingjian, Joerg Schad, Joshua Griffith, Jürgen Thomann, kaibozhou, Kathleen Sharp, Ken Geis, kkloudas, Kurt Young, lincoln-lil, lingjinjiang, liuyuzhong7, Lorenz Buehmann, manuzhang, Marc Tremblay, Mauro Cortellazzi, Max Kuklinski, mengji.fy, Mike Dias, mtunique, Nico Kruber, Omar Erminy, Patrick Lucas, paul, phoenixjiangnan, rami-alisawi, Ramkrishna, Rick Cox, Robert Metzger, Rodrigo Bonifacio, rtudoran, Seth Wiesman, Shaoxuan Wang, shijinkui, shuai.xus, Shuyi Chen, spkavuly, Stefano Bortoli, Stefan Richter, Stephan Ewen, Stephen Gran, sunjincheng121, tedyu, Till Rohrmann, tonycox, Tony Wei, twalthr, Tzu-Li (Gordon) Tai, Ufuk Celebi, Ventura Del Monte, Vijay Srinivasaraghavan, WangTaoTheTonic, wenlong.lwl, xccui, xiaogang.sxg, Xpray, zcb, zentol, zhangminglei, Zhenghua Gao, Zhijiang, Zhuoluo Yang, zjureel, Zohar Mizrahi, 士远, 槿瑜, 淘江, 金竹