Announcing Apache Flink 1.2.0

06 Feb 2017 by Robert Metzger

The Apache Flink community is pleased to announce the 1.2.0 release. Over the past months, the Flink community has been working hard to resolve 650 issues. See the complete changelog for more detail.

This is the third major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.

We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!

You can find the binaries on the updated Downloads page. Some highlights of the release are listed below.

Dynamic Scaling / Key Groups

Flink now supports changing the parallelism of a streaming job by restoring it from a savepoint with a different parallelism. Changing the parallelism of the entire job and of individual operators are both supported. In the StreamExecutionEnvironment, users can set a new per-job configuration parameter called “max parallelism”. It determines the upper limit for the parallelism.

By default, the value is set to:

  • 128: for all parallelism <= 128
  • MIN(nextPowerOfTwo(parallelism + (parallelism / 2)), 2^15): for all parallelism > 128
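The default rule above can be written out as a few lines of plain Java. This is only a sketch of the formula as stated in this post, not Flink's own code: the class and method names are made up for illustration, and nextPowerOfTwo is assumed to return its argument unchanged when that argument is already a power of two.

```java
final class DefaultMaxParallelism {
    static final int LOWER_BOUND = 128;      // default for small jobs
    static final int UPPER_BOUND = 1 << 15;  // 2^15 = 32768

    /** Smallest power of two >= x, for x >= 1 (assumed meaning of nextPowerOfTwo). */
    static int nextPowerOfTwo(int x) {
        int highest = Integer.highestOneBit(x);
        return highest == x ? x : highest << 1;
    }

    /** Default max parallelism following the two cases listed in the post. */
    static int defaultMaxParallelism(int parallelism) {
        if (parallelism <= LOWER_BOUND) {
            return LOWER_BOUND;
        }
        return Math.min(nextPowerOfTwo(parallelism + parallelism / 2), UPPER_BOUND);
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 128, 129, 200, 1000, 30000}) {
            System.out.println(p + " -> " + defaultMaxParallelism(p));
        }
    }
}
```

For example, a job with parallelism 200 gets 200 + 100 = 300, rounded up to 512, while a job with parallelism 30000 is capped at 2^15 = 32768.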

The following built-in functions and operators support rescaling:

  • Window operator
  • Rolling/Bucketing sink
  • Kafka consumers
  • Continuous File Processing source

The write-ahead log Cassandra sink and the CEP operator are currently not rescalable. Users of the keyed state interfaces can use dynamic scaling without changing their code.
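To give a rough idea of how key groups relate to max parallelism, here is a simplified sketch. It assumes that every key is hashed into one of "max parallelism" key groups and that each parallel instance owns a contiguous range of key groups. The names are hypothetical, and the plain hashCode() used here is a stand-in; Flink's actual hash function differs.

```java
final class KeyGroupSketch {
    /** Maps a key to a key group; depends only on maxParallelism (simplified hash). */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to the parallel instance that owns it at the given parallelism. */
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int keyGroup = keyGroupFor("user-42", maxParallelism);
        // The key group is stable; only its owner changes when the job is rescaled.
        for (int parallelism : new int[] {4, 8}) {
            System.out.println("parallelism " + parallelism + ": key group " + keyGroup
                    + " -> instance " + operatorIndexFor(keyGroup, maxParallelism, parallelism));
        }
    }
}
```

Because a key's group never changes, rescaling in this model only moves whole key groups between instances, which is also why the parallelism cannot exceed the max parallelism.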

Rescalable Non-Partitioned State

As part of the dynamic scaling effort, the community has also added rescalable non-partitioned state for operators like the Kafka consumer that don’t use keyed state but instead use operator state.

In case of rescaling, the operator state needs to be redistributed among the parallel consumer instances. In case of the Kafka consumer, the assigned partitions and their offsets are redistributed.
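As an illustration of what redistributing operator state can look like, the following sketch collects the (partition, offset) entries of all old consumer instances and hands them out again to the new instances. The round-robin scheme and all names are assumptions made for this example; the post does not specify how Flink assigns the entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OperatorStateRedistribution {
    /** One unit of operator state: a Kafka partition and the offset reached in it. */
    static final class PartitionOffset {
        final int partition;
        final long offset;

        PartitionOffset(int partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return "p" + partition + "@" + offset;
        }
    }

    /** Merges the state of all old instances and splits it round-robin over the new ones. */
    static List<List<PartitionOffset>> redistribute(
            List<List<PartitionOffset>> oldState, int newParallelism) {
        List<PartitionOffset> all = new ArrayList<>();
        for (List<PartitionOffset> instanceState : oldState) {
            all.addAll(instanceState);
        }
        List<List<PartitionOffset>> newState = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            newState.add(new ArrayList<>());
        }
        for (int i = 0; i < all.size(); i++) {
            newState.get(i % newParallelism).add(all.get(i));
        }
        return newState;
    }

    public static void main(String[] args) {
        List<List<PartitionOffset>> old = Arrays.asList(
                Arrays.asList(new PartitionOffset(0, 10), new PartitionOffset(1, 11)),
                Arrays.asList(new PartitionOffset(2, 12), new PartitionOffset(3, 13)));
        // Scale the consumer from 2 to 4 instances.
        System.out.println(redistribute(old, 4));
    }
}
```

The important property is that no entry is lost or duplicated: every partition keeps its offset and ends up with exactly one new instance.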

ProcessFunction

The ProcessFunction is a low-level stream processing operation giving access to the basic building blocks of all (acyclic) streaming applications:

  • Events (stream elements)
  • State (fault tolerant, consistent)
  • Timers (event time and processing time)

The ProcessFunction can be thought of as a FlatMapFunction with access to keyed state and timers.
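To make the combination of events, state, and timers more concrete, here is a toy model in plain Java. It deliberately does not use the Flink API: the class, its methods, and the 60-second timeout are all invented for illustration. It counts events per key and emits the count once no further event for that key has arrived for 60 seconds of event time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ProcessFunctionToyModel {
    static final long TIMEOUT_MS = 60_000;

    // Keyed state: per key, {running count, timestamp of the last event}.
    private final Map<String, long[]> state = new HashMap<>();
    // Registered event-time timers: firing time -> keys to notify.
    private final TreeMap<Long, List<String>> timers = new TreeMap<>();
    // Emitted results, in emission order.
    final List<String> output = new ArrayList<>();

    /** Called once per stream element: update state and register a timer. */
    void processElement(String key, long timestamp) {
        long[] s = state.computeIfAbsent(key, k -> new long[2]);
        s[0]++;
        s[1] = timestamp;
        timers.computeIfAbsent(timestamp + TIMEOUT_MS, t -> new ArrayList<>()).add(key);
    }

    /** Advances event time and fires every timer that is due. */
    void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, List<String>> due = timers.pollFirstEntry();
            for (String key : due.getValue()) {
                onTimer(key, due.getKey());
            }
        }
    }

    /** Emits the count only if this timer belongs to the latest event of the key. */
    private void onTimer(String key, long timerTimestamp) {
        long[] s = state.get(key);
        if (s != null && timerTimestamp == s[1] + TIMEOUT_MS) {
            output.add(key + "=" + s[0]);
            state.remove(key);
        }
    }

    public static void main(String[] args) {
        ProcessFunctionToyModel fn = new ProcessFunctionToyModel();
        fn.processElement("a", 0);
        fn.processElement("a", 10_000);
        fn.processElement("b", 20_000);
        fn.advanceWatermark(80_000);
        System.out.println(fn.output);
    }
}
```

In a real job the state would be fault tolerant and the timers would be driven by Flink's watermarks or processing time; the model only shows how the three building blocks interact.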

ProcessFunction documentation

Async I/O

Flink now has a dedicated Async I/O operator for making blocking calls asynchronously and in a checkpointed fashion. For example, there are many Flink applications that need to query external datastores for each element in a stream. To avoid slowing down the stream to the speed of the external system, the async I/O operator allows requests to overlap.
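The benefit of overlapping requests can be sketched without Flink, using only CompletableFuture from the JDK. Everything here is hypothetical: slowLookup stands in for a query against an external datastore, and capacity is an assumed limit on the number of requests in flight. Checkpointing, which the Async I/O operator also handles, is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncLookupSketch {
    /** Hypothetical stand-in for a slow query against an external datastore. */
    static String slowLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key + "-enriched";
    }

    /** Issues all lookups without waiting in between, then collects results in input order. */
    static List<String> enrichAll(List<String> keys, int capacity) {
        ExecutorService pool = Executors.newFixedThreadPool(capacity);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String key : keys) {
                // Up to 'capacity' lookups run concurrently instead of one after another.
                pending.add(CompletableFuture.supplyAsync(() -> slowLookup(key), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(enrichAll(Arrays.asList("a", "b", "c"), 3));
    }
}
```

With a capacity of 3, the three lookups wait on the external system at the same time, so the total latency is close to that of a single request rather than the sum of all three.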

Async I/O documentation

Run Flink with Apache Mesos

The latest release further extends Flink’s deployment flexibility by adding support for Apache Mesos and DC/OS. In combination with Marathon, it is now possible to run a highly available Flink cluster on Mesos.

Mesos documentation

Secure Data Access

Flink is now able to authenticate against external services such as ZooKeeper, Kafka, HDFS, and YARN using Kerberos. Experimental support for encryption over the wire has also been added.

Kerberos documentation and SSL setup documentation.

Queryable State

This experimental feature allows users to query the current state of an operator. If you have, for example, a flatMap() operator that keeps a running aggregate per key, queryable state allows you to retrieve the current aggregate value at any time by directly connecting to the TaskManager and retrieving that value.

Queryable State documentation

Backwards compatible savepoints

Flink 1.2.0 allows users to restart a job from a 1.1.4 savepoint. This makes major Flink version upgrades possible without losing application state. The following built-in operators are backwards compatible:

  • Window operator
  • Rolling/Bucketing sink
  • Kafka consumers
  • Continuous File Processing source

Upgrading Flink applications documentation

Table API & SQL

This release significantly expanded the performance, stability, and coverage of Flink’s Table API and SQL support for batch and streaming tables.

The community added tumbling, sliding, and session group-window aggregations over streaming tables, e.g. table.window(Session withGap 10.minutes on 'rowtime as 'w)

SQL supports more built-in functions and operations, e.g. EXISTS, VALUES, LIMIT, CURRENT_DATE, INITCAP, NULLIF

Both APIs support more data types and are better integrated, e.g. access a POJO field myPojo.get('field'), myPojo.flatten()

Users can now define their own scalar and table functions, e.g. table.select('uid, parse('field) as 'parsed).join(split('parsed) as 'atom)

Flink Table API & SQL documentation

Miscellaneous improvements

  • Metrics in Flink web interface: A metrics system was added in Flink 1.1, and with this release, Flink provides a new tab in the web frontend that displays some of these metrics.

  • Kafka 0.10 support: Flink 1.2 now provides a connector for Apache Kafka 0.10.0.x, including support for consuming and producing messages with a timestamp using Flink’s internal event time (Kafka Connector Documentation)

  • Evictor Semantics: Flink 1.2 ships with more expressive evictor semantics that allow the programmer to evict elements from a window both before and after the application of the window function, and to remove elements arbitrarily (Evictor Semantics Documentation)

List of Contributors

According to git shortlog, the following 122 people contributed to the 1.2.0 release. Thank you to all contributors!

  • Abhishek R. Singh
  • Ahmad Ragab
  • Aleksandr Chermenin
  • Alexander Pivovarov
  • Alexander Shoshin
  • Alexey Diomin
  • Aljoscha Krettek
  • Andrey Melentyev
  • Anton Mushin
  • Bob Thorman
  • Boris Osipov
  • Bram Vogelaar
  • Bruno Aranda
  • David Anderson
  • Dominik
  • Evgeny_Kincharov
  • Fabian Hueske
  • Fokko Driesprong
  • Gabor Gevay
  • George
  • Gordon Tai
  • Greg Hogan
  • Gyula Fora
  • Haohui Mai
  • Holger Frydrych
  • HungUnicorn
  • Ismaël Mejía
  • Ivan Mushketyk
  • Jakub Havlik
  • Jark Wu
  • Jendrik Poloczek
  • Jincheng Sun
  • Josh
  • Joshi
  • Keiji Yoshida
  • Kirill Morozov
  • Kurt Young
  • Liwei Lin
  • Lorenz Buehmann
  • Maciek Próchniak
  • Makman2
  • Markus Müller
  • Martin Junghanns
  • Márton Balassi
  • Max Kuklinski
  • Maximilian Michels
  • Milosz Tanski
  • Nagarjun
  • Neelesh Srinivas Salian
  • Neil Derraugh
  • Nick Chadwick
  • Nico Kruber
  • Niels Basjes
  • Pattarawat Chormai
  • Piotr Godek
  • Raghav
  • Ramkrishna
  • Robert Metzger
  • Rohit Agarwal
  • Roman Maier
  • Sachin
  • Sachin Goel
  • Scott Kidder
  • Shannon Carey
  • Stefan Richter
  • Steffen Hausmann
  • Stephan Epping
  • Stephan Ewen
  • Sunny T
  • Suri
  • Theodore Vasiloudis
  • Till Rohrmann
  • Tony Wei
  • Tzu-Li (Gordon) Tai
  • Ufuk Celebi
  • Vijay Srinivasaraghavan
  • Vishnu Viswanath
  • WangTaoTheTonic
  • William-Sang
  • Yassine Marzougui
  • anton solovev
  • beyond1920
  • biao.liub
  • chobeat
  • danielblazevski
  • f7753
  • fengyelei
  • fengyelei 00406569
  • gallenvara
  • gaolun.gl
  • godfreyhe
  • heytitle
  • hzyuemeng1
  • iteblog
  • kl0u
  • larsbachmann
  • lincoln-lil
  • manuzhang
  • medale
  • miaoever
  • mtunique
  • radekg
  • renkai
  • sergey_sokur
  • shijinkui
  • shuai.xus
  • smarthi
  • swapnil-chougule
  • tedyu
  • tibor.moger
  • tonycox
  • twalthr
  • vasia
  • wenlong.lwl
  • wrighe3
  • xiaogang.sxg
  • yushi.wxg
  • yuzhongliu
  • zentol
  • zhuhaifengleon
  • 淘江
  • 魏偉哲