Apache Flink 0.8.0 available

January 21, 2015 -

We are pleased to announce the availability of Flink 0.8.0. This release includes new user-facing features as well as performance and bug fixes, extends the support for filesystems and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release, a big thanks to all of them!

Download Flink 0.8.0

See the release changelog

Overview of major new features #

Extended filesystem support: The former DistributedFileSystem interface has been generalized to HadoopFileSystem now supporting all sub classes of org.apache.hadoop.fs.FileSystem. This allows users to use all file systems supported by Hadoop with Apache Flink. See connecting to other systems
Streaming Scala API: As an alternative to the existing Java API Streaming is now also programmable in Scala. The Java and Scala APIs have now the same syntax and transformations and will be kept from now on in sync in every future release.
Streaming windowing semantics: The new windowing api offers an expressive way to define custom logic for triggering the execution of a stream window and removing elements. The new features include out-of-the-box support for windows based in logical or physical time and data-driven properties on the events themselves among others. Read more here
Mutable and immutable objects in runtime All Flink versions before 0.8.0 were always passing the same objects to functions written by users. This is a common performance optimization, also used in other systems such as Hadoop. However, this is error-prone for new users because one has to carefully check that references to the object aren’t kept in the user function. Starting from 0.8.0, Flink allows to configure a mode which is disabling that mechanism.
Performance and usability improvements: The new Apache Flink 0.8.0 release brings several new features which will significantly improve the performance and the usability of the system. Amongst others, these features include:
- Improved input split assignment which maximizes computation locality
- Smart broadcasting mechanism which minimizes network I/O
- Custom partitioners which let the user control how the data is partitioned within the cluster. This helps to prevent data skewness and allows to implement highly efficient algorithms.
- coGroup operator now supports group sorting for its inputs
Kryo is the new fallback serializer: Apache Flink has a sophisticated type analysis and serialization framework that is able to handle commonly used types very efficiently. In addition to that, there is a fallback serializer for types which are not supported. Older versions of Flink used the reflective Avro serializer for that purpose. With this release, Flink is using the powerful Kryo and twitter-chill library for support of types such as Java Collections and Scala specifc types.
Hadoop 2.2.0+ is now the default Hadoop dependency: With Flink 0.8.0 we made the “hadoop2” build profile the default build for Flink. This means that all users using Hadoop 1 (0.2X or 1.2.X versions) have to specify version “0.8.0-hadoop1” in their pom files.
HBase module updated The HBase version has been updated to 0.98.6.1. Also, Hbase is now available to the Hadoop1 and Hadoop2 profile of Flink.

Contributors #

Marton Balassi
Daniel Bali
Carsten Brandt
Moritz Borgmann
Stefan Bunk
Paris Carbone
Ufuk Celebi
Nils Engelbach
Stephan Ewen
Gyula Fora
Gabor Hermann
Fabian Hueske
Vasiliki Kalavri
Johannes Kirschnick
Aljoscha Krettek
Suneel Marthi
Robert Metzger
Felix Neutatz
Chiwan Park
Flavio Pompermaier
Mingliang Qi
Shiva Teja Reddy
Till Rohrmann
Henry Saputra
Kousuke Saruta
Chesney Schepler
Erich Schubert
Peter Szabo
Jonas Traub
Kostas Tzoumas
Timo Walther
Daniel Warneke
Chen Xu