Apache Flink 0.8.0 availableJanuary 21, 2015 -
We are pleased to announce the availability of Flink 0.8.0. This release includes new user-facing features as well as performance and bug fixes, extends the support for filesystems and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release, a big thanks to all of them!
Overview of major new features #
Extended filesystem support: The former
DistributedFileSysteminterface has been generalized to
HadoopFileSystemnow supporting all sub classes of
org.apache.hadoop.fs.FileSystem. This allows users to use all file systems supported by Hadoop with Apache Flink. See connecting to other systems
Streaming Scala API: As an alternative to the existing Java API Streaming is now also programmable in Scala. The Java and Scala APIs have now the same syntax and transformations and will be kept from now on in sync in every future release.
Streaming windowing semantics: The new windowing api offers an expressive way to define custom logic for triggering the execution of a stream window and removing elements. The new features include out-of-the-box support for windows based in logical or physical time and data-driven properties on the events themselves among others. Read more here
Mutable and immutable objects in runtime All Flink versions before 0.8.0 were always passing the same objects to functions written by users. This is a common performance optimization, also used in other systems such as Hadoop. However, this is error-prone for new users because one has to carefully check that references to the object aren’t kept in the user function. Starting from 0.8.0, Flink allows to configure a mode which is disabling that mechanism.
Performance and usability improvements: The new Apache Flink 0.8.0 release brings several new features which will significantly improve the performance and the usability of the system. Amongst others, these features include:
- Improved input split assignment which maximizes computation locality
- Smart broadcasting mechanism which minimizes network I/O
- Custom partitioners which let the user control how the data is partitioned within the cluster. This helps to prevent data skewness and allows to implement highly efficient algorithms.
- coGroup operator now supports group sorting for its inputs
Kryo is the new fallback serializer: Apache Flink has a sophisticated type analysis and serialization framework that is able to handle commonly used types very efficiently. In addition to that, there is a fallback serializer for types which are not supported. Older versions of Flink used the reflective Avro serializer for that purpose. With this release, Flink is using the powerful Kryo and twitter-chill library for support of types such as Java Collections and Scala specifc types.
Hadoop 2.2.0+ is now the default Hadoop dependency: With Flink 0.8.0 we made the “hadoop2” build profile the default build for Flink. This means that all users using Hadoop 1 (0.2X or 1.2.X versions) have to specify version “0.8.0-hadoop1” in their pom files.
HBase module updated The HBase version has been updated to 0.98.6.1. Also, Hbase is now available to the Hadoop1 and Hadoop2 profile of Flink.
- Marton Balassi
- Daniel Bali
- Carsten Brandt
- Moritz Borgmann
- Stefan Bunk
- Paris Carbone
- Ufuk Celebi
- Nils Engelbach
- Stephan Ewen
- Gyula Fora
- Gabor Hermann
- Fabian Hueske
- Vasiliki Kalavri
- Johannes Kirschnick
- Aljoscha Krettek
- Suneel Marthi
- Robert Metzger
- Felix Neutatz
- Chiwan Park
- Flavio Pompermaier
- Mingliang Qi
- Shiva Teja Reddy
- Till Rohrmann
- Henry Saputra
- Kousuke Saruta
- Chesney Schepler
- Erich Schubert
- Peter Szabo
- Jonas Traub
- Kostas Tzoumas
- Timo Walther
- Daniel Warneke
- Chen Xu