Peeking into Apache Flink's Engine Room

March 13, 2015

Join Processing in Apache Flink
Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved, especially if large data sets need to be handled efficiently. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system. ...

Continue reading »

February 2015 in the Flink community

March 2, 2015

February might be the shortest month of the year, but that does not mean the Flink community has not been busy adding features to the system and fixing bugs. Here’s a rundown of the activity in the Flink community last month.

0.8.1 release
Flink 0.8.1 was released. This bugfix release resolves a total of 22 issues.

New committer
Max Michels has been voted a committer by the Flink PMC. ...

Continue reading »

Introducing Flink Streaming

February 9, 2015

This post is the first in a series of blog posts on Flink Streaming, the recent addition to Apache Flink that makes it possible to analyze continuous data sources in addition to static files. Flink Streaming uses the pipelined Flink engine to process data streams in real time and offers a new API that includes the definition of flexible windows. In this post, we go through an example that uses the Flink Streaming API to compute statistics on continuously arriving stock market data and to combine that data with Twitter streams. ...

Continue reading »

January 2015 in the Flink community

February 4, 2015

Happy 2015! Here is a (hopefully digestible) summary of what happened last month in the Flink community.

0.8.0 release
Flink 0.8.0 was released. See here for the release notes.

Flink roadmap
The community has published a roadmap for 2015 on the Flink wiki. Check it out to see what is coming up in Flink, and pick up an issue to contribute!

Articles in the press
The Apache Software Foundation announced Flink as a Top-Level Project. ...

Continue reading »

Apache Flink 0.8.0 available

January 21, 2015

We are pleased to announce the availability of Flink 0.8.0. This release includes new user-facing features as well as performance and bug fixes, extends the support for filesystems, and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release; a big thanks to all of them!

Download Flink 0.8.0
See the release changelog

Overview of major new features
Extended filesystem support: The former DistributedFileSystem interface has been generalized to HadoopFileSystem, now supporting all subclasses of org. ...

Continue reading »

December 2014 in the Flink community

January 6, 2015

This is the first blog post of a newsletter-like series in which we summarize the monthly activity in the Flink community. As the Flink project grows, this can serve as a “tl;dr” for people who are not following the Flink dev and user mailing lists, or who are simply overwhelmed by the traffic.

Flink graduation
The biggest news is that the Apache board approved Flink as a top-level Apache project! ...

Continue reading »

Hadoop Compatibility in Flink

November 18, 2014

Apache Hadoop is an industry standard for scalable analytical data processing. Many data analysis applications have been implemented as Hadoop MapReduce jobs and run in clusters around the world. Apache Flink can be an alternative to MapReduce and improves on it in many dimensions. Among other features, Flink provides much better performance and offers easy-to-use APIs in Java and Scala. Similar to Hadoop, Flink’s APIs provide interfaces for Mapper and Reducer functions, as well as Input- and OutputFormats, along with many more operators. ...

Continue reading »

Apache Flink 0.7.0 available

November 4, 2014

We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release; a big thanks to all of them!

Download Flink 0.7.0 here
See the release changelog here

Overview of major new features
Flink Streaming: The gem of the 0. ...

Continue reading »

Upcoming Events

October 3, 2014

We are happy to announce several upcoming Flink events in both Europe and the US. The series starts with a Flink hackathon in Stockholm (Oct 8-9) and a talk about Flink at the Stockholm Hadoop User Group (Oct 8). This is followed by the very first Flink Meetup in Berlin (Oct 15). In the US, there will be two Flink Meetup talks: the first at the Pasadena Big Data User Group (Oct 29) and the second at Silicon Valley Hands On Programming Events (Nov 4). ...

Continue reading »

Apache Flink 0.6.1 available

September 26, 2014

We are happy to announce the availability of Flink 0.6.1. Version 0.6.1 is a maintenance release that includes minor fixes across several parts of the system. We recommend that all Flink users work with this newest version. Download the release today.

Continue reading »