Introducing Gelly: Graph Processing with Apache Flink

August 24, 2015 -

This blog post introduces Gelly, Apache Flink’s graph-processing API and library. Flink’s native support for iterations makes it a suitable platform for large-scale graph analytics. By leveraging delta iterations, Gelly is able to map various graph processing models such as vertex-centric or gather-sum-apply to Flink dataflows. Gelly allows Flink users to perform end-to-end data analysis in a single system. Gelly can be seamlessly used with Flink’s DataSet API, which means that pre-processing, graph creation, analysis, and post-processing can be done in the same application. ...

Continue reading »

Announcing Apache Flink 0.9.0

June 24, 2015 -

The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone1 release and have been polished since then. This is the largest Flink release so far. Download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome! ...

Continue reading »

April 2015 in the Flink community

May 14, 2015 -

April was a packed month for Apache Flink. Flink runner for Google Cloud Dataflow: A Flink runner for Google Cloud Dataflow was announced. See the blog posts by data Artisans and the Google Cloud Platform Blog. Google Cloud Dataflow programs can be written using an open-source SDK and run on multiple backends, either as a managed service inside Google’s infrastructure, or leveraging open source runners, including Apache Flink. Flink 0. ...

Continue reading »

Juggling with Bits and Bytes

May 11, 2015 -

How Apache Flink operates on binary data: Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but newer frameworks such as Apache Spark, Apache Drill, and Apache Flink also run on JVMs. A common challenge that JVM-based data analysis engines face is storing large amounts of data in memory, both for caching and for efficient processing such as sorting and joining of data. ...

Continue reading »

Announcing Flink 0.9.0-milestone1 preview release

April 13, 2015 -

The Apache Flink community is pleased to announce the availability of the 0.9.0-milestone1 release. The release is a preview of the upcoming 0.9.0 release. It contains many new features which will be available in the upcoming 0.9 release. Interested users are encouraged to try it out and give feedback. As the version number indicates, this release is a preview release that contains known issues. You can download the release here and check out the latest documentation here. ...

Continue reading »

March 2015 in the Flink community

April 7, 2015 -

March has been a busy month in the Flink community. Scaling ALS: Flink committers employed at data Artisans published a blog post on how they scaled matrix factorization with Flink and Google Compute Engine to matrices with 28 billion elements. Learn about the internals of Flink: The community has started an effort to better document the internals of Flink. Check out the first articles on the Flink wiki on how Flink manages memory, how tasks in Flink exchange data, type extraction and serialization in Flink, as well as how Flink builds on Akka for distributed coordination. ...

Continue reading »

Peeking into Apache Flink's Engine Room

March 13, 2015 -

Join Processing in Apache Flink: Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved, especially if large data sets need to be efficiently handled. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system. ...

Continue reading »

February 2015 in the Flink community

March 2, 2015 -

February might be the shortest month of the year, but this does not mean that the Flink community has not been busy adding features to the system and fixing bugs. Here’s a rundown of the activity in the Flink community last month. 0.8.1 release: Flink 0.8.1 was released. This bugfix release resolves a total of 22 issues. New committer: Max Michels has been voted a committer by the Flink PMC. ...

Continue reading »

Introducing Flink Streaming

February 9, 2015 -

This post is the first of a series of blog posts on Flink Streaming, the recent addition to Apache Flink that makes it possible to analyze continuous data sources in addition to static files. Flink Streaming uses the pipelined Flink engine to process data streams in real time and offers a new API including definition of flexible windows. In this post, we go through an example that uses the Flink Streaming API to compute statistics on stock market data that arrive continuously and combine the stock market data with Twitter streams. ...

Continue reading »

January 2015 in the Flink community

February 4, 2015 -

Happy 2015! Here is a (hopefully digestible) summary of what happened last month in the Flink community. 0.8.0 release: Flink 0.8.0 was released. See here for the release notes. Flink roadmap: The community has published a roadmap for 2015 on the Flink wiki. Check it out to see what is coming up in Flink, and pick up an issue to contribute! Articles in the press: The Apache Software Foundation announced Flink as a Top-Level Project. ...

Continue reading »