March 2015 in the Flink community

April 7, 2015 -

March has been a busy month in the Flink community.

Scaling ALS #

Flink committers employed at data Artisans published a blog post on how they scaled matrix factorization with Flink and Google Compute Engine to matrices with 28 billion elements.

The community has started an effort to better document the internals of Flink. Check out the first articles on the Flink wiki on how Flink manages memory, how tasks in Flink exchange data, type extraction and serialization in Flink, as well as how Flink builds on Akka for distributed coordination.

Check out also the new blog post on how Flink executes joins with several insights into Flink’s runtime.

Meetups and talks #

Flink’s machine learning efforts were presented at the Machine Learning Stockholm meetup group. The regular Berlin Flink meetup featured a talk on the past, present, and future of Flink. The talk is available on youtube.

Table API in Scala and Java #

The new Table API in Flink is now available in both Java and Scala. Check out the examples here (Java) and here (Scala).

Additions to the Machine Learning library #

Flink’s Machine Learning library is seeing quite a bit of traction. Recent additions include the CoCoA algorithm for distributed optimization.

Exactly-once delivery guarantees for streaming jobs #

Flink streaming jobs now provide exactly once processing guarantees when coupled with persistent sources (notably Apache Kafka). Flink periodically checkpoints and persists the offsets of the sources and restarts from those checkpoints at failure recovery. This functionality is currently limited in that it does not yet handle large state and iterative programs.