Flink Community Update - April'20

March 30, 2020 - Marta Paes (@morsapaes)

While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog.

And since now it’s more important than ever to keep up the spirits, we’d like to invite you to join the Flink Forward Virtual Conference, on April 22-24 (see Upcoming Events). Hope to see you there!

The Year (so far) in Flink

To kick off the new year, the Flink community released Flink 1.10 with the record contribution of over 200 engineers. This release introduced significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and advances in Python support (PyFlink). Flink 1.10 also marked the completion of the Blink integration, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage.

The community is now discussing the release of Flink 1.10.1, covering some outstanding bugs from Flink 1.10.

Stateful Functions Contribution and 2.0 Release

In January, the first version of the Stateful Functions (statefun.io) code was pushed to the Flink repository. Stateful Functions started out as an API to build general purpose event-driven applications on Flink, taking advantage of its advanced state management mechanism to cut the “middleman” that usually handles state coordination in such applications (e.g. a database).

In a recent update, some new features were announced, like multi-language support (including a Python SDK), function unit testing and Stateful Functions’ own flavor of the State Processor API. The release cycle will be independent from core Flink releases and the Release Candidate (RC) has been created — so, you can expect Stateful Functions 2.0 to be released very soon!

Amidst the usual outpouring of discussion threads, JIRA tickets and FLIPs, the community is working full steam on bringing Flink 1.11 to life in the next few months. The feature freeze is currently scheduled for late April, so the release is expected around mid-May. The upcoming release will focus on new features and integrations that broaden the scope of Flink use cases, as well as core runtime enhancements to streamline the operations of complex deployments.

Some of the plans on the use case side include support for changelog streams in the Table API/SQL (FLIP-105), easy streaming data ingestion into Apache Hive (FLIP-115) and support for Pandas DataFrames in PyFlink. On the operational side, the much-anticipated new Source API (FLIP-27) will unify batch and streaming sources and improve out-of-the-box event-time behavior, while unaligned checkpoints (FLIP-76) and some changes to network memory management will speed up checkpointing under backpressure.

Throw into the mix improvements around type systems, the WebUI, metrics reporting and supported formats, and this release is bound to keep the community busy. For a complete overview of the ongoing development, check this discussion and follow the weekly updates on the Flink @community mailing list.

New Committers and PMC Members

The Apache Flink community has welcomed 1 PMC (Project Management Committee) Member and 5 new Committers since the last update (September 2019):

New PMC Members

Jark Wu

New Committers

Zili Chen, Jingsong Lee, Yu Li, Dian Fu, Zhu Zhu

Congratulations to all and thank you for your hardworking commitment to Flink!

The Bigger Picture

In the last update, we shared some numbers around Flink releases and mailing list activity. This time, we’re looking into the activity in the Flink repository and how it’s evolving.

[Figure: GitHub 1]

There is a clear upward trend in the number of contributions to the repository, based on the number of commits. This reflects the fast pace of development the project is experiencing, as well as the successful integration of the China-based Flink contributors, which started early last year. To complement these observations, the repository registered a 1.5x increase in the number of individual contributors in 2019, compared to the previous year.

But did this increase in capacity produce any other measurable benefits?

[Figure: GitHub 2]

If we look at the average time of Pull Request (PR) “resolution”, it seems like it did: the average time it takes to close a PR has been steadily decreasing since last year, sitting between 5 and 6 days for the past few months.

These are great indicators of the health of Flink as an open source project!
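For readers who want to track a similar PR resolution metric on their own repository, here is a minimal sketch of the arithmetic in Python. It assumes that "resolution" means the time between a PR being opened and closed; the sample timestamps and the function names (`days_to_close`, `average_resolution_days`) are hypothetical and are not taken from the Flink repository or from the tooling used to produce the charts.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (opened, closed) timestamps for three pull requests.
# These are illustrative values, not data from the Flink repository.
SAMPLE_PRS = [
    ("2020-03-02T09:00:00", "2020-03-06T09:00:00"),
    ("2020-03-03T12:00:00", "2020-03-09T12:00:00"),
    ("2020-03-10T08:00:00", "2020-03-15T20:00:00"),
]


def days_to_close(opened, closed):
    """Return the time between opening and closing a PR, in days."""
    delta = datetime.fromisoformat(closed) - datetime.fromisoformat(opened)
    return delta.total_seconds() / 86400  # 86400 seconds per day


def average_resolution_days(prs):
    """Average the open-to-close time over a list of (opened, closed) pairs."""
    return mean(days_to_close(opened, closed) for opened, closed in prs)


if __name__ == "__main__":
    print(round(average_resolution_days(SAMPLE_PRS), 2))
```

A plain mean is sensitive to a handful of long-lived PRs, so a median may be worth comparing against if the numbers look skewed.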

If you missed the launch of flink-packages.org, here’s a reminder! Ververica has created (and open sourced) a website that showcases the work of the community to push forward the ecosystem surrounding Flink. There, you can explore existing packages (like the Pravega and Pulsar Flink connectors, or the Flink Kubernetes operators developed by Google and Lyft) and also submit your own contributions to the ecosystem.

The community has recently launched the “Engine Room”, a dedicated space in Flink’s Wiki for knowledge sharing between contributors. The goal of this initiative is to make ongoing development on Flink internals more transparent across different work streams, and also to help new contributors get on board with best practices. The first blogpost is already up and sheds light on the migration of Flink’s CI infrastructure from Travis to Azure Pipelines.

Upcoming Events

The organizers of Flink Forward had to make the hard decision to cancel this year’s event in San Francisco. But all is not lost! Flink Forward SF will be held online on April 22-24 and you can register (for free) here. Join the community for interactive talks and Q&A sessions with core Flink contributors and companies like Splunk, Lyft, Netflix or Google.

Others

Events across the globe have come to a halt due to the growing concerns around COVID-19, so this time we’ll leave you with some interesting content to read instead. In addition to this written content, you can also recap last year’s sessions from Flink Forward Berlin and Flink Forward China!


If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink @community mailing list to get fine-grained weekly updates, upcoming event announcements and more.