Flink Community Update - August'20

September 4, 2020 - Marta Paes (@morsapaes)

Ah, so much for a quiet August month. This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming Flink Stateful Functions 2.2 release and a look into how far Flink has come in comparison to 2019.

The Past Month in Flink #

The details of the next release of Stateful Functions are under discussion in this @dev mailing list thread, and the feature freeze is set for September 10th — so, you can expect Stateful Functions 2.2 to be released soon after! Some of the most relevant features in the upcoming release are:

  • DataStream API interoperability, allowing users to embed Stateful Functions pipelines in regular DataStream API programs with DataStream ingress/egress.

  • Fine-grained control over state for remote functions, including the ability to configure different state expiration modes for each individual function.

As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for faster iteration. If you’d like to get involved, we’re always looking for new contributors!

The community has announced the second patch version to cover some outstanding issues in Flink 1.10. You can find a detailed list of all the improvements and bug fixes that went into Flink 1.10.2 in the announcement blog post.

The number of FLIPs being created and discussed in the @dev mailing list is growing week over week, as the Flink 1.12 release takes form and some longer-term efforts are kicked off. Below are some of the new FLIPs to keep an eye out for!

  • Consolidate User-Facing APIs and Deprecate the DataSet API

    The community proposes to deprecate the DataSet API in favor of the Table API/SQL and the DataStream API, in the long run. For this to be feasible, both APIs first need to be adapted and expanded to support the additional use cases currently covered by the DataSet API.

    The first discussion to branch out of this "umbrella" FLIP is around support for a batch execution mode in the DataStream API (FLIP-134).

  • Approximate Task-Local Recovery

    To better accommodate recovery scenarios where a certain amount of data loss is tolerable, but a full pipeline restart is not desirable, the community plans to introduce a new failover strategy that restarts only the failed task(s). Approximate task-local recovery will allow users to trade consistency for fast failure recovery, which is handy for use cases like online training.

  • Improve the interoperability between DataStream and Table API

    The Table API has seen a great deal of refactoring and new features in recent releases, but the interfaces to and from the DataStream API haven't been updated accordingly. The work in this FLIP will cover multiple known gaps to improve interoperability and also expose important functionality (e.g. changelog handling) to the DataStream API.

  • Support Stateful Python UDFs

    Python UDFs have been supported in PyFlink since 1.10, but have so far been limited to stateless functions. The community is now looking to introduce stateful aggregate functions (UDAFs) in the Python Table API.

    Note: Pandas UDAFs are covered in a separate proposal (FLIP-137).
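To make the difference between stateless functions and stateful aggregate functions a bit more concrete, here is a minimal sketch in plain Python. It does not use PyFlink, and the Python API that comes out of the FLIP may look different. The class name WeightedAvg and the method names are assumptions for illustration only, loosely modelled on the accumulator pattern (create an accumulator, accumulate rows into it, read a value out) that Flink's aggregate functions follow in the Java/Scala Table API.

```python
def add_one(x):
    # Stateless scalar function: the result depends only on the current input.
    return x + 1


class WeightedAvg:
    """Hypothetical aggregate: the accumulator is the state kept between rows."""

    def create_accumulator(self):
        # State layout (an assumption): [weighted sum, total weight]
        return [0, 0]

    def accumulate(self, acc, value, weight):
        # Fold one more row into the state.
        acc[0] += value * weight
        acc[1] += weight

    def retract(self, acc, value, weight):
        # Undo a previously accumulated row (needed when inputs can be updated).
        acc[0] -= value * weight
        acc[1] -= weight

    def get_value(self, acc):
        # Derive the current result from the state; None if nothing was seen.
        return acc[0] / acc[1] if acc[1] else None


agg = WeightedAvg()
acc = agg.create_accumulator()
for value, weight in [(10, 1), (20, 3)]:
    agg.accumulate(acc, value, weight)

result = agg.get_value(acc)       # (10*1 + 20*3) / (1 + 3)
agg.retract(acc, 10, 1)
after_retract = agg.get_value(acc)  # only the (20, 3) row remains
```

In a real job, the runtime (not the user code) would own the accumulator, so it can be checkpointed and restored. That is the part that makes aggregate functions "stateful" from the engine's point of view.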

For a complete overview of the development threads coming up in the project, check the Flink 1.12 Release Wiki and follow the feature discussions in the @dev mailing list.

New Committers and PMC Members #

The Apache Flink community has welcomed 1 new PMC Member and 1 new Committer since the last update. Congratulations!

New PMC Members #

New Committers #

The Bigger Picture #

Roughly a year ago, we did a roundup of community stats to understand how far Flink (and the Flink community) had come in 2019. Where does Flink stand now? What changed?

Perhaps the most impressive result this time around is the surge in activity in the @user-zh mailing list. What started as an effort to better support Chinese-speaking users early in 2019 now exceeds even the level of activity of the (already very active) main @user mailing list. Also, @dev¹ registered its highest-ever peaks in activity in the months leading up to the release of Flink 1.11!

For what it’s worth, the Flink GitHub repository is now headed to 15k stars, after reaching the 10k milestone last year. If you consider some other numbers we gathered previously on repository activity and releases over time, 2020 is looking like one for the books in the Flink community.

1. Excluding messages from “jira@apache.org”.

To put these numbers into perspective, the report for the financial year of 2020 from the Apache Software Foundation (ASF) features Flink as one of the most active open source projects, with mentions for:

  • Most Active Sources: Visits (#2)

  • Top Repositories by Number of Commits (#2)

  • Top Most Active Apache Mailing Lists (@user (#1) and @dev (#2))

For more details on where Flink and other open source projects stand in the bigger ASF picture, check out the full report.

Google Season of Docs 2020 Results #

In a previous update, we announced that Flink had been selected for Google Season of Docs (GSoD) 2020, an initiative to pair technical writers with mentors to work on documentation for open source projects. Today, we’d like to welcome the two technical writers who will be working with the Flink community to improve the Table API/SQL documentation: Kartik Khare and Muhammad Haseeb Asif!

  • Kartik is a software engineer at Walmart Labs and a regular contributor to multiple Apache projects. He is also a prolific writer on Medium and has previously published on the Flink blog. Last year, he contributed to Apache Airflow as part of GSoD and he’s currently revamping the Apache Pinot documentation.

  • Muhammad is a dual-degree master's student at KTH and TU Berlin, with a focus on distributed systems and data-intensive processing (in particular, performance optimization of state backends). He writes frequently about Flink on Medium and you can catch him at Flink Forward later this year!

We’re looking forward to the next 3 months of collaboration, and would like to thank, once again, all the applicants who invested time in their applications for GSoD with Flink.

Upcoming Events (and More!) #

With conference season in full swing, we’re glad to see some great Flink content coming up in September! Here, we highlight some of the Flink talks happening soon in virtual events.

As usual, we also leave you with some resources to read and explore.

Flink Packages

If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink @community mailing list to get fine-grained weekly updates, upcoming event announcements and more.