How to migrate a real-life batch pipeline from the DataSet API to the DataStream API

May 9, 2023 - Etienne Chauchot (@echauchot)

The Flink community has been deprecating the DataSet API since version 1.12 as part of the work on FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API). This blog article illustrates the migration of a real-life batch DataSet pipeline to a batch DataStream pipeline. All the code presented in this article is available in the tpcds-benchmark-flink repo. The use case shown here is extracted from a broader effort that compares Flink performance across different APIs by implementing TPC-DS queries with each of them. ...

How to create a batch source with the new Source framework

May 3, 2023 - Etienne Chauchot (@echauchot)

Recently, the Flink community designed a new Source framework based on FLIP-27, and some connectors have already migrated to it. This article is a how-to for creating a batch source using this new framework. It was written while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! The article then covers implementing the source components, starting from diagrams of the source architecture. ...

Apache Flink ML 2.2.0 Release Announcement

April 19, 2023 - Dong Lin

The Apache Flink community is excited to announce the release of Flink ML 2.2.0! This release focuses on enriching Flink ML's feature engineering algorithms. The library now includes 33 feature engineering algorithms, making it a more comprehensive library for feature engineering tasks. With the addition of these algorithms, we believe the Flink ML library is ready for use in production jobs that require feature engineering capabilities, whose output can then be consumed by both offline and online machine learning tasks. ...

Announcing the Release of Apache Flink 1.17

March 23, 2023 - Leonard Xu (@Leonardxbj)

The Apache Flink PMC is pleased to announce Apache Flink release 1.17.0. Apache Flink is the leading stream processing standard, and the concept of unified stream and batch data processing is being successfully adopted by more and more companies. Thanks to our excellent community and contributors, Apache Flink continues to grow as a technology and remains one of the most active projects in the Apache Software Foundation. Flink 1.17 had 172 contributors enthusiastically participating and saw the completion of 7 FLIPs and 600+ issues, bringing many exciting new features and improvements to the community. ...

Apache Flink 1.15.4 Release Announcement

March 15, 2023 - Danny Cranmer

The Apache Flink Community is pleased to announce the fourth bug fix release of the Flink 1.15 series. This release includes 53 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. The full announcement lists all bug fixes and improvements (excluding improvements to the build infrastructure and build stability); for a complete list of all changes, see JIRA. We highly recommend all users upgrade to Flink 1.15.4. ...

Apache Flink Kubernetes Operator 1.4.0 Release Announcement

February 27, 2023 - Gyula Fora (@GyulaFora) Maximilian Michels (@stadtlegende) Matyas Orhidi (@matyasorhidi)

We are proud to announce the latest stable release of the operator. In addition to the expected stability improvements and fixes, the 1.4.0 release introduces the first version of the long-awaited autoscaler module, the Flink Streaming Job Autoscaler. A highly requested feature for Flink applications is the ability to scale the pipeline based on incoming data load and the utilization of the dataflow. While Flink has already provided some of the required building blocks, this feature has not yet been realized in the open source ecosystem. ...

Apache Flink 1.16.1 Release Announcement

January 30, 2023 - Martijn Visser (@martijnvisser82)

The Apache Flink Community is pleased to announce the first bug fix release of the Flink 1.16 series. This release includes 84 bug fixes, vulnerability fixes, and minor improvements for Flink 1.16. The full announcement lists all bug fixes and improvements (excluding improvements to the build infrastructure and build stability); for a complete list of all changes, see JIRA. We highly recommend all users upgrade to Flink 1.16.1. ...

Delegation Token Framework: Obtain, Distribute and Use Temporary Credentials Automatically

January 20, 2023 - Gabor Somogyi Marton Balassi (@MartonBalassi)

The Apache Flink Community is pleased to announce that the upcoming minor version of Flink (1.17) includes the Delegation Token Framework proposed in FLIP-272. This enables Flink to authenticate to external services at a central location (the JobManager) and distribute authentication tokens to the TaskManagers. Authentication in distributed systems is not an easy task. Previously, all worker nodes (TaskManagers) reading from or writing to an external system needed to authenticate on their own. ...

Apache Flink Table Store 0.3.0 Release Announcement

January 13, 2023 - Jingsong Lee

The Apache Flink community is pleased to announce the release of the Apache Flink Table Store (0.3.0). We highly recommend all users upgrade to Flink Table Store 0.3.0. Version 0.3.0 resolves more than 150 issues, contributed by nearly 30 contributors. Please check out the full documentation for detailed information and user guides. Flink Table Store 0.3 delivers many exciting features, strengthens its capabilities as data lake storage, and greatly improves the availability of its streaming pipeline. ...

Apache Flink Kubernetes Operator 1.3.1 Release Announcement

January 10, 2023 - Gyula Fora (@GyulaFora)

The Apache Flink Community is pleased to announce the first bug fix release of the Flink Kubernetes Operator 1.3 series. The release contains fixes for several critical issues and some major stability improvements for the application upgrade mechanism. We highly recommend all users upgrade to Flink Kubernetes Operator 1.3.1.

Release Notes

Bug
[FLINK-30329] - flink-kubernetes-operator helm chart does not work with dynamic config because of use of volumeMount subPath
[FLINK-30361] - Cluster deleted and created back while updating replicas
[FLINK-30406] - Jobmanager Deployment error without HA metadata should not lead to unrecoverable error
[FLINK-30437] - State incompatibility issue might cause state loss
[FLINK-30527] - Last-state suspend followed by flinkVersion change may lead to state loss
[FLINK-30528] - Job may be stuck in upgrade loop when last-state fallback is disabled and deployment is missing

Improvement
[FLINK-28875] - Add FlinkSessionJobControllerTest
[FLINK-30408] - Add unit test for HA metadata check logic

Release Resources
The source artifacts and helm chart are available on the Downloads page of the Flink website. ...
