Announcing the Release of Apache Flink 1.15
May 5, 2022 - Joe Moser (@JoemoeAT) Yun Gao (@YunGao16)Thanks to our well-organized and open community, Apache Flink continues to grow as a technology and remain one of the most active projects in the Apache community. With the release of Flink 1.15, we are proud to announce a number of exciting changes.
One of the main concepts that makes Apache Flink stand out is the unification of batch (aka bounded) and stream (aka unbounded) data processing, which helps reduce the complexity of development. A lot of effort went into this unification in the previous releases, and you can expect more efforts in this direction.
Apache Flink is not only growing when it comes to contributions and users, but also out of the original use cases. We are seeing a trend towards more business/analytics use cases implemented in low-/no-code. Flink SQL is the feature in the Flink ecosystem that enables such uses cases and this is why its popularity continues to grow.
Apache Flink is an essential building block in data pipelines/architectures and is used with many other technologies in order to drive all sorts of use cases. While new ideas/products may appear in this domain, existing technologies continue to establish themselves as standards for solving mission-critical problems. Knowing that we have such a wide reach and play a role in the success of many projects, it is important that the experience of integrating Apache Flink with the cloud infrastructures and existing systems is as seamless and easy as possible.
In the 1.15 release the Apache Flink community made significant progress across all these areas. Still those are not the only things that made it into 1.15. The contributors improved the experience of operating Apache Flink by making it much easier and more transparent to handle checkpoints and savepoints and their ownership, making auto scaling more seamless and complete, by removing side effects of use cases in which different data sources produce varying amounts of data, and - finally - the ability to upgrade SQL jobs without losing the state. By continuing on supporting checkpoints after tasks finished and adding window table valued functions in batch mode, the experience of unified stream and batch processing was once more improved making hybrid use cases way easier. In the SQL space, not only the first step in version upgrades have been added but also JSON functions to make it easier to import and export structured data in SQL. Both will allow users to better rely on Flink SQL for production use cases in the long term. To establish Apache Flink as part of the data processing ecosystem we improved the cloud interoperability and added more sink connectors and formats. And yes we enabled a Scala-free runtime (the hype is real).
Operating Apache Flink with ease #
Even Flink jobs that have been built and tuned by the best engineering teams still need to be operated, usually on a long-term basis. The many deployment patterns, APIs, tunable configs, and use cases covered by Apache Flink mean that operation support is vital and can be burdensome.
In this release, we listened to user feedback and now operating Flink is made much
easier. It is now more transparent in terms of handling checkpoints and savepoints and their ownership,
which makes auto-scaling more seamless and complete (by removing side effects of use cases
where different data sources produce varying amounts of data) and enables the
ability to upgrade SQL jobs without losing the state.
Clarification of checkpoint and savepoint semantics #
An essential cornerstone of Flink’s fault tolerance strategy is based on checkpoints and savepoints (see the comparison). The purpose of savepoints has always been to put transitions, backups, and upgrades of Flink jobs in the control of users. Checkpoints, on the other hand, are intended to be fully controlled by Flink and guarantee fault tolerance through fast recovery, failover, etc. Both concepts are quite similar, and the underlying implementation also shares aspects of the same ideas.
However, both concepts grew apart by following specific feature requests and sometimes neglecting the overarching idea and strategy. Based on user feedback, it became apparent that this should be aligned and harmonized better and, above all, to make more clear!
There have been situations in which users relied on checkpoints to stop/restart jobs when savepoints would have been the right way to go. It was also not clear that savepoints are slower since they don’t include some of the features that make taking checkpoints so fast. In some cases like resuming from a retained checkpoint - where the checkpoint is somehow considered as a savepoint - it is unclear to the user when they can actually clean it up.
With FLIP-193 (Snapshots ownership) the community aims to make ownership the only difference between savepoints and checkpoints. In the 1.15 release the community has fixed some of those shortcomings by supporting native and incremental savepoints. Savepoints always used to use the canonical format which made them slower. Also writing full savepoints for sure takes longer than doing it in an incremental way. With 1.15 if users use the native format to take savepoints as well as the RocksDB state backend, savepoints will be automatically taken in an incremental manner. The documentation has also been clarified to provide a better overview and understanding of the differences between checkpoints and savepoints. The semantics for resuming from savepoint/retained checkpoint have also been clarified introducing the CLAIM and NO_CLAIM mode. With the CLAIM mode Flink takes over ownership of an existing snapshot, with NO_CLAIM it creates its own copy and leaves the existing one up to the user. Please note that NO_CLAIM mode is the new default behavior. The old semantic of resuming from savepoint/retained checkpoint is still accessible but has to be manually selected by choosing LEGACY mode.
Elastic scaling with reactive mode and the adaptive scheduler #
Driven by the increasing number of cloud services built on top of Apache Flink, the project is becoming more and more cloud native which makes elastic scaling even more important.
This release improves metrics for the reactive mode, which is a job-scope mode where the JobManager will try to use all TaskManager resources available. To do this, we made all the metrics in the Job scope work correctly when reactive mode is enabled.
We also added an exception history for the adaptive scheduler, which is a new scheduler that first declares the required resources and waits for them before deciding on the parallelism with which to execute a job.
Furthermore, downscaling is sped up significantly. The TaskManager now has a dedicated shutdown code path, where it actively deregisters itself from the cluster instead of relying on heartbeats, giving the JobManager a clear signal for downscaling.
Adaptive batch scheduler #
In 1.15, we introduced a new scheduler to Apache Flink: the Adaptive Batch Scheduler. The new scheduler can automatically decide parallelisms of job vertices for batch jobs, according to the size of data volume each vertex needs to process.
Major benefits of this scheduler includes:
- Ease-of-use: Batch job users can be relieved from parallelism tuning.
- Adaptive: Automatically tuned parallelisms can better fit consumed datasets which have a varying volume size every day.
- Fine-grained: Parallelism of each job vertex will be tuned individually. This allows vertices of SQL batch jobs to be automatically assigned different proper parallelisms.
Watermark alignment across data sources #
Having data sources that increase watermarks at different paces could lead to problems with downstream operators. For example, some operators might need to buffer excessive amounts of data which could lead to huge operator states. This is why we introduced watermark alignment in this release.
For sources based on the new source interface, watermark alignment can be activated. Users can define alignment groups to pause consuming from sources which are too far ahead from others. The ideal scenario for aligned watermarks is when there are two or more sources that produce watermarks at a different speed and when the source has the same parallelism as splits/shards/partitions.
SQL version upgrades #
The execution plan of SQL queries and its resulting topology is based on optimization rules and a cost model. This means that even minimal changes could introduce a completely different topology. This dynamism makes guaranteeing snapshot compatibility very challenging across different Flink versions. In the efforts of 1.15, the community focused on keeping the same query (via the same topology) up and running even after upgrades.
At the core of SQL upgrades are JSON plans (please note that we only have documentation in our JavaDocs for now and are still working on updating the documentation), which are JSON functions that make it easier to import and export structured data in SQL. This has been introduced for internal use already in previous releases and will now be exposed externally. Both the Table API and SQL will provide a way to compile and execute a plan which guarantees the same topology throughout different versions. This feature will be released as an experimental MVP. Users who want to give it a try already can create a JSON plan that can then be used to restore a Flink job based on the old operator structure. The full feature can be expected in Flink 1.16.
Reliable upgrades makes Flink SQL more dependable for production use cases in the long term.
Changelog state backend #
In Flink 1.15, we introduced the MVP feature of the changelog state backend, which aims at making checkpoint intervals shorter and more predictable with the following advantages:
- Shorter end-to-end latency: end-to-end latency mostly depends on the checkpointing mechanism, especially for transactional sinks. Transactional sinks commit on checkpoints, so faster checkpoints mean more frequent commits.
- More predictable checkpoint intervals: currently checkpointing time largely depends on the size of the artifacts that need to be persisted on the checkpoint storage. By keeping the size consistently small, checkpointing time becomes more predictable.
- Less work on recovery: with more frequently checkpoints are taken, less data need to be re-processed after each recovery.
The changelog state backend helps achieve the above by continuously persisting state changes on non-volatile storage while performing state materialization in the background.
Repeatable cleanup #
In previous releases of Flink, cleaning up job-related artifacts was done only once which might have resulted in abandoned artifacts in case of an error. In this version, Flink will try to run the cleanup again to avoid leaving artifacts behind. This retry mechanism runs until it was successful, by default. Users can change this behavior by configuring the repeatable cleanup options. Disabling the retry strategy will lead to Flink behaving like in previous releases.
There is still work in progress around cleaning up checkpoints, which is covered by FLINK-26606.
OpenAPI #
Flink is now providing an experimental REST API specification following the OpenAPI standard. This allows the REST API to be used with standard tools that are implementing the OpenAPI standard. You can find the specification here.
Improvements to application mode #
When running Flink in application mode, it can now be guaranteed that jobs will take a savepoint after they are completed if they have been configured to do so (see execution.shutdown-on-application-finish).
The recovery and clean up of jobs running in application mode has also been improved. The local state can be persisted in the working directory, which makes recovering from local storage easier.
Unification of stream and batch processing - more progress #
In the latest release, we picked up new efforts and continued some previous ones in the goal of unifying stream and batch processing.
Final checkpoints #
In Flink 1.14, final checkpoints were added as a feature that had to be enabled manually. Since the last release, we listened to user feedback and decided to enable it by default. For more information and how to disable this feature, please refer to the documentation. This change in configuration can prolong the shutting down sequence of bounded streaming jobs because jobs have to wait for a final checkpoint before being allowed to finish.
Window table-valued functions #
Window table-valued functions have only been available for unbounded data streams. With this release they will also be usable in BATCH mode. While working on this, change window table-valued functions have also been improved in general by implementing a dedicated operator which no longer requires those window functions to be used with aggregators.
Flink SQL #
Community metrics indicate that Flink SQL is widely used and becomes more popular every day. The community made several improvements but we’d like to go into two in more detail.
CAST/Type system enhancements #
Data appears in all sorts and shapes but is often not in the type that you need it to be, which is why casting is one of the most common operations in SQL. In Flink 1.15, the default behavior of a failing CAST has changed from returning a null to returning an error, which makes it more compliant with the SQL standard. The old casting behavior can still be used by calling the newly introduced TRY_CAST function or restored via a configuration flag.
In addition, many bugs have been fixed and improvements made to the casting functionality, to ensure correct results.
JSON functions #
JSON is one of the most popular data formats and SQL users increasingly need to build and read these data structures. Multiple JSON functions have been added to Flink SQL according to the SQL 2016 standard. It allows users to inspect, create, and modify JSON strings using the Flink SQL dialect.
Community enablement #
Enabling people to build streaming data pipelines to solve their use cases is our goal. The community is well aware that a technology like Apache Flink is never used on its own and will always be part of a bigger architecture. Thus, it is important that Flink operates well in the cloud, connects seamlessly to other systems, and continues to support programming languages like Java and Python.
Cloud interoperability #
There are users operating Flink deployments in cloud infrastructures from various cloud providers. There are also services that offer to manage Flink deployments for users on their platform.
In Flink 1.15, a recoverable writer for Google Cloud Storage has been added. We also organized the connectors in the Flink ecosystem and put some focus on connectors for the AWS ecosystem (i.e. KDS, Firehose).
The Elasticsearch sink #
There was significant work on Flink’s overall connector ecosystem, but we want to highlight the Elasticsearch sink because it was implemented with the new connector interfaces, which offers asynchronous functionality coupled with end-to-end semantics. This sink will act as a template in the future.
A Scala-free Flink #
A detailed
blog post
already explains the ins and outs of why Scala users can now use the Flink
Java API with any Scala version (including Scala 3).
In the end, removing Scala is just part of a larger effort of cleaning up and updating various technologies from the Flink ecosystem.
Starting in Flink 1.14, we removed the Mesos integration, isolated Akka, deprecated the DataSet Java API, and hid the Table API behind an abstraction. There’s already a lot of traction in the community towards these endeavors.
PyFlink #
Before Flink 1.15, Python user-defined functions were executed in separate Python processes which caused additional serialization/deserialization and communication overhead. In scenarios in with large amounts of data, e.g. image processing, etc, this overhead becomes non-negligible. Besides, since it involves inter-process communication, the processing latency is also non-negligible, which is unacceptable in scenarios for which latency is critical, e.g. quantitative transaction, etc. In Flink 1.15, we have introduced a new execution mode named ’thread’ mode, for which Python user-defined functions will be executed in the JVM as a thread instead of a separate Python process. Benchmarks have shown that throughput could be increased by 2x in common scenarios such as JSON processing. Processing latency is also decreased from several seconds to micro-seconds. It should be noted that since this is still the first release of ’thread’ mode, it currently only supports Python ScalarFunction which is used in Python Table API & SQL. We’re planning to extend it to other areas in which Python user-defined functions could be used in the next releases.
Other #
Further work has been done on the connector testing framework. If you want to contribute a connector or improve on one, you should definitely have a look.
Some long-awaited features have been added, including the CSV format and the small file compaction in the unified sink interface.
The sink API has been upgraded to version 2 and we encourage every connector maintainer to upgrade to this version.
Summary #
Apache Flink is now easier to operate, made even more progress towards aligning stream and batch processing, became more accessible through improvements in the SQL components, and now integrates better with other technologies.
It is also worth mentioning that the community has set up a new home for the CDC connectors, the connector repository will be externalized (with the Elasticsearch sink as a first example), and there is now a Kubernetes operator (announcement blogpost maintained by the community.
Moving forward, the community will continue to focus on making Apache Flink a true unified stream and batch processor and work on better integrating Flink into the cloud-native ecosystem.
Upgrade Notes #
While we aim to make upgrades as smooth as possible, some of the changes require users to adjust some parts of the program when upgrading to Apache Flink 1.15. Please take a look at the release notes for a list of applicable adjustments and issues during upgrades. The one big thing worth mentioning when upgrading is the updated dependencies without the Scala version. Get the details here.
List of Contributors #
The Apache Flink community would like to thank each and every one of the contributors that have made this release possible:
Ada Wong, Ahmed Hamdy, Aitozi, Alexander Fedulov, Alexander Preuß, Alexander Trushev, Ali Bahadir Zeybek, Anton Kalashnikov, Arvid Heise, Bernard Joseph Jean Bruno, Bo Cui, Brian Zhou, Camile, ChangLi, Chengkai Yang, Chesnay Schepler, Daisy T, Danny Cranmer, David Anderson, David Moravek, David N Perkins, Dawid Wysakowicz, Denis-Cosmin Nutiu, Dian Fu, Dong Lin, Eelis Kostiainen, Etienne Chauchot, Fabian Paul, Francesco Guardiani, Gabor Somogyi, Galen Warren, Gao Yun, Gen Luo, GitHub, Gyula Fora, Hang Ruan, Hangxiang Yu, Honnix, Horace Lee, Ingo Bürk, JIN FENG, Jack, Jane Chan, Jark Wu, JianZhangYang, Jiangjie (Becket) Qin, JianzhangYang, Jiayi Liao, Jing, Jing Ge, Jing Zhang, Jingsong Lee, JingsongLi, Jinzhong Li, Joao Boto, Joey Lee, John Karp, Jon Gillham, Jun Qin, Junfan Zhang, Juntao Hu, Kexin, Kexin Hui, Kirill Listopad, Konstantin Knauf, LB-Yu, Leonard Xu, Lijie Wang, Liu Jiangang, Maciej Bryński, Marios Trivyzas, MartijnVisser, Mason Chen, Matthias Pohl, Michal Ciesielczyk, Mika, Mika Naylor, Mrart, Mulavar, Nick Burkard, Nico Kruber, Nicolas Raga, Nicolaus Weidner, Niklas Semmler, Nikolay, Nuno Afonso, Oleg Smirnov, Paul Lin, Paul Zhang, PengFei Li, Piotr Nowojski, Px, Qingsheng Ren, Robert Metzger, Roc Marshal, Roman, Roman Khachatryan, Ruanshubin, Rudi Kershaw, Rui Li, Ryan Scudellari, Ryan Skraba, Sebastian Mattheis, Sergey, Sergey Nuyanzin, Shen Zhu, Shengkai, Shuo Cheng, Sike Bai, SteNicholas, Steffen Hausmann, Stephan Ewen, Tartarus0zm, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, Victor Xu, Wenhao Ji, X-czh, Xianxun Ye, Xin Yu, Xinbin Huang, Xintong Song, Xuannan, Yang Wang, Yangze Guo, Yao Zhang, Yi Tang, Yibo Wen, Yuan Mei, Yuanhao Tian, Yubin Li, Yuepeng Pan, Yufan Sheng, Yufei Zhang, Yuhao Bi, Yun Gao, Yun Tang, Yuval Itzchakov, Yuxin Tan, Zakelly, Zhu Zhu, Zichen Liu, Zongwen Li, atptour2017, baisike, bgeng777, camilesing, chenxyz707, chenzihao, chuixue, dengziming, dijkwxyz, fanrui, fengli, fenyi, fornaix, gaurav726, godfrey he, godfreyhe, gongzhongqiang, haochenhao, hapihu, hehuiyuan, hongshuboy, huangxingbo, huweihua, iyupeng, jiaoqingbo, jinfeng, jxjgsylsg, kevin.cyj, kylewang, lbb, liliwei, liming.1018, lincoln lee, liufangqi, liujiangang, liushouwei, liuyongvs, lixiaobao14, lmagic233, lovewin99, lujiefsi, luoyuxia, lz, mans2singh, martijnvisser, mayue.fight, nanmu42, oogetyboogety, paul8263, pusheng.li01, qianchutao, realdengziqi, ruanhang1993, sammieliu, shammon, shihong90, shitou, shouweikun, shouzuo1, shuo.cs, siavash119, simenliuxing, sjwiesman, slankka, slinkydeveloper, snailHumming, snuyanzin, sujun, sujun1, syhily, tsreaper, txdong-sz, unknown, vahmed-hamdy, wangfeifan, wangpengcheng, wangyang0918, wangzhiwu, wangzhuo, wgzhao, wsz94, xiangqiao123, xmarker, xuyang, xuyu, xuzifu666, yangjunhan, yangze.gyz, ysymi, yuxia Luo, zhang chaoming, zhangchaoming, zhangjiaogg, zhangjingcun, zhangjun02, zhangmang, zlzhang0122, zoucao, zp, zzccctv, 周平, 子扬, 李锐, 蒋龙, 龙三, 庄天翼