Preamble: This roadmap means to provide user and contributors with a high-level summary of ongoing efforts, grouped by the major threads to which the efforts belong. With so much that is happening in Flink, we hope that this helps with understanding the direction of the project. The roadmap contains both efforts in early stages as well as nearly completed efforts, so that users may get a better impression of the overall status and direction of those developments.
More details and various smaller changes can be found in the FLIPs
The roadmap is continuously updated. New features and efforts should be added to the roadmap once there is consensus that they will happen and what they will roughly look like for the user.
Last Update: 2021-09-16
The feature radar is meant to give users guidance regarding feature maturity, as well as which features are approaching end-of-life. For questions, please contact the developer mailing list: firstname.lastname@example.org
Flink is a streaming data system in its core, that executes “batch as a special case of streaming”. Efficient execution of batch jobs is powerful in its own right; but even more so, batch processing capabilities (efficient processing of bounded streams) open the way for a seamless unification of batch and streaming applications.
Unified streaming/batch up-levels the streaming data paradigm: It gives users consistent semantics across their real-time and lag-time applications. Furthermore, streaming applications often need to be complemented by batch (bounded stream) processing, for example when reprocessing data after bugs or data quality issues, or when bootstrapping new applications. A unified API and system make this much easier.
The community has been building Flink to a powerful basis for a unified (batch and streaming) SQL analytics platform, and is continuing to do so.
SQL has very strong cross-batch-streaming semantics, allowing users to use the same queries for ad-hoc analytics and as continuous queries. Flink already contains an efficient unified query engine, and a wide set of integrations. With user feedback, those are continuously improved.
More Connector and Change Data Capture Support
Support for Common Languages, Formats, Catalogs
Flink has a broad SQL coverage for batch (full TPC-DS support) and a state-of-the-art set of supported operations in streaming. There is continuous effort to add more functions and cover more SQL operations.
The DataStream API is Flink’s physical API, for use cases where users need very explicit control over data types, streams, state, and time. This API is evolving to support efficient batch execution on bounded data.
DataStream API executes the same dataflow shape in batch as in streaming, keeping the same operators. That way users keep the same level of control over the dataflow, and our goal is to mix and switch between batch/streaming execution in the future to make it a seamless experience.
Unified Sources and Sinks
In this effort, we are creating sources that work across batch and streaming execution. The aim is to give users a consistent experience across both modes, and to allow them to easily switch between streaming and batch execution for their unbounded and bounded streaming applications. The interface for this New Source API is done and available, and we are working on migrating more source connectors to this new model, see FLIP-27.
We have introduced a new API for sinks that consistently handles result writing and committing (Transactions) across batch and streaming. The first iteration of the API exists, and we are porting sinks and refining the API in the process. See FLIP-143.
DataStream Batch Execution
Flink is adding a batch execution mode for bounded DataStream programs. This gives users faster and simpler execution and recovery of their bounded streaming applications; users do not need to worry about watermarks and state sizes in this execution mode: FLIP-140
The core batch execution mode is implemented with great results; there are ongoing improvements around aspects like broadcast state and processing-time-timers. This mode requires the new unified sources and sinks that are mentioned above, so it is limited to the connectors that have been ported to those new APIs.
Mixing bounded/unbounded streams, and batch/streaming execution
Support checkpointing when some tasks finished & Bounded stream programs shut down with a final checkpoint: FLIP-147
There are initial discussions and designs about jobs with mixed batch/streaming execution, so stay tuned for more news in that area.
We want to eventually drop the legacy Batch-only DataSet API, have batch-and stream processing unified throughout the entire system.
Overall Discussion: FLIP-131
The DataStream API supports batch-execution to efficiently execute streaming programs on historic data (see above). Takes over that set of use cases.
The Table API should become the default API for batch-only applications.
Improve the interplay between the Table API and the DataStream API to allow switching from Table API to DataStream API when more control over the data types and operations is necessary.
The goal of these efforts is to make it feel natural to deploy (long running streaming) Flink applications. Instead of starting a cluster and submitting a job to that cluster, these efforts support deploying a streaming job as a self contained application.
For example as a simple Kubernetes deployment; deployed and scaled like a regular application without extra workflows.
Deploy Flink jobs as self-contained Applications works for all deployment targets since Flink 1.11.0 (FLIP-85).
Reactive Scaling lets Flink applications change their parallelism in response to growing and shrinking worker pools, and makes Flink compatibel with standard auto-scalers: FLIP-159
Kubernetes-based HA-services let Flink applications run on Kubernetes without requiring a ZooKeeper dependency: FLIP-144
Continuous work to keep improving performance and recovery speed.
The community is continuously working on improving checkpointing and recovery speed. Checkpoints and recovery are stable and have been a reliable workhorse for years. We are still trying to make it faster, more predictable, and to remove some confusions and inflexibility in some areas.
The community is working on making large scale batch execution (parallelism in the order of 10,000s) simpler (less configuration tuning required) and more performant.
Introduce a more scalable batch shuffle. First parts of this have been merged, and ongoing efforts are to make the memory footprint (JVM direct memory) more predictable, see FLIP-148
Make scheduler faster for higher parallelism: FLINK-21110
Most functionalities in the Java Table APIs and DataStream APIs are already supported by the Python APIs. The community is continuously working on improvements such as improving the checkpoint strategy for Python UDF execution (FLINK-18235), introducing more connectors support in both the Python DataStream API and Python Table API so that the Python API can be used in for production implementations.
Stateful transformation functions for the Python DataStream API: FLIP-153
There are various dedicated efforts to simplify the maintenance and structure (more intuitive navigation/reading) of the documentation.
The Stateful Functions subproject has its own roadmap published under statefun.io.