09 Aug 2018 Till Rohrmann (@stsffap)
The Apache Flink community is proud to announce the 1.6.0 release. Over the past 2 months, the Flink community has worked hard to resolve more than 360 issues. Please check the complete changelog for more details.
Flink 1.6.0 is the seventh major release in the 1.x.y series. It is API-compatible with previous 1.x.y releases for APIs annotated with the
You can find the binaries on the updated Downloads page on the Flink project site.
- Flink 1.6 - The next step in stateful stream processing
- New Features and Improvements
- Release Notes
- List of Contributors
Flink 1.6 - The next step in stateful stream processing
In Flink 1.6.0 we continue the groundwork we laid out in earlier versions: Enabling Flink users to seamlessly run fast data processing and build data-driven and data-intensive applications effortlessly.
Flink’s state support is one of the key features which makes Flink so versatile and powerful when it comes to implementing all kinds of use cases. To make it even easier, the community added native support for state TTL (FLINK-9510, FLINK-9938). This feature allows to clean up state after it has expired. With Flink 1.6.0 timer state can now go out of core (FLINK-9485) by storing the relevant state in RocksDB. Last but not least, we also improved the deletion of timers (FLINK-9423) significantly.
With Flink 1.5.0 we reworked Flink’s distributed architecture to add support for resource elasticity and different deployment scenarios, most notably a better container integration. In Flink 1.6.0 we follow up on some of the unfinished aspects of this work: All external communication, including job submissions, is now HTTP/REST based (FLINK-9280) which eases container setups considerably. Flink 1.6.0 also comes with a container entrypoint (FLINK-9488) which allows to easily bootstrap a containerized job cluster.
Streaming SQL is one of the features with the most disruptive potential, because it makes Flink much more accessible. In Apache Flink 1.6.0 the community improved further the SQL CLI (FLINK-8863) making the executions of streaming and batch queries (FLINK-8861) against a multitude of data sources a piece of cake. In addition, the full Avro support (FLINK-9444) makes reading any kind of Avro data seamless. Last but not least, the community hardened Flink’s CEP library (FLINK-9418) that can now handle significantly larger use cases.
What would be a distributed processing engine without its connectors to talk to the outside world? In the latest Flink release we added a new StreamingFileSink (FLINK-9750) that succeeds the
BucketingSinkas the standard file sink. The community also added support for ElasticSearch 6.x (FLINK-7386) and implemented multiple AvroDeserializationSchemas (FLINK-9338) to easily ingest Avro data.
New Features and Improvements
Improving Flink’s State Support
Support for State TTL (FLINK-9510, FLINK-9938): This feature allows to specify a time-to-live (TTL) for Flink state. Once the time-to-live has been exceeded Flink will no longer give access to the respective state values. The expired data is cleaned up on access so that the operator keyed state doesn’t grow infinitely and it won’t be included in subsequent checkpoints. This feature fully complies with new data protection regulations (e.g. GDPR).
Scalable Timers Based on RocksDB (FLINK-9485): Flink’s timer state can now be stored in RocksDB, allowing the technology to support significantly bigger timer state since it can go out of core/spill to disk. Previously, users were limited to the heap memory size. On top of that, snapshots of the timer state are now asynchronous, i.e., they no longer block the processing pipeline during checkpoints and can be incremental.
Faster Timer Deletions (FLINK-9423): Improving Flink’s internal timer data structure such that the deletion complexity is reduced from O(n) to O(log n). This significantly improves Flink jobs using timers. Deleting timers is also exposed through a user-facing API now.
Extending Flink’s Deployment Options
Job Cluster Container Entrypoint (FLINK-9488): Flink 1.6.0 provides an easy-to-use container entrypoint to bootstrap a job cluster. Combining this entrypoint with a user-code jar creates a self-contained image which automatically executes the contained Flink job when deployed. Since the image already contains the Flink job, client communication is no longer necessary. Avoiding additional communication steps with the client reduces the number of moving parts and improves operations in a container environment significantly.
Fully RESTified Job Submission (FLINK-9280): The Flink client now sends all job-relevant content via a single POST call to the server. This allows a much easier integration with cluster management frameworks and container environments, since opening custom ports is no longer necessary.
Enhancing SQL and Table API
User-Defined Function in SQL Client CLI (FLINK-8863): The SQL Client CLI now supports the registration of user-defined functions. This considerably improves the CLI’s expressiveness, because SQL queries can be enriched with more powerful custom table, aggregate, and scalar functions.
Support for Batch Queries in SQL Client CLI (FLINK-8861): The SQL Client CLI now supports the execution of batch queries.
Support for INSERT INTO Statements in SQL Client CLI (FLINK-8858): By supporting SQL’s INSERT INTO statements, the SQL Client CLI can be used to submit long-running SQL queries to Flink that sink their results in external systems. The SQL Client itself can be shut down after submission without stopping the job.
Unified Table Sinks and Formats (FLINK-8866, FLINK-8558): In the past, table sinks had to be configured programmatically and were tied to a specific format and implementation. This release reworked these aspects by decoupling formats from connectors and improving how table sinks are discovered and configured. Table sinks can now be defined in a YAML file using string-based properties without having to write a single line of code.
New Kafka Table Sink (FLINK-9846): The Kafka table sink now uses the new unified APIs and supports both JSON and Avro formats.
Full SQL Avro Support (FLINK-9444): Flink’s Table & SQL API now understands the full spectrum of Avro types including generic/specific records and logical types. The types are automatically mapped from and to Flink-equivalent types allowing to specify end-to-end ETL pipelines in SQL.
Improved Expressiveness of SQL and Table API (FLINK-5878, FLINK-8688, FLINK-6810): Flink’s Table & SQL API supports left, right, and full outer joins that allow for continuous result-updating queries. SQL aggregate functions support the
DISTINCTkeyword. Queries such as
COUNT(DISTINCT column)are supported for windowed and non-windowed aggregations. Both SQL and Table API now include more built-in functions such as
MD5, SHA1, SHA2, LOG, and
New StreamingFileSink (FLINK-9750): The new
StreamingFileSinkis an exactly-once sink for writing to filesystems which capitalizes on the knowledge acquired from the previous
BucketingSink. Exactly-once is supported through integration of the sink with Flink’s checkpointing mechanism. The new sink is built upon Flink’s own
FileSystemabstraction and it supports local file system and HDFS, with plans for S3 support in the near future. It exposes pluggable file rolling and bucketing policies. Apart from row-wise encoding formats, the new
StreamingFileSinkcomes with support for Parquet. Other bulk-encoding formats like ORC can be easily added using the exposed APIs.
ElasticSearch 6.x Connector and Improved Support for Older Versions (FLINK-7386): Flink now comes with a connector for ElasticSearch 6.x, that is built on top of Elasticsearch’s new high level REST client. For older ElasticSearch versions which still use the native Java
TransportClient, Flink’s Elasticsearch connectors now support up to Elasticsearch version 5.6.10. Some APIs in the
RequestIndexer'spublic interface of the ElasticSearch connector have been deprecated. Please refer to the Javadoc / documentation for the new preferred API.
Avro Deserialization Schemas (FLINK-9338): Flink comes now with a
DeserializationSchemawhich allows deserializing Avro encoded messages. It also adds out-of-the-box integration with Confluent’s schema registry.
Jepsen Based Distributed Tests Suite
The Flink community added a Jepsen based test suite (FLINK-9004) which validates the behavior of Flink’s distributed cluster components under real-world faults. It is a first step towards a higher test coverage for Flink’s fault tolerance mechanisms. The community intends to incrementally improve test coverage with it.
Various Other Features and Improvements
Hardened CEP Library (FLINK-9418): The CEP operator’s internal NFA state is now backed by Flink state. That way it can go out of core to support much larger use cases.
More Expressive DataStream Joins (FLINK-8478): Flink 1.6.0 adds support for interval joins in the DataStream API. With this feature it is now possible to join together events from different streams where elements from one stream lie in a specified time interval relative to elements from the other stream. Check out the documentation for more details.
Intra-Cluster Mutual Authentication (FLINK-9312): Flink’s cluster components now enforce mutual authentication with their peers. This allows only Flink components to talk to each other, making it impossible for malicious actors to impersonate Flink components in order to eavesdrop on the cluster communication.
Please review the release notes if you plan to upgrade your Flink setup to Flink 1.6.
List of Contributors
According to git shortlog, the following 112 people contributed to the 1.6.0 release. Thanks to all contributors!
Alejandro Alcalde, Alexander Koltsov, Alexey Tsitkin, Aljoscha Krettek, Andreas Fink, Andrey Zagrebin, Arunan Sugunakumar, Ashwin Sinha, Bill Lee, Bowen Li, Chesnay Schepler, Christophe Jolif, Clément Tamisier, Craig Foster, David Anderson, Dawid Wysakowicz, Deepak Sharnma, Dmitrii_Kniazev, EAlexRojas, Elias Levy, Eron Wright, Ethan Li, Fabian Hueske, Florian Schmidt, Franz Thoma, Gabor Gevay, Georgii Gobozov, Haohui Mai, Jamie Grier, Jeff Zhang, Jelmer Kuperus, Jiayi Liao, Jungtaek Lim, Kailash HD, Ken Geis, Ken Krugler, Lakshmi Gururaja Rao, Leonid Ishimnikov, Matrix42, Michael Gendelman, MichealShin, Moser Thomas W, Nico Duldhardt, Nico Kruber, Oleksandr Nitavskyi, PJ Fanning, Patrick Lucas, Pavel Shvetsov, Philippe Duveau, Piotr Nowojski, Qiu Congxian/klion26, Rinat Sharipov, Rong Rong, Rune Skou Larsen, Sayat Satybaldiyev, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Parente, Thomas Weise, Till Rohrmann, Timo Walther, Tobii42, Tzu-Li (Gordon) Tai, Viktor Vlasov, Wosin, Xingcan Cui, Xpray, Yan Zhou, Yazdan.JS, Yun Tang, Zhijiang, Zsolt Donca, an4828, aria, binlijin, blueszheng, davidxdh, gyao, hequn8128, hzyuqi1, jerryjzhang, jparkie, juhoautio, kai-chi, kkloudas, klion26, lamber-ken, lincoln-lil, linjun, liurenjie1024, lsy, maqingxiang-it, maxbelov, mayyamus, minwenjun, neoremind, sampathBhat, shankarganesh1234, shuai.xus, sihuazhou, snuyanzin, triones.deng, vinoyang, xueyu, yangshimin, yuemeng, zhangminglei, zhouhai02, zjureel, 军长, 陈梓立