Flink Community Update - May'20

May 6, 2020 - Marta Paes (@morsapaes)

Can you smell it? It’s release month! It took a while, but now that we’re all caught up with the past, the Community Update is here to stay. This time around, we’re warming up for Flink 1.11 and looking back at April in the Flink community: the release of Stateful Functions 2.0, a new self-paced Flink training and some efforts to improve the Flink documentation experience. ...

Continue reading »

Applying to Google Season of Docs 2020

May 4, 2020 - Marta Paes (@morsapaes)

The Flink community is thrilled to share that the project is applying again to Google Season of Docs (GSoD) this year! If you’re unfamiliar with the program, GSoD is a great initiative organized by Google Open Source to pair technical writers with mentors to work on documentation for open source projects. The first edition supported over 40 projects, including some other cool Apache Software Foundation (ASF) members like Apache Airflow and Apache Cassandra. ...

Continue reading »

Apache Flink 1.9.3 Released

April 24, 2020 - Dian Fu (@DianFu11)

The Apache Flink community released the third bugfix version of the Apache Flink 1.9 series. This release includes 38 fixes and minor improvements for Flink 1.9.2. The list below details all fixes and improvements. We highly recommend that all users upgrade to Flink 1.9.3.

Updated Maven dependencies:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>1.9.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>1.9.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.11</artifactId>
      <version>1.9.3</version>
    </dependency>

You can find the binaries on the updated Downloads page. ...

Continue reading »

Memory Management Improvements with Apache Flink 1.10

April 21, 2020 - Andrey Zagrebin

Apache Flink 1.10 comes with significant changes to the memory model of the Task Managers and configuration options for your Flink applications. These recently introduced changes make Flink more adaptable to all kinds of deployment environments (e.g. Kubernetes, YARN, Mesos), providing strict control over its memory consumption. In this post, we describe Flink’s memory model as it stands in Flink 1.10, how to set up and manage the memory consumption of your Flink applications, and the recent changes the community implemented in the latest Apache Flink release. ...

Continue reading »

Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can

April 15, 2020 - Nico Kruber

Almost every Flink job has to exchange data between its operators, and since these records may be sent not only to another instance in the same JVM but also to a separate process, they need to be serialized to bytes first. Similarly, Flink’s off-heap state backend is based on a local embedded RocksDB instance, which is implemented in native C++ code and thus also requires data to be transformed into bytes on every state access. ...

Continue reading »

PyFlink: Introducing Python Support for UDFs in Flink's Table API

April 9, 2020 - Jincheng Sun (@sunjincheng121), Markos Sfikas (@MarkSfik)

Flink 1.9 introduced the Python Table API, allowing developers and data engineers to write Python Table API jobs for Table transformations and analysis, such as Python ETL or aggregate jobs. However, Python users faced some limitations when it came to support for Python UDFs in Flink 1.9, preventing them from extending the system’s built-in functionality. In Flink 1.10, the community further extended the support for Python by adding Python UDFs in PyFlink. ...

Continue reading »

Stateful Functions 2.0 - An Event-driven Database on Apache Flink

April 7, 2020 - Stephan Ewen (@stephanewen)

Today, we are announcing the release of Stateful Functions (StateFun) 2.0 — the first release of Stateful Functions as part of the Apache Flink project. This release marks a big milestone: Stateful Functions 2.0 is not only an API update, but the first version of an event-driven database that is built on Apache Flink. Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approach to state and composition with the elasticity, rapid scaling/scale-to-zero and rolling upgrade capabilities of FaaS implementations like AWS Lambda and modern resource orchestration frameworks like Kubernetes. ...

Continue reading »

Flink Community Update - April'20

March 30, 2020 - Marta Paes (@morsapaes)

While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blog post combs through the past few months to give you an update on the state of things in Flink: from core releases to Stateful Functions; from some good old community stats to a new development blog. And since now it’s more important than ever to keep up the spirits, we’d like to invite you to join the Flink Forward Virtual Conference, on April 22-24 (see Upcoming Events). ...

Continue reading »

Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration

March 27, 2020 - Bowen Li (@Bowen__Li)

In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.

Introduction

What are some of the latest requirements for your data warehouse and data infrastructure in 2020? We’ve come up with some for you. Firstly, today’s business is shifting toward real-time operation, and thus demands the ability to process online streaming data with low latency for near-real-time or even real-time analytics. ...

Continue reading »

Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic

March 24, 2020 - Alexander Fedulov (@alex_fedulov)

In the first article of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded KeysExtractor implementation. We intentionally omitted details of how the applied rules are initialized and what possibilities exist for updating them at runtime. In this post, we will address exactly these details. ...

Continue reading »