Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic

March 24, 2020 - Alexander Fedulov (@alex_fedulov)

In the first article of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded KeysExtractor implementation. We intentionally omitted details of how the applied rules are initialized and what possibilities exist for updating them at runtime. In this post, we will address exactly these details. ...
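To make the idea of rule-driven partitioning concrete, here is a minimal plain-Java sketch, not the article's actual implementation. It assumes a rule simply lists the event fields to group by, and that events are flat string maps. The names `DynamicKeySketch`, `Rule`, `groupingKeyNames`, and `keyFor` are hypothetical, and no Flink API is used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: these names are illustrative and not taken from Flink or the article.
final class DynamicKeySketch {

    /** A rule that names the event fields to group by (assumed shape). */
    static final class Rule {
        final int ruleId;
        final List<String> groupingKeyNames;

        Rule(int ruleId, List<String> groupingKeyNames) {
            this.ruleId = ruleId;
            this.groupingKeyNames = groupingKeyNames;
        }
    }

    /** Builds a partitioning key from the fields the rule names, in rule order. */
    static String keyFor(Rule rule, Map<String, String> event) {
        return rule.groupingKeyNames.stream()
                .map(name -> name + "=" + event.get(name))
                .collect(Collectors.joining(";", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("payerId", "25");
        event.put("beneficiaryId", "12");
        event.put("amount", "100");

        // Two rules partition the same event differently; no key logic is hardcoded.
        Rule byPair = new Rule(1, List.of("payerId", "beneficiaryId"));
        Rule byPayer = new Rule(2, List.of("payerId"));

        System.out.println(keyFor(byPair, event));  // {payerId=25;beneficiaryId=12}
        System.out.println(keyFor(byPayer, event)); // {payerId=25}
    }
}
```

Because the key is derived from the rule rather than from a fixed extractor, swapping in a different rule changes how events are grouped without changing the code that computes the key.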

Apache Beam: How Beam Runs on Top of Flink

February 22, 2020 - Maximilian Michels (@stadtlegende) Markos Sfikas (@MarkSfik)

Note: This blog post is based on the talk “Beam on Flink: How Does It Actually Work?”. Apache Flink and Apache Beam are open-source frameworks for parallel, distributed data processing at scale. Unlike Flink, Beam does not come with a full-blown execution engine of its own but plugs into other execution engines, such as Apache Flink, Apache Spark, or Google Cloud Dataflow. In this blog post we discuss the reasons to use Flink together with Beam for your batch and stream processing needs. ...

No Java Required: Configuring Sources and Sinks in SQL

February 20, 2020 - Seth Wiesman (@sjwiesman)

Introduction

The recent Apache Flink 1.10 release includes many exciting features. In particular, it marks the end of the community’s year-long effort to merge in the Blink SQL contribution from Alibaba. The reason the community chose to spend so much time on the contribution is that SQL works. It allows Flink to offer a truly unified interface over batch and streaming and makes stream processing accessible to a broad audience of developers and analysts. ...

Apache Flink 1.10.0 Release Announcement

February 11, 2020 - Marta Paes (@morsapaes)

The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink). Flink 1.10 also marks the completion of the Blink integration, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage. ...

A Guide for Unit Testing in Apache Flink

February 3, 2020 - Kartik Khare (@khare_khote)

Writing unit tests is one of the essential tasks of designing a production-grade application. Without tests, a single change in code can result in cascades of failure in production. Thus unit tests should be written for all types of applications, be it a simple job cleaning data and training a model or a complex multi-tenant, real-time data processing system. In the following sections, we provide a guide for unit testing of Apache Flink applications. ...

Apache Flink 1.9.2 Released

January 30, 2020 - Hequn Cheng (@HequnC)

The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series. This release includes 117 fixes and minor improvements for Flink 1.9.1. The list below details all fixes and improvements. We highly recommend that all users upgrade to Flink 1.9.2.

Updated Maven dependencies:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>1.9.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>1.9.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.11</artifactId>
      <version>1.9.2</version>
    </dependency>

You can find the binaries on the updated Downloads page. ...

State Unlocked: Interacting with State in Apache Flink

January 29, 2020 - Seth Wiesman (@sjwiesman)

Introduction

With stateful stream-processing becoming the norm for complex event-driven applications and real-time analytics, Apache Flink is often the backbone for running business logic and managing an organization’s most valuable asset — its data — as application state in Flink. In order to provide a state-of-the-art experience to Flink developers, the Apache Flink community makes significant efforts to provide the safety and future-proof guarantees organizations need while managing state in Flink. ...

Advanced Flink Application Patterns Vol.1: Case Study of a Fraud Detection System

January 15, 2020 - Alexander Fedulov (@alex_fedulov)

In this series of blog posts you will learn about three powerful Flink patterns for building streaming applications:

- Dynamic updates of application logic
- Dynamic data partitioning (shuffle), controlled at runtime
- Low latency alerting based on custom windowing logic (without using the window API)

These patterns expand the possibilities of what is achievable with statically defined data flows and provide the building blocks to fulfill complex business requirements. Dynamic updates of application logic allow Flink jobs to change at runtime, without downtime from stopping and resubmitting the code. ...

Apache Flink 1.8.3 Released

December 11, 2019 - Hequn Cheng

The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series. This release includes 45 fixes and minor improvements for Flink 1.8.2. The list below details all fixes and improvements. We highly recommend that all users upgrade to Flink 1.8.3.

Updated Maven dependencies:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>1.8.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>1.8.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.11</artifactId>
      <version>1.8.3</version>
    </dependency>

You can find the binaries on the updated Downloads page. ...

How to query Pulsar Streams using Apache Flink

November 25, 2019 - Sijie Guo (@sijieg) Markos Sfikas (@MarkSfik)

In a previous story on the Flink blog, we explained the different ways that Apache Flink and Apache Pulsar can integrate to provide elastic data processing at large scale. This blog post discusses the new developments and integrations between the two frameworks and showcases how you can leverage Pulsar’s built-in schema to query Pulsar streams in real time using Apache Flink.

A short intro to Apache Pulsar

Apache Pulsar is a flexible pub/sub messaging system, backed by durable log storage. ...
