Re: latency related to the checkpointing mode EXACTLY ONCE

Posted by Arvid Heise-4 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/latency-related-to-the-checkpointing-mode-EXACTLY-ONCE-tp41555p41672.html

When Flink fails and restarts, it goes back in time and reprocesses the data since the latest checkpoint. That is why it also deletes all uncommitted data on restart; otherwise you would receive duplicates in your output.
Hence, to get exactly once, you cannot read uncommitted data. That is true for all streaming systems and sinks that depend on transactions.
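
For illustration, here is a minimal sketch of the consumer settings a downstream Kafka reader would need so that it only sees committed data. It uses plain java.util.Properties; the config keys are standard Kafka consumer settings, while the class name, broker address and group id are placeholders I made up for the example.

```java
import java.util.Properties;

// Hypothetical helper, not part of Flink or Kafka: collects consumer settings
// for a downstream application that must not see uncommitted records.
final class ReadCommittedConsumerConfig {

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Only return records from committed transactions.
        // Kafka's default for this setting is read_uncommitted.
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and group id.
        Properties props = build("localhost:9092", "downstream-app");
        System.out.println(props.getProperty("isolation.level"));
    }
}
```

These properties would then be handed to whatever Kafka consumer the downstream application creates.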

In general, low latency and exactly once are somewhat at odds. In Flink, you can only get both in a meaningful way if your checkpointing interval is very short, which is currently only possible if your state is very small (no big join windows, for example). We are working on lifting that limitation, though.
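
To make the dependency on the checkpointing interval concrete, here is a rough back-of-the-envelope sketch. It is my own simplified model and not something Flink computes: it assumes checkpoints are triggered at a fixed interval, that a transactional sink commits only once the next checkpoint has completed, and it ignores network and commit overhead. All names are hypothetical.

```java
import java.time.Duration;

// Simplified model (an assumption, not Flink code): how long a record written
// to a transactional sink stays invisible to read_committed consumers.
final class CommitLatencyEstimate {

    // sinceLastTrigger: how long after the previous checkpoint trigger the record arrived.
    static Duration visibilityDelay(Duration checkpointInterval,
                                    Duration checkpointDuration,
                                    Duration sinceLastTrigger) {
        if (sinceLastTrigger.isNegative() || sinceLastTrigger.compareTo(checkpointInterval) >= 0) {
            throw new IllegalArgumentException("sinceLastTrigger must be in [0, interval)");
        }
        // Wait for the next checkpoint to be triggered, then for it to complete.
        return checkpointInterval.minus(sinceLastTrigger).plus(checkpointDuration);
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(10);
        Duration duration = Duration.ofSeconds(2);
        // Record arriving right after a checkpoint was triggered: the worst case in this model.
        System.out.println(visibilityDelay(interval, duration, Duration.ZERO).getSeconds()); // 12
    }
}
```

In this model the delay is dominated by the checkpointing interval, which is why only a short interval brings the latency down.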

One solution if you need low latency is to drop exactly once and deduplicate events in your downstream application.
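
A minimal sketch of what such downstream deduplication could look like, assuming every event carries a unique ID and that duplicates only show up within a bounded replay window (roughly the events since the last checkpoint). The class and method names are hypothetical, and this is just one possible approach.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical downstream helper: remembers the most recent event IDs and
// reports whether an ID was already seen.
final class RecentIdDeduplicator {

    private final Map<String, Boolean> seen;

    RecentIdDeduplicator(final int capacity) {
        // Insertion-ordered map that evicts the oldest ID once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true the first time an ID is seen, false for a replayed duplicate.
    boolean firstTimeSeen(String eventId) {
        return seen.putIfAbsent(eventId, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        RecentIdDeduplicator dedup = new RecentIdDeduplicator(1000);
        System.out.println(dedup.firstTimeSeen("order-1")); // true
        System.out.println(dedup.firstTimeSeen("order-1")); // false
    }
}
```

The capacity has to cover at least as many events as can be replayed after a restart; otherwise an old ID may be evicted before its duplicate arrives.
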
On Fri, Feb 19, 2021 at 9:55 AM Tan, Min <[hidden email]> wrote:

Many thanks for your quick response.

Is the read_committed setting for the Kafka consumers required for exactly once (EOS)?

Do we lose exactly once if we read uncommitted messages?

Regards,

Min


From: Chesnay Schepler <[hidden email]>
Sent: Thursday, February 18, 2021 8:27 PM
To: Tan, Min <[hidden email]>; user <[hidden email]>
Subject: [External] Re: latency related to the checkpointing mode EXACTLY ONCE

 

Yes, if you are only reading committed data then it will take at least the checkpoint interval for the data to be available to downstream consumers.

On 2/18/2021 6:17 PM, Tan, Min wrote:

Hi,

We use the checkpointing mode EXACTLY ONCE for some of our Flink jobs.

I wonder how the checkpoint configuration, especially the checkpoint interval, relates to the end-to-end latency.

We need to set read_committed for the Kafka consumers.

Does this mean the latency of a single Flink job is greater than the checkpoint interval?

Thank you very much in advance for your help.

Min