Re: Dynamic configuration of Flink checkpoint interval

Posted by Senhong Liu on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Dynamic-configuration-of-Flink-checkpoint-interval-tp44059p44065.html

Hi all,

In fact, a very similar JIRA issue already exists, https://issues.apache.org/jira/browse/FLINK-18578, and I am working on it. I will publish a FLIP and start a discussion about it in the near future. We look forward to your participation.

Best,
Senhong Liu

On Mon, May 31, 2021 at 10:21 AM JING ZHANG <[hidden email]> wrote:
Hi Kai,

Happy to hear that. 
Could you please paste the JIRA link in this thread after you create it? It may help other users who run into the same problem. Thanks very much.

Best regards,
JING ZHANG

On Sun, May 30, 2021 at 11:19 PM Kai Fu <[hidden email]> wrote:
Hi Jing,

Yup, what you're describing is what I want. I also tried the approach you suggested and it works. I'm going to take that approach for the moment and create a Jira issue for this feature.

On Sun, May 30, 2021 at 8:57 PM JING ZHANG <[hidden email]> wrote:
Hi Kai,

Are you looking for a way to hot-update the checkpoint interval, or to disable/enable checkpointing, without stopping and restarting the job?
Unfortunately, that is not supported yet, AFAIK.
You're very welcome to create an issue and describe your needs here (Flink’s Jira).
For now, you may want to use the following workaround:
  1. Set a larger checkpoint interval and start your job.
  2. Take a savepoint after the cold start completes.
  3. Set the normal checkpoint interval and restart the job from the savepoint.
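
The steps above need the job to start with a different interval on each run. One possible way is to read the interval from the launch arguments instead of hard-coding it. The following is only a minimal sketch under that assumption: the class name `CheckpointIntervalArgs` and the flag `--checkpoint-interval-ms` are hypothetical and not part of Flink. The resolved value would then be handed to `StreamExecutionEnvironment.enableCheckpointing(long)` in the job's own setup code.

```java
import java.time.Duration;

// Hypothetical helper (not part of Flink): picks the checkpoint interval
// from the job's launch arguments so it can differ between the cold start
// run and the later run restored from the savepoint.
final class CheckpointIntervalArgs {

    // Hypothetical flag name chosen for this sketch.
    static final String FLAG = "--checkpoint-interval-ms";

    // Returns the interval given after FLAG, or the fallback if FLAG is
    // absent or is the last argument with no value following it.
    static Duration resolve(String[] args, Duration fallback) {
        for (int i = 0; i < args.length - 1; i++) {
            if (FLAG.equals(args[i])) {
                long ms = Long.parseLong(args[i + 1]);
                if (ms <= 0) {
                    throw new IllegalArgumentException(
                            "checkpoint interval must be positive: " + ms);
                }
                return Duration.ofMillis(ms);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Fallback of one minute is an arbitrary example value.
        Duration interval = resolve(args, Duration.ofMinutes(1));
        // In the actual job, this is where the value would be applied, e.g.
        //   env.enableCheckpointing(interval.toMillis());
        System.out.println("checkpoint interval ms: " + interval.toMillis());
    }
}
```

With something like this, step 1 would launch the job with a large value for the flag, and step 3 would restart it from the savepoint with the normal value or with no flag at all.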

Best regards,
JING ZHANG

On Sun, May 30, 2021 at 7:13 PM Kai Fu <[hidden email]> wrote:
Hi team,

We want to know whether Flink supports dynamic configuration of the checkpoint interval. Our use case has a cold start phase in which the entire dataset is replayed from the beginning up to the most recent records.

In the cold start phase, the resources are fully utilized and backpressure is high for all upstream operators, causing checkpoints to time out constantly. The real production traffic is far lower than that, and the currently provisioned resources are capable of handling it.

We're wondering whether Flink could support dynamic checkpoint configuration, so that checkpointing can be skipped or made less frequent during the cold start phase to speed it up, and then returned to normal once the cold start is completed.

--
Best wishes,
- Kai

