Re: Trying to figure out why a slot takes a long time to checkpoint
Posted by Renjie Liu
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Trying-to-figure-out-why-a-slot-takes-a-long-time-to-checkpoint-tp23108p23120.html
Hi, Julio:
Does this happen frequently? Which state backend do you use? The async and sync checkpoint durations seem normal compared to the others, so it looks like most of the time is spent acknowledging the checkpoint.
Hi Julio,
Yes, it seems that fifty-five minutes is really long.
However, it appears to scale linearly with the duration and size of the preceding adjacent task in the diagram.
I think the real question for your application is why Flink accesses HDFS so slowly.
You can enable DEBUG logging to see whether you can find any clues, or post the log to the mailing list so others can help analyze the problem.
Thanks, vino.
(Just an addendum: although it's not a huge problem, since we can always increase the checkpoint timeout, this anomaly makes me think something is wrong in our pipeline or in our cluster, and that this is what is making checkpoint creation go crazy.)
--
Liu, Renjie
Software Engineer, MVAD