Re: Trying to figure out why a slot takes a long time to checkpoint

Posted by Julio Biason on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Trying-to-figure-out-why-a-slot-takes-a-long-time-to-checkpoint-tp23108p23109.html

(Just an addendum: although it's not a huge problem -- we can always increase the checkpoint timeout -- this anomaly makes me think there is something wrong in our pipeline or in our cluster, and that this is what is making checkpoint creation take so long.)

On Fri, Sep 14, 2018 at 8:00 PM, Julio Biason <[hidden email]> wrote:
Hey guys,

In our pipeline, a single slot is taking much longer to create its checkpoint than the other slots, and we are wondering what could be causing it.

The operator in question is the window metric -- the only element in the pipeline that actually uses state. While the other slots take 7 minutes to create the checkpoint, this one -- and only this one -- takes 55 minutes.

Is there something I should look at to understand what's going on?

(We are storing all checkpoints in HDFS, in case that helps.)
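To put a number on the imbalance described above, here is a minimal sketch that flags subtasks whose checkpoint duration is far above the median. The function name `find_slow_subtasks`, the `factor` threshold, and the four-subtask sample are all hypothetical; only the 7-minute and 55-minute figures come from this message. It assumes the per-subtask durations have already been collected by hand, for example from the checkpoint details in the web UI.

```python
from statistics import median


def find_slow_subtasks(durations_min, factor=3.0):
    """Return {subtask: ratio_to_median} for subtasks slower than factor x median.

    durations_min maps a subtask index to its checkpoint duration in minutes.
    The mapping is assumed to be non-empty.
    """
    baseline = median(durations_min.values())
    return {
        subtask: duration / baseline
        for subtask, duration in durations_min.items()
        if duration > factor * baseline
    }


# Hypothetical sample: three subtasks at 7 minutes, one at 55 minutes.
# The subtask count and indices are assumptions made for illustration.
durations = {0: 7, 1: 7, 2: 7, 3: 55}
slow = find_slow_subtasks(durations)

for subtask, ratio in slow.items():
    print(f"subtask {subtask} is {ratio:.1f}x slower than the median")
```

A single subtask this far above the median usually points at something specific to that subtask (for instance, how much state it holds) rather than at the cluster as a whole, though that is only a starting hypothesis.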

--
Julio Biason, Software Engineer
AZION  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554


