DataStream API Batch Execution Mode restarting...

Marco Villalobos-2
I have a DataStream job running in Batch Execution mode within YARN on EMR.
The job failed an hour into its run, twice in a row, because the task manager heartbeat timed out.

Can somebody point me to how to restart a job in this situation? I can't find that section of the documentation.

Thank you.

Re: DataStream API Batch Execution Mode restarting...

Yun Gao
Hi Marco,

Have you configured a restart strategy? If the restart-strategy [1] option is set
to a strategy other than none, Flink should be able to restart the job automatically
on failure. The restart strategy can also be configured via
StreamExecutionEnvironment#setRestartStrategy.

If no restart strategy is configured (the default behavior), the job will fail and we
need to re-submit it to execute it from scratch.

Best,
Yun
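
For illustration, here is a minimal sketch of what a non-default restart strategy might look like in flink-conf.yaml. It assumes the configuration keys of the Flink releases current at the time of this thread (around 1.13); newer releases may name them differently. The attempt count and delay are arbitrary example values, not settings recommended anywhere in this thread.

```yaml
# Assumed example values: retry a failed job a fixed number of times
# instead of failing it outright (the "none" behavior described above).
restart-strategy: fixed-delay
# Hypothetical choice: give up after 3 restart attempts.
restart-strategy.fixed-delay.attempts: 3
# Hypothetical choice: wait 10 seconds between attempts.
restart-strategy.fixed-delay.delay: 10 s
```

With settings like these, Flink would retry the job up to three times, waiting 10 seconds between attempts, before declaring it failed.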



------------------ Original Mail ------------------
Sender: Marco Villalobos <[hidden email]>
Send Date: Wed May 19 11:27:37 2021
Recipients: user <[hidden email]>
Subject: DataStream API Batch Execution Mode restarting...

Re: DataStream API Batch Execution Mode restarting...

Marco Villalobos-2
Thank you.  I used the default restart strategy.  I'll change that.

On Tue, May 18, 2021 at 11:02 PM Yun Gao <[hidden email]> wrote: