After a flink streaming job has been running for a while (about one day), it will automatically restart.

mailtolrl
Hi all,
 I started a Flink streaming job, and it always restarts automatically after running for a while (about 1 day).

The command used to start the job: flink run -yd -m yarn-cluster -yqu myqueue -yn 1 -yjm 1024 -ytm 2048 -ys 1 -p 30 myjar.jar someArgs

The restart config is:
 

And the runtime error message is:
1.
2.
3.

The job is restarted every time because of the above error.

Thanks.

Re: After a flink streaming job has been running for a while (about one day), it will automatically restart.

Yun Gao
Hi,

     The `Connection reset by peer` exception means the connection failed because a TCP packet with the RST flag set was received. There are two likely causes:
     1. The TaskManager on the other end of the connection (the peer of the one that threw this exception) has shut down because of some other exception.
     2. The underlying physical network is suffering from packet loss. When the same packet is lost multiple times, the sending side will send an RST packet. This can happen when CPU usage is too high to handle network card interrupts, or when the underlying physical hardware is faulty.
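
For the second case, one rough way to look for packet loss on a Linux host is to compare the kernel's TCP retransmission counter with the number of segments sent. The sketch below parses text in the format of /proc/net/snmp; the function names and the sample counters are made up for illustration, and a ratio that is high or rising around the restart time is only a hint, not proof of a network problem.

```python
def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp-style text as a dict."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a 'Tcp:' header line and a 'Tcp:' value line")
    # The first Tcp: line holds the counter names, the second holds the values.
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def retransmit_ratio(counters):
    """Fraction of sent TCP segments that were retransmissions."""
    sent = counters["OutSegs"]
    return counters["RetransSegs"] / sent if sent else 0.0


# Made-up sample in the /proc/net/snmp layout (not taken from the thread).
SAMPLE = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
    "InErrs OutRsts InCsumErrors\n"
    "Tcp: 1 200 120000 -1 10 20 1 2 5 900 1000 50 0 3 0\n"
)

counters = parse_tcp_counters(SAMPLE)
print(retransmit_ratio(counters))  # 0.05
```

On a real machine the input would be the contents of /proc/net/snmp, read once before and once after the failure window so that the two snapshots can be compared.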

    Therefore, I suggest first checking whether the TM connected to this one had shut down when the exception was thrown (its address should be reported with the exception, and you should be able to find it in the original log file). If it had not, then check whether there was packet loss at the time the exception happened.
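
The first check can be sketched as a small log scan: find the lines that mention the reset and pull out any remote `ip:port` so that the matching TaskManager log can be inspected. This is a minimal sketch, not a Flink tool. The marker strings, the `find_reset_peers` helper, and the sample log lines are assumptions for illustration; the actual error output was not included in the thread and real logs may be formatted differently.

```python
import re

# Matches an IPv4 address followed by a port, e.g. 10.0.0.7:45123.
ADDRESS = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}:\d+")

# Substrings assumed to mark a lost connection; adjust to the real log.
MARKERS = ("Connection reset by peer", "Lost connection to task manager")


def find_reset_peers(lines):
    """Return (line_number, address_or_None) for each line containing a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in MARKERS):
            match = ADDRESS.search(line)
            hits.append((number, match.group(0) if match else None))
    return hits


# Hypothetical sample lines, not copied from the original post.
SAMPLE_LOG = [
    "2019-07-10 09:12:01,123 INFO  Task - Source (1/30) switched from RUNNING to FAILED.",
    "RemoteTransportException: Lost connection to task manager 'worker-7/10.0.0.7:45123'.",
    "Caused by: java.io.IOException: Connection reset by peer",
]

for number, address in find_reset_peers(SAMPLE_LOG):
    print(number, address)
```

With an address in hand, the next step is to open the log of the TaskManager at that address and look for a shutdown or an earlier exception shortly before the reset.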

Best,
Yun


------------------------------------------------------------------
From:mailtolrl <[hidden email]>
Send Time:2019 Jul. 10 (Wed.) 09:40
To:user <[hidden email]>
Subject:After a flink streaming job has been running for a while (about one day), it will automatically restart.
