Cluster die when one of the TM killed


Siew Wai Yow

Hi,


When one of the task managers is killed, the whole cluster dies. Is this expected? We are using Flink 1.4. Thank you.


Regards,

Yow

Re: Cluster die when one of the TM killed

Dominik Wosiński
Hey, 
Can you please provide a little more information about your setup, and maybe logs showing when the crash occurs?
Best Regards,
Dominik

2018-08-20 16:23 GMT+02:00 Siew Wai Yow <[hidden email]>:

Hi,


When one of the task managers is killed, the whole cluster dies. Is this expected? We are using Flink 1.4. Thank you.


Regards,

Yow


Re: Cluster die when one of the TM killed

Lasse Nedergaard
Hi. 
We have seen the same behaviour on YARN. It turned out that the default value of the following setting was not optimal:
  • yarn.maximum-failed-containers: The maximum number of failed containers the ApplicationMaster accepts until it fails the YARN session. Default: The number of initially requested TaskManagers (-n).
So try to look up the corresponding configuration for your system.
The next step is to investigate why the task manager was killed.
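
As a rough illustration of the setting above, this is how it might look in flink-conf.yaml. The value 100 is only an assumed placeholder, not a recommendation; pick a number that fits how many container failures your session should tolerate.

```yaml
# flink-conf.yaml (sketch; the value below is a hypothetical example)
# Maximum number of failed containers the ApplicationMaster accepts
# before it fails the YARN session. If unset, it defaults to the number
# of initially requested TaskManagers (-n).
yarn.maximum-failed-containers: 100
```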


Best regards
Lasse Nedergaard


On 20 Aug 2018 at 16:34, Dominik Wosiński <[hidden email]> wrote:

Hey, 
Can you please provide a little more information about your setup, and maybe logs showing when the crash occurs?
Best Regards,
Dominik