What are the general reasons for a Flink Task Manager to crash? How to troubleshoot?



HarshithBolar
We're running Flink on a five-node cluster with two Job Managers and three
Task Managers.

Lately, roughly once a day, all three Task Managers get killed. This drops
the number of available task slots to 0 and causes every job running on the
cluster to fail. The only fix so far is to restart the Task Managers
manually.

So I'd like to know some of the typical reasons a Task Manager can go down,
and whether there is a way to bring them back up automatically, without
manual intervention.

Additional info: The jobs running on the cluster read data from Kafka and
write data to Kafka/Cassandra.


