Jobmanager time out / long running batch job

Jan Oelschlegel

Hi,

 

I'm using Flink 1.11.3 to run a batch job. In the JobManager log I see all operators switch from RUNNING to FINISHED. Then the JobManager's heartbeat times out at the resource manager, and only after a further pause does the overall job status switch from RUNNING to FINISHED.

 

Why is there such a large gap in between? The task managers finished their work long before, yet the JobManager only marks the job as completed some time later. Why is that?

 

PS: I replaced some sensitive data with ‘*’

 

2021-03-11 09:55:02,099 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       ****-> Sink: Unnamed (4/4) (5d74d463ecfa2a87128978da5f6fcdbd) switched from RUNNING to FINISHED.

2021-03-11 09:55:02,207 WARN  org.apache.hadoop.hive.conf.HiveConf                         [] - HiveConf of name hive.enforce.sorting does not exist

2021-03-11 09:55:02,207 WARN  org.apache.hadoop.hive.conf.HiveConf                         [] - HiveConf of name hive.enforce.bucketing does not exist

2021-03-11 09:55:02,209 INFO  hive.metastore                                               [] - Trying to connect to metastore with URI thrift://******

2021-03-11 09:55:02,212 INFO  hive.metastore                                               [] - Opened a connection to metastore, current connections: 3

2021-03-11 09:55:02,212 INFO  hive.metastore                                               [] - Connected to metastore.

2021-03-11 09:55:48,817 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 8300d383b32d743e1af2f117e91f2e52 timed out.

2021-03-11 09:55:48,817 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager [hidden email] for job 85f5e6e0af74b31e359168a25390af92 from the resource manager.

2021-03-11 09:57:08,406 INFO  hive.metastore                                               [] - Closed a connection to metastore, current connections: 2

2021-03-11 09:57:08,406 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job ****** (85f5e6e0af74b31e359168a25390af92) switched from state RUNNING to FINISHED.
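
To put numbers on the gap, the timestamps in the log lines above can be compared directly. This is only a minimal sketch in Python, assuming the timestamp format shown in the log; the helper name `gap_seconds` is made up for illustration and is not part of Flink.

```python
from datetime import datetime

# Timestamp layout used in the log lines above (comma before the milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the log excerpt.
last_task_finished = "2021-03-11 09:55:02,099"  # Sink (4/4) RUNNING -> FINISHED
heartbeat_timeout = "2021-03-11 09:55:48,817"   # JobManager heartbeat timed out
job_finished = "2021-03-11 09:57:08,406"        # Job RUNNING -> FINISHED

print(gap_seconds(last_task_finished, heartbeat_timeout))
print(gap_seconds(last_task_finished, job_finished))
```

So roughly 47 seconds pass between the last task finishing and the heartbeat timeout, and a little over two minutes pass before the job itself is reported as FINISHED.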

 

 

Best,

Jan

 

 
