Hi all,

I have seen this issue several times: JobGraphs are not cleaned up properly. As a result, when the Flink cluster is restarted, it attempts an HA restore from a checkpoint that no longer exists, and the restarted cluster eventually gives up and stays down. The workaround is to clean up the JobGraph manually in ZooKeeper.

Is this a known issue?

2020-05-19 19:56:21,471 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and sending final execution state FINISHED to JobManager for task Source: kafkaConsumer[update_server] -> (DetectedUpdateMessageConverter -> Sink: update_server.detected_updates, DrivenCoordinatesMessageConverter -> Sink: update_server.driven_coordinates) 588902a8096f49845b09fa1f595d6065.
2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable - Free slot TaskSlot(index:0, state:ACTIVE, resource profile: ResourceProfile{cpuCores=1.7976931348623157E308, heapMemoryInMB=2147483647, directMemoryInMB=2147483647, nativeMemoryInMB=2147483647, networkMemoryInMB=2147483647, managedMemoryInMB=642}, allocationId: 29f6a5f83c832486f2d7ebe5c779fa32, jobId: 86a028b3f7aada8ffe59859ca71d6385).
2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Remove job 86a028b3f7aada8ffe59859ca71d6385 from job leader monitoring.
2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/86a028b3f7aada8ffe59859ca71d6385/job_manager_lock.
2020-05-19 19:56:21,623 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager connection for job 86a028b3f7aada8ffe59859ca71d6385.
2020-05-19 19:56:21,624 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager connection for job 86a028b3f7aada8ffe59859ca71d6385.
2020-05-19 19:56:21,624 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Cannot reconnect to job 86a028b3f7aada8ffe59859ca71d6385 because it is not registered.
...

ZooKeeper CLI:

ls /flink/cluster_update/jobgraphs
[86a028b3f7aada8ffe59859ca71d6385]

Thanks,
Fritz
Forgot to mention: the Flink version is 1.9.2.
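
In case it helps anyone checking for the same symptom, here is a minimal sketch of how leftover JobGraph entries could be spotted before doing the manual cleanup. It is not part of Flink. The function names (`parse_ls_output`, `removed_job_ids`, `leftover_jobgraph_paths`) are hypothetical, and it assumes that a "Remove job ... from job leader monitoring" line in the TaskExecutor log is a reasonable hint that the job is gone. That assumption is only a heuristic, since the same line can also appear during a failover, so the result is a list of candidates to double-check, not a list that is safe to delete blindly.

```python
import re

# Assumption: Flink job IDs appear in the logs as 32 lowercase hex characters.
REMOVED_JOB = re.compile(r"Remove job ([0-9a-f]{32}) from job leader monitoring")


def parse_ls_output(ls_output):
    """Turn ZooKeeper CLI `ls` output such as '[id1, id2]' into a list of names."""
    inner = ls_output.strip().strip("[]")
    return [name.strip() for name in inner.split(",") if name.strip()]


def removed_job_ids(log_lines):
    """Collect job IDs that the TaskExecutor log reports as removed."""
    ids = set()
    for line in log_lines:
        match = REMOVED_JOB.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def leftover_jobgraph_paths(jobgraphs_root, ls_output, log_lines):
    """Return znode paths still listed under jobgraphs_root for removed jobs."""
    removed = removed_job_ids(log_lines)
    return [
        f"{jobgraphs_root}/{name}"
        for name in parse_ls_output(ls_output)
        if name in removed
    ]


if __name__ == "__main__":
    # Log line and `ls` output copied from the post above.
    log = [
        "2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.taskexecutor."
        "JobLeaderService - Remove job 86a028b3f7aada8ffe59859ca71d6385 "
        "from job leader monitoring.",
    ]
    listing = "[86a028b3f7aada8ffe59859ca71d6385]"
    for path in leftover_jobgraph_paths("/flink/cluster_update/jobgraphs", listing, log):
        print(path)
```

The script only reports paths and deliberately does not touch ZooKeeper. Once a job is confirmed to be finished, the reported znode can be removed by hand in the ZooKeeper CLI (`rmr` on ZooKeeper 3.4, `deleteall` on 3.5 and later).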