HA stand alone cluster error


miki haiat
I hit a catastrophic error:

 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: Failed to recover job a048ad572c9837a400eca20cd55241b6.
File does not exist: /flink_1.5/ha/beam1/blob/job_a048ad572c9837a400eca20cd55241b6/blob_p-45d544ca331844235e4f09e2a738b4de38a3bb0a-5dc3a8cbc69f56d9c824a7a4fddc131d


I was unable to start the cluster again,
so I removed all the data from Hadoop and cleaned ZooKeeper in order to be able to start it.

But now I get this error:

2018-05-29 03:51:54,082 ERROR org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Could not recover job graph for job e3369e6dce5305b9411b4695975eea26.
org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /e3369e6dce5305b9411b4695975eea26. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.

How can I clean the state and bring the cluster back up?

Thanks,

Miki

 

Re: HA stand alone cluster error

Gary Yao-2
Hi Miki,

Sorry for the late reply. If you are able to reproduce the first problem, it
would be good to see the complete JobManager logs.

The second exception indicates that you have not removed all data from
ZooKeeper. On recovery, Flink looks up the locations of the submitted JobGraphs
in ZooKeeper. You can check for yourself which jobs will be recovered by
checking the contents of znode /flink/<namespace>/jobgraphs.
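
As a rough illustration of that check, the sketch below lists the children of
the jobgraphs znode, which are the job IDs Flink would try to recover. It is
only a minimal sketch under assumptions: it uses the third-party kazoo client
(not mentioned in this thread), the ZooKeeper address is passed on the command
line, and the namespace "default" is a placeholder for whatever your cluster
uses. The helper names are hypothetical.

```python
import sys


def jobgraphs_path(root="/flink", namespace="default", jobgraphs="jobgraphs"):
    """Build /<root>/<namespace>/jobgraphs, tolerating stray slashes."""
    parts = [p.strip("/") for p in (root, namespace, jobgraphs)]
    return "/" + "/".join(p for p in parts if p)


def list_recoverable_jobs(client, path):
    """Return the sorted child znodes (job IDs) under path, or [] if it is absent."""
    if client.exists(path) is None:
        return []
    return sorted(client.get_children(path))


def main(hosts):
    # Imported here so the helpers above work without kazoo installed.
    from kazoo.client import KazooClient  # assumption: pip install kazoo

    zk = KazooClient(hosts=hosts)
    zk.start()
    try:
        # "default" is a placeholder namespace; adjust to your setup.
        path = jobgraphs_path("/flink", "default")
        jobs = list_recoverable_jobs(zk, path)
        print(f"{len(jobs)} job graph(s) under {path}")
        for job_id in jobs:
            print(job_id)
    finally:
        zk.stop()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python list_jobgraphs.py <zookeeper-host:port>")
    else:
        main(sys.argv[1])
```

If the failing job ID from the error message shows up in that listing, the
znode was not actually cleaned, which would match the second exception.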

Best,
Gary
