(DEPRECATED) Apache Flink User Mailing List archive.

HA stand alone cluster error

miki haiat

HA stand alone cluster error

I had a catastrophic error:

ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: Failed to recover job a048ad572c9837a400eca20cd55241b6.
File does not exist: /flink_1.5/ha/beam1/blob/job_a048ad572c9837a400eca20cd55241b6/blob_p-45d544ca331844235e4f09e2a738b4de38a3bb0a-5dc3a8cbc69f56d9c824a7a4fddc131d

I was unable to start the cluster again, so I removed all the data from Hadoop and cleaned ZooKeeper in order to be able to start it.

But now I get this error:

2018-05-29 03:51:54,082 ERROR org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Could not recover job graph for job e3369e6dce5305b9411b4695975eea26.
org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /e3369e6dce5305b9411b4695975eea26. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.

How can I clean the state and bring the cluster back?

Thanks,

Miki

Gary Yao-2

Re: HA stand alone cluster error

Hi Miki,

Sorry for the late reply. If you are able to reproduce the first problem, it
would be good to see the complete JobManager logs.

The second exception indicates that you have not removed all data from
ZooKeeper. On recovery, Flink looks up the locations of the submitted JobGraphs
in ZooKeeper. You can check for yourself which jobs will be recovered by
checking the contents of znode /flink/<namespace>/jobgraphs.
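
As a rough sketch of that check, the snippet below lists the job IDs registered
under the jobgraphs znode. It assumes the third-party Python client kazoo; the
helper names (jobgraphs_znode, jobs_to_recover, list_with_kazoo) are
hypothetical, and the defaults "/flink" and "default" are placeholders that
should be replaced with your own high-availability.zookeeper.path.root and
high-availability.cluster-id values.

```python
import posixpath


def jobgraphs_znode(root="/flink", namespace="default"):
    """Build the znode path /<root>/<namespace>/jobgraphs.

    The default values are placeholders (assumptions), not taken from
    the cluster discussed in this thread.
    """
    return posixpath.join("/", root.strip("/"), namespace.strip("/"), "jobgraphs")


def jobs_to_recover(client, root="/flink", namespace="default"):
    """Return the sorted job IDs registered under the jobgraphs znode.

    `client` is any object exposing exists(path) and get_children(path),
    for example a started kazoo.client.KazooClient.
    """
    path = jobgraphs_znode(root, namespace)
    if not client.exists(path):
        # Nothing registered: no job would be recovered on startup.
        return []
    return sorted(client.get_children(path))


def list_with_kazoo(hosts, root="/flink", namespace="default"):
    """Connect to ZooKeeper with kazoo and list the registered job IDs.

    `hosts` is a ZooKeeper connection string such as "zk-host:2181"
    (hypothetical value).
    """
    # Imported lazily so the helpers above work without kazoo installed.
    from kazoo.client import KazooClient

    client = KazooClient(hosts=hosts)
    client.start()
    try:
        return jobs_to_recover(client, root, namespace)
    finally:
        client.stop()
        client.close()
```

Any job ID that this returns but whose data has already been deleted from the
HA storage directory is a candidate for the "state handle is broken" error
above.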

Best,
Gary
