Flink-1.6.1 :: HighAvailability :: ZooKeeperRunningJobsRegistry

Flink-1.6.1 :: HighAvailability :: ZooKeeperRunningJobsRegistry

Mikhail Pryakhin-2
Hi Flink experts!

When a streaming job with ZooKeeper HA enabled gets cancelled, not all of the job-related ZooKeeper nodes are removed. Is there a reason behind that?
I noticed that the ZooKeeper paths are created as "Container" nodes (nodes that ZooKeeper deletes automatically once their last child is removed) and fall back to the Persistent node type in case ZooKeeper doesn't support this kind of node.
But either way, shouldn't the job's ZooKeeper node be removed when the job is cancelled?
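
(For reference, a minimal sketch of that create-with-fallback pattern using the plain ZooKeeper client; the quorum address and path below are placeholders, and container nodes need a ZooKeeper 3.5+ client and server:)

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ContainerNodeFallback {
    public static void main(String[] args) throws Exception {
        // Placeholder quorum address.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 60_000, event -> { });
        String path = "/flink-ha-demo";   // placeholder, single-level path
        byte[] empty = new byte[0];
        try {
            // A container node is deleted by the server once its last child is gone.
            zk.create(path, empty, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.CONTAINER);
        } catch (KeeperException.UnimplementedException e) {
            // Servers without container support reject the request; fall back to a
            // persistent node, which then has to be cleaned up explicitly.
            zk.create(path, empty, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException ignored) {
            // Already created by another instance; nothing to do.
        }
        zk.close();
    }
}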

Thank you in advance!

Kind Regards,
Mike Pryakhin


Re: Flink-1.6.1 :: HighAvailability :: ZooKeeperRunningJobsRegistry

Andrey Zagrebin
Hi Mike,

What was the full job life cycle?
Did you start it with Flink 1.6.1, or did you cancel a job that was running with 1.6.0?
Was there a failover of the Job Master while the job was running, before the cancellation?
What version of ZooKeeper do you use?

Flink creates child nodes in ZooKeeper to lock the job.
The lock is released by removing the (ephemeral) child node.
A persistent node can be a problem: if the job dies without removing it, the persistent node will not time out and disappear the way an ephemeral one would, and the next job instance will not delete it because it appears to still be locked by the previous one.
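
(A rough sketch of that locking pattern with Curator; the quorum address and paths are placeholders and this is not the actual Flink code:)

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class JobLockSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder quorum; in Flink this comes from high-availability.zookeeper.quorum.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Placeholder persistent job node.
        String jobNode = "/flink-root-path/flink-job-name/running_job_registry/job-0001";

        // "Locking" the job: create an ephemeral child under the persistent job node.
        // The ephemeral child vanishes automatically if the owning session dies.
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath(jobNode + "/lock");

        // Releasing the lock explicitly: delete the ephemeral child.
        client.delete().forPath(jobNode + "/lock");

        // The persistent job node is only safe to remove once no locks remain.
        if (client.getChildren().forPath(jobNode).isEmpty()) {
            client.delete().forPath(jobNode);
        }

        client.close();
    }
}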

There was a recent fix in 1.6.1 where job data was not properly deleted from ZooKeeper [1].
In general, this should not happen: all job-related data should be cleaned from ZooKeeper upon cancellation.

Best,
Andrey



Re: Flink-1.6.1 :: HighAvailability :: ZooKeeperRunningJobsRegistry

Mikhail Pryakhin-2
Hi Andrey, Thanks a lot for your reply!

What was the full job life cycle? 

1. The job is deployed as a YARN cluster with the following properties set

high-availability: zookeeper
high-availability.zookeeper.quorum: <a list of zookeeper hosts>
high-availability.zookeeper.storageDir: hdfs:///<recovery-folder-path>
high-availability.zookeeper.path.root: <flink-root-path>
high-availability.zookeeper.path.namespace: <flink-job-name>

2. The job is cancelled via the flink cancel <job-id> command.

What I've noticed:
When the job is running, the following directory structure is created in ZooKeeper:

/<flink-root-path>/<flink-job-name>/leader/resource_manager_lock
/<flink-root-path>/<flink-job-name>/leader/rest_server_lock
/<flink-root-path>/<flink-job-name>/leader/dispatcher_lock
/<flink-root-path>/<flink-job-name>/leader/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
/<flink-root-path>/<flink-job-name>/leaderlatch/resource_manager_lock
/<flink-root-path>/<flink-job-name>/leaderlatch/rest_server_lock
/<flink-root-path>/<flink-job-name>/leaderlatch/dispatcher_lock
/<flink-root-path>/<flink-job-name>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
/<flink-root-path>/<flink-job-name>/checkpoints/5c21f00b9162becf5ce25a1cf0e67cde/0000000000000000041
/<flink-root-path>/<flink-job-name>/checkpoint-counter/5c21f00b9162becf5ce25a1cf0e67cde
/<flink-root-path>/<flink-job-name>/running_job_registry/5c21f00b9162becf5ce25a1cf0e67cde


When the job is cancelled, some of the ephemeral nodes disappear, but most of the nodes are still there:

/<flink-root-path>/<flink-job-name>/leader/5c21f00b9162becf5ce25a1cf0e67cde
/<flink-root-path>/<flink-job-name>/leaderlatch/resource_manager_lock
/<flink-root-path>/<flink-job-name>/leaderlatch/rest_server_lock
/<flink-root-path>/<flink-job-name>/leaderlatch/dispatcher_lock
/<flink-root-path>/<flink-job-name>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
/<flink-root-path>/<flink-job-name>/checkpoints/
/<flink-root-path>/<flink-job-name>/checkpoint-counter/
/<flink-root-path>/<flink-job-name>/running_job_registry/
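
(For reference, a minimal Curator sketch that walks the job namespace and prints whatever is left; the quorum address and root path are placeholders:)

import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ListLeftoverNodes {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        // Placeholder for /<flink-root-path>/<flink-job-name>.
        printTree(client, "/flink-root-path/flink-job-name");
        client.close();
    }

    private static void printTree(CuratorFramework client, String path) throws Exception {
        System.out.println(path);
        List<String> children = client.getChildren().forPath(path);
        for (String child : children) {
            printTree(client, path + "/" + child);
        }
    }
}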

Did you start it with Flink 1.6.1, or did you cancel a job that was running with 1.6.0?

I started the job with Flink 1.6.1.


Was there a failover of the Job Master while the job was running, before the cancellation?
No, there was no failover; the job is deployed as a YARN cluster (the YARN Cluster High Availability guide states that no failover is required).

What version of ZooKeeper do you use?
ZooKeeper 3.4.10

In general, this should not happen: all job-related data should be cleaned from ZooKeeper upon cancellation.
As far as I understand, that issue concerns the JobManager failover process, whereas my question is about a manual, intentional cancellation of a job.

Here is the method [1] responsible for cleaning up the ZooKeeper folders, which is called when the job manager has stopped [2].
It seems it only cleans up the running_job_registry folder; the other folders stay untouched. I assumed that everything under the /<flink-root-path>/<flink-job-name>/ folder would be cleaned up when the job is cancelled.
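
(As a stop-gap, something like the following Curator snippet could remove the leftover job namespace by hand; the quorum address and path are placeholders, it is not part of Flink, and it is only safe once the YARN cluster for the job has been shut down:)

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CleanupLeftoverJobNamespace {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Placeholder for /<flink-root-path>/<flink-job-name>.
        String jobNamespace = "/flink-root-path/flink-job-name";

        // Recursively delete everything that is left under the job namespace.
        if (client.checkExists().forPath(jobNamespace) != null) {
            client.delete().deletingChildrenIfNeeded().forPath(jobNamespace);
        }

        client.close();
    }
}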



Kind Regards,
Mike Pryakhin


Re: Flink-1.6.1 :: HighAvailability :: ZooKeeperRunningJobsRegistry

Till Rohrmann
Hi Mike,

thanks for reporting this issue. I think you're right that Flink leaves some empty nodes in ZooKeeper. It seems that we don't delete the <flink-job-name> node with all its children in ZooKeeperHaServices#closeAndCleanupAllData.
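
(To illustrate the expected end state only, not the actual fix: after closeAndCleanupAllData the whole per-cluster node should be gone. A self-contained sketch against a Curator test server, with placeholder paths:)

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;
import org.apache.curator.test.TestingServer;

public class CleanupExpectationSketch {
    public static void main(String[] args) throws Exception {
        try (TestingServer zk = new TestingServer()) {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    zk.getConnectString(), new RetryOneTime(100));
            client.start();

            // Simulate part of the per-cluster layout; placeholder paths only.
            String clusterNode = "/flink-root-path/flink-job-name";
            client.create().creatingParentsIfNeeded()
                  .forPath(clusterNode + "/leaderlatch/rest_server_lock");

            // What the cleanup should achieve: remove the cluster node and all of
            // its children, not just the running_job_registry entries.
            client.delete().deletingChildrenIfNeeded().forPath(clusterNode);

            if (client.checkExists().forPath(clusterNode) != null) {
                throw new AssertionError("cluster node was not cleaned up");
            }
            client.close();
        }
    }
}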

Could you please open a JIRA issue in order to fix it? Thanks a lot!

Cheers,
Till


Re: Flink-1.6.1 :: HighAvailability :: ZooKeeperRunningJobsRegistry

Mikhail Pryakhin-2
Hi Till, thanks for your reply! Here is the issue ticket:


Kind Regards,
Mike Pryakhin
