Checkpointing not happening in Standalone HA mode

Vinay Patil
Hi,

I am starting the cluster from a bootstrap application in which I call the Job Manager and Task Manager main classes to form the cluster. The HA cluster forms correctly, and I am able to submit jobs to it using a RemoteExecutionEnvironment. However, when I enable checkpointing in the code, I do not see any checkpoints triggered in the Flink UI.

Am I missing any configuration that must be set on the RemoteExecutionEnvironment for checkpointing to work?


Regards,
Vinay Patil

Re: Checkpointing not happening in Standalone HA mode

Chesnay Schepler
Please check the job- and taskmanager logs for anything suspicious.

On 25.07.2018 12:33, Vinay Patil wrote:
Hi,

I am starting the cluster from a bootstrap application in which I call the Job Manager and Task Manager main classes to form the cluster. The HA cluster forms correctly, and I am able to submit jobs to it using a RemoteExecutionEnvironment. However, when I enable checkpointing in the code, I do not see any checkpoints triggered in the Flink UI.

Am I missing any configuration that must be set on the RemoteExecutionEnvironment for checkpointing to work?


Regards,
Vinay Patil


Re: Checkpointing not happening in Standalone HA mode

Vinay Patil
Hi Chesnay,

No error in the logs. That is why I am not able to understand why checkpoints are getting triggered.

Regards,
Vinay Patil


On Wed, Jul 25, 2018 at 4:36 PM Chesnay Schepler <[hidden email]> wrote:
Please check the job- and taskmanager logs for anything suspicious.

Re: Checkpointing not happening in Standalone HA mode

Vinay Patil
No error in the logs. That is why I am not able to understand why checkpoints are not getting triggered.

Regards,
Vinay Patil


On Wed, Jul 25, 2018 at 4:44 PM Vinay Patil <[hidden email]> wrote:
Hi Chesnay,

No error in the logs. That is why I am not able to understand why checkpoints are getting triggered.

Regards,
Vinay Patil


Re: Checkpointing not happening in Standalone HA mode

Chesnay Schepler
Can you provide us with the job code?

I assume that checkpointing runs properly if you submit the same job to a normal cluster?

On 25.07.2018 13:15, Vinay Patil wrote:
No error in the logs. That is why I am not able to understand why checkpoints are not getting triggered.

Regards,
Vinay Patil


Re: Checkpointing not happening in Standalone HA mode

vino yang
Hi Vinay:

Did you call the checkpointing configuration API described in the documentation [1]?

Can you share your job program and the JM log? Or does the JM log contain a message matching the pattern "Triggering checkpoint {} @ {} for job {}."?
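
One way to check a JM log for that pattern is sketched below. It is only an illustration: the class name, the sample log lines, and the "Aborting checkpoint" marker are assumptions made for the example, and the exact wording in a given Flink version may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CheckpointLogScan {
    // The trigger message quoted above, with the {} placeholders filled in.
    static final Pattern TRIGGERED =
            Pattern.compile("Triggering checkpoint \\S+ @ \\S+ for job \\S+");
    // Assumed marker for an aborted attempt; adjust to what your log prints.
    static final Pattern ABORTED = Pattern.compile("Aborting checkpoint");

    /** Returns {triggered, aborted} counts for the given log lines. */
    static int[] count(List<String> lines) {
        int triggered = 0;
        int aborted = 0;
        for (String line : lines) {
            if (TRIGGERED.matcher(line).find()) {
                triggered++;
            }
            if (ABORTED.matcher(line).find()) {
                aborted++;
            }
        }
        return new int[] {triggered, aborted};
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real Flink output.
        List<String> sample = Arrays.asList(
                "INFO  Triggering checkpoint 1 @ 1532500000000 for job abc123.",
                "INFO  Some task is not running. Aborting checkpoint.",
                "INFO  Some unrelated line");
        int[] c = count(sample);
        System.out.println("triggered=" + c[0] + " aborted=" + c[1]);
    }
}
```

If the trigger count stays at zero while the job is running, the coordinator is probably never starting a checkpoint, which narrows the search to the job setup rather than the state backend.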


Thanks, vino.

2018-07-25 19:43 GMT+08:00 Chesnay Schepler <[hidden email]>:
Can you provide us with the job code?

I assume that checkpointing runs properly if you submit the same job to a normal cluster?


Re: Checkpointing not happening in Standalone HA mode

Vinay Patil
Hi Vino,

Yes, I am enabling checkpointing in the code as follows:

// host, port, client configuration, and jar path are separate arguments
StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
        "<job_manager_host>", <job_manager_port>, getJobConfiguration(), jarPath);

env.enableCheckpointing(1000); // checkpoint interval in milliseconds
env.setStateBackend(new FsStateBackend("file:///<shared_mount_point_location>"));
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000);

In the getJobConfiguration method I have set the HA-related properties, such as HA_STORAGE_PATH, HA_ZOOKEEPER_QUORUM, HA_ZOOKEEPER_ROOT, HA_MODE, HA_JOB_MANAGER_PORT_RANGE, and HA_CLUSTER_ID.
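
For readers following along, the settings named above might look roughly like the sketch below. It uses a plain map so it runs without Flink on the classpath. The key strings are, to my understanding, the flink-conf.yaml keys behind those constants in Flink 1.x, so verify them against your version. Every value, and the class and method names, are placeholders invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HaSettingsSketch {
    /** Builds the HA key/value pairs; every value is a placeholder. */
    static Map<String, String> haSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("high-availability", "zookeeper");                          // HA_MODE
        conf.put("high-availability.storageDir", "<ha_storage_path>");       // HA_STORAGE_PATH
        conf.put("high-availability.zookeeper.quorum", "<zk_host>:<zk_port>"); // HA_ZOOKEEPER_QUORUM
        conf.put("high-availability.zookeeper.path.root", "<zk_root>");      // HA_ZOOKEEPER_ROOT
        conf.put("high-availability.jobmanager.port", "<port_range>");       // HA_JOB_MANAGER_PORT_RANGE
        conf.put("high-availability.cluster-id", "<cluster_id>");            // HA_CLUSTER_ID
        return conf;
    }

    public static void main(String[] args) {
        // Print the pairs in insertion order.
        haSettings().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

In a real job these pairs would be copied into a Flink Configuration object (for example with setString) before it is passed to createRemoteEnvironment.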


I can see an error in the Job Manager logs saying "Collection Source is not being executed at the moment. Aborting checkpoint." The pipeline has a stream initialized using "fromCollection". I think I will have to get rid of this.

What do you suggest?

Regards,
Vinay Patil


On Thu, Jul 26, 2018 at 12:04 PM vino yang <[hidden email]> wrote:
Hi Vinay:

Did you call the checkpointing configuration API described in the documentation [1]?

Can you share your job program and the JM log? Or does the JM log contain a message matching the pattern "Triggering checkpoint {} @ {} for job {}."?


Thanks, vino.

Re: Checkpointing not happening in Standalone HA mode

vino yang
Hi Vinay,

Oh! You are using a collection source? That is the problem. Please use a general-purpose source such as Kafka. Maybe the job had already stopped before a checkpoint could be triggered.
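
The effect can be illustrated with a toy model. This is not Flink's coordinator logic, only a sketch under the assumption that a checkpoint attempt succeeds in being triggered only while the source is still running. The class name, method name, and timings are made up for the example; the 1000 ms interval mirrors the enableCheckpointing(1000) call in the job code from this thread.

```java
class FiniteSourceCheckpointModel {
    /**
     * Toy model: a checkpoint is attempted every intervalMs and counts as
     * triggered only if the source is still running at that moment.
     * Returns {triggered, aborted} over the observed window.
     */
    static int[] simulate(long sourceRunsForMs, long intervalMs, long observeForMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        int triggered = 0;
        int aborted = 0;
        for (long t = intervalMs; t <= observeForMs; t += intervalMs) {
            if (t < sourceRunsForMs) {
                triggered++;
            } else {
                aborted++;
            }
        }
        return new int[] {triggered, aborted};
    }

    public static void main(String[] args) {
        // Collection source finished after 50 ms, 1000 ms interval, 5 s observed.
        int[] finite = simulate(50, 1000, 5000);
        // Long-running source (for example a message queue) over the same window.
        int[] unbounded = simulate(Long.MAX_VALUE, 1000, 5000);
        System.out.println("finite:    triggered=" + finite[0] + " aborted=" + finite[1]);
        System.out.println("unbounded: triggered=" + unbounded[0] + " aborted=" + unbounded[1]);
    }
}
```

In this model the short-lived source sees every attempt aborted, while the long-running source sees every attempt triggered, which matches the symptom described in the thread.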

Thanks, vino.

2018-07-27 16:07 GMT+08:00 Vinay Patil <[hidden email]>:
Hi Vino,

Yes, I am enabling checkpointing in the code as follows:

// host, port, client configuration, and jar path are separate arguments
StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
        "<job_manager_host>", <job_manager_port>, getJobConfiguration(), jarPath);

env.enableCheckpointing(1000); // checkpoint interval in milliseconds
env.setStateBackend(new FsStateBackend("file:///<shared_mount_point_location>"));
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000);

In the getJobConfiguration method I have set the HA-related properties, such as HA_STORAGE_PATH, HA_ZOOKEEPER_QUORUM, HA_ZOOKEEPER_ROOT, HA_MODE, HA_JOB_MANAGER_PORT_RANGE, and HA_CLUSTER_ID.


I can see an error in the Job Manager logs saying "Collection Source is not being executed at the moment. Aborting checkpoint." The pipeline has a stream initialized using "fromCollection". I think I will have to get rid of this.

What do you suggest?

Regards,
Vinay Patil


Re: Checkpointing not happening in Standalone HA mode

Vinay Patil
Hi Vino,

Yes, the job runs successfully; however, no checkpoints complete successfully. I will update the source.

Regards,
Vinay Patil


On Fri, Jul 27, 2018 at 2:00 PM vino yang <[hidden email]> wrote:
Hi Vinay,

Oh! You are using a collection source? That is the problem. Please use a general-purpose source such as Kafka. Maybe the job had already stopped before a checkpoint could be triggered.

Thanks, vino.
