checkpoint always fails

Marvin777
Hi, all:

The Flink job runs normally, but checkpoints always fail, like this:
image.png

image.png
Checkpoint configuration:

image.png

Thanks.

Re: checkpoint always fails

vino yang
Hi Marvin,

Thanks for reporting this issue.

Can you share more details about the failed checkpoint, such as the logs, the exception stack trace, the state backend you use, and your HA configuration?

This information will help trace the issue.

Thanks, vino.

In reply to Marvin777 (2018-07-26 10:12 GMT+08:00)

Re: checkpoint always fails

Marvin777
Log:
https://issues.apache.org/jira/browse/FLINK-9945 (the exception does not occur every time, but the checkpoint fails every time.)

State backend:
image.png

HA configuration:
image.png


In reply to vino yang (Thu, Jul 26, 2018, 10:22 AM)

Re: checkpoint always fails

Marvin777
Hi vino,

Can you give me a hint as to why the checkpoint expires?

What generally causes this?

image.png

Thanks.

In reply to Marvin777 (Thu, Jul 26, 2018, 12:22 PM)

Re: checkpoint always fails

vino yang
Hi Marvin,

It looks like a checkpoint bug triggered your checkpoint timeout. Can you create an issue in JIRA, describe the details (such as your Flink version), and attach a complete log?
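
For context, a checkpoint only completes once every task has acknowledged it; if that does not happen within the configured checkpoint timeout, the checkpoint is discarded as expired. Here is a hypothetical, simplified model of that rule in plain Java (not Flink's own coordinator code); the name `completesInTime` and the timeout value are illustrative assumptions.

```java
/**
 * Hypothetical, simplified model of checkpoint expiry (not Flink code):
 * a checkpoint completes only after every task has acknowledged it, and it
 * expires if the last acknowledgement comes later than the checkpoint timeout.
 */
public class CheckpointExpiry {

    /** Returns true if every acknowledgement arrives within timeoutMs of the trigger. */
    static boolean completesInTime(long timeoutMs, long... ackDelaysMs) {
        long slowest = 0;
        for (long delay : ackDelaysMs) {
            slowest = Math.max(slowest, delay); // the slowest task decides
        }
        return slowest <= timeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 600_000; // 10 minutes, an example value
        // Three tasks acknowledge within a few seconds: the checkpoint completes.
        System.out.println(completesInTime(timeoutMs, 2_000, 3_500, 4_000));   // true
        // One task (e.g. stuck in a long alignment) needs 15 minutes: it expires.
        System.out.println(completesInTime(timeoutMs, 2_000, 3_500, 900_000)); // false
    }
}
```

The point of the model is that a single slow task is enough to make the whole checkpoint expire, however fast the others are.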

Thanks, vino.

In reply to Marvin777 (2018-07-26 19:37 GMT+08:00)

Re: checkpoint always fails

Marvin777
Hi vino,

The issue is FLINK-9945.

Thanks.


In reply to vino yang (Fri, Jul 27, 2018, 4:22 PM)

Re: checkpoint always fails

vino yang
Hi Marvin,

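Since you have configured exactly-once semantics, a task with multiple input channels must wait, when performing a checkpoint, until the checkpoint barrier has arrived on all of its input channels (barrier alignment). While it waits, records arriving on channels that have already delivered their barrier are buffered. The 'Buffered During Alignment' metric reflects how unevenly the upstream tasks progress toward the checkpoint: if some of them are too slow, alignment has to wait a long time.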
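
A minimal sketch of barrier alignment, assuming a simplified model in plain Java (not Flink's internal classes): events arrive in order on a task's input channels, and once a channel has delivered its barrier, its further records are held back until every channel has delivered one. `Event` and `bytesBufferedDuringAlignment` are hypothetical names; the returned byte count corresponds roughly to what 'Buffered During Alignment' reports.

```java
/**
 * Hypothetical, simplified model of exactly-once barrier alignment for one
 * checkpoint on a task with several input channels. Not Flink's actual code.
 */
public class BarrierAlignment {

    /** An input event: a data record of some size, or the checkpoint barrier. */
    static final class Event {
        final int channel;
        final boolean isBarrier;
        final int sizeBytes;

        private Event(int channel, boolean isBarrier, int sizeBytes) {
            this.channel = channel;
            this.isBarrier = isBarrier;
            this.sizeBytes = sizeBytes;
        }

        static Event data(int channel, int sizeBytes) {
            return new Event(channel, false, sizeBytes);
        }

        static Event barrier(int channel) {
            return new Event(channel, true, 0);
        }
    }

    /** Returns how many bytes were buffered before the barrier arrived on every channel. */
    static long bytesBufferedDuringAlignment(int numChannels, Event... arrivals) {
        boolean[] barrierSeen = new boolean[numChannels];
        int channelsAligned = 0;
        long bufferedBytes = 0;

        for (Event e : arrivals) {
            if (e.isBarrier) {
                if (!barrierSeen[e.channel]) {
                    barrierSeen[e.channel] = true;
                    channelsAligned++;
                }
                if (channelsAligned == numChannels) {
                    // Aligned: the task forwards the barrier, snapshots its state,
                    // and then processes the buffered records first.
                    return bufferedBytes;
                }
            } else if (barrierSeen[e.channel]) {
                // This channel is already past the barrier, so its records
                // must wait until the slower channels catch up.
                bufferedBytes += e.sizeBytes;
            }
            // Otherwise the record precedes the barrier and is processed normally.
        }
        return bufferedBytes; // input ended before alignment finished
    }

    public static void main(String[] args) {
        // Channel 0 is fast, channel 1 is slow (e.g. fed by a skewed subtask).
        long buffered = bytesBufferedDuringAlignment(2,
                Event.barrier(0),
                Event.data(0, 100), Event.data(1, 100),
                Event.data(0, 100), Event.data(1, 100),
                Event.data(0, 100),
                Event.barrier(1),
                Event.data(0, 100));
        System.out.println(buffered); // 300
    }
}
```

In this run the slow channel 1 delays alignment, so the three records from the fast channel 0 (300 bytes) pile up in the buffer; the longer the lagging channel takes to deliver its barrier, the larger this number grows.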

Tasks can fall behind for many reasons, for example data skew after a keyBy, where a few hot keys send most of the records to a single subtask.
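
To illustrate how a hot key concentrates load after a keyBy, here is a hypothetical sketch in plain Java. It assigns keys with `hashCode()` modulo the parallelism, which is a simplification of Flink's actual key-group assignment, but the effect of a hot key is the same: all of its records land on one subtask.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of keyBy data skew. Keys are assigned to subtasks
 * with hashCode() modulo parallelism, a simplification of Flink's key groups.
 */
public class KeySkew {

    /** Counts how many records each of the parallel subtasks receives. */
    static long[] recordsPerSubtask(String[] keys, int parallelism) {
        long[] counts = new long[parallelism];
        for (String key : keys) {
            counts[Math.floorMod(key.hashCode(), parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1,000 records: 900 share one "hot" key, 100 are spread over 100 keys.
        String[] keys = new String[1_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i < 900 ? "hot-user" : "user-" + i;
        }
        long[] counts = recordsPerSubtask(keys, 4);
        System.out.println(Arrays.toString(counts));
        // Whichever subtask owns "hot-user" receives at least 900 of the 1,000 records.
    }
}
```

That overloaded subtask then delivers its checkpoint barrier late, which is exactly the situation in which downstream alignment waits and buffers.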

Thanks, vino.

On 2018-07-30 12:42 GMT+08:00, Marvin777 wrote:
Hi vino,

I found that the 'Buffered During Alignment' value is very large. What generally causes this?
image.png


In reply to Marvin777 (Mon, Jul 30, 2018, 10:36 AM)