Hi,

Any help figuring this out would be highly appreciated.

We are running on GC. After uploading a new jar with an old savepoint (taken the day before), some of our checkpoints fail with "Checkpoint failed: The assigned slot container_e02_1550091678485_0001_01_000023_7 was removed." What is the reason for that?

Some checkpoints used to fail on timeout, but after I increased the timeout to 15 min, some of them failed with "Checkpoint failed: Checkpoint Coordinator is suspending". What can cause that, and how can it be solved?

Another question: when recovering from the old state, will the consumer resume consuming messages from the position recorded in that savepoint?

Regards,
Avi

Attachment: Screen Shot 2019-02-14 at 2.18.21.png (174K)
Hi Avi,

I think neither "Checkpoint failed: The assigned slot container_e02_1550091678485_0001_01_000023_7 was removed" (this may be a container failure or something else; you could double-check the taskmanager log for more information) nor "Checkpoint failed: Checkpoint Coordinator is suspending" is the root cause. Could you please share the jobmanager log?

Whether the consumer consumes messages from the savepoint's position after recovering from the old state is controlled by the consumer. Restoring just restores the offsets, provided they were snapshotted when the savepoint was taken.

Best,
Congxian

(In reply to Avi Levi <[hidden email]>, Thu, Feb 14, 2019 at 8:20 AM)
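To illustrate the point about restored offsets, here is a minimal sketch in plain Java of the decision a source makes on restore. It is not Flink code: `RestoreOffsets`, `resolveStartOffsets`, and the partition/offset values are hypothetical names and numbers chosen for illustration. The assumption is that a partition whose offset was snapshotted resumes from that offset, and any other partition falls back to the consumer's configured start position.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any Flink API.
class RestoreOffsets {

    // For each partition, prefer the offset found in the restored state;
    // if the savepoint holds no offset for it, use the configured default.
    static Map<Integer, Long> resolveStartOffsets(
            List<Integer> partitions,
            Map<Integer, Long> restoredOffsets,
            long defaultStartOffset) {
        Map<Integer, Long> result = new TreeMap<>();
        for (Integer partition : partitions) {
            Long restored = restoredOffsets.get(partition);
            result.put(partition, restored != null ? restored : defaultStartOffset);
        }
        return result;
    }

    public static void main(String[] args) {
        // Offsets that were (hypothetically) snapshotted into the savepoint.
        Map<Integer, Long> fromSavepoint = new HashMap<>();
        fromSavepoint.put(0, 120L);
        fromSavepoint.put(1, 98L);

        // Partition 2 has no snapshotted offset, so it uses the default (0).
        Map<Integer, Long> start =
                resolveStartOffsets(Arrays.asList(0, 1, 2), fromSavepoint, 0L);
        System.out.println(start);
    }
}
```

If the source in use did not snapshot its offsets, or is configured to ignore them, the restored map is effectively empty and every partition starts from the default position, which is why the behaviour depends on the consumer.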
Thank you very much.

Please find attached the job manager log and the task manager log.

Thanks,
Avi

(In reply to Congxian Qiu <[hidden email]>, Thu, Feb 14, 2019 at 3:30 AM)
Attachment: Archive.zip (951K)