Hi All,
We are running into checkpoint timeout issues more frequently in production, and we also see the exception below. We are running Flink 1.4.0 and the checkpoints are saved on NFS. Can someone suggest how to overcome this?

java.lang.IllegalStateException: Could not initialize operator state backend.
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:302)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:249)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /mnt/checkpoints/02c4f8d5c11921f363b98c5959cc4f06/chk-101/e71d8eaf-ff4a-4783-92bd-77e3d8978e01 (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)

Thanks
|
Hi Navneeth,
Did you check whether the path in the exception is really missing?
Best, Vino
Navneeth Krishnan <[hidden email]> wrote on Fri, Jan 3, 2020 at 8:23 AM:
|
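A note on the check Vino suggests: the stack trace shows the file being opened through LocalDataInputStream, i.e. as a plain local path, so the result depends on which host runs the check. A minimal sketch of such a check in plain Java follows, to be run on each TaskManager host. The class name CheckpointPathCheck and the method describe are hypothetical and not part of Flink; the sketch only assumes the NFS mount is expected to appear as a local directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flink: reports how a checkpoint path
// looks from the host this program runs on.
public class CheckpointPathCheck {

    /** Returns a one-line status for the given path as seen from this host. */
    static String describe(Path p) {
        if (!Files.exists(p)) {
            return "MISSING: " + p;
        }
        if (!Files.isReadable(p)) {
            return "NOT READABLE: " + p;
        }
        if (Files.isDirectory(p)) {
            return "OK (directory): " + p;
        }
        // Actually open the file and read one byte, which is closer to what
        // the failing task does than a metadata-only existence check.
        try (InputStream in = Files.newInputStream(p)) {
            in.read();
            return "OK (" + Files.size(p) + " bytes): " + p;
        } catch (IOException e) {
            return "OPEN FAILED: " + p + " (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Pass one or more paths, e.g. the file named in the exception.
        for (String arg : args) {
            System.out.println(describe(Paths.get(arg)));
        }
    }
}
```

If the same path reports OK on one host and MISSING on another, the mount differs between machines. If it reports OK everywhere, the file may simply not have been visible yet at the moment the task tried to restore from it.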
Hi
Best, Congxian
vino yang <[hidden email]> wrote on Fri, Jan 3, 2020 at 3:54 PM:
|
Thanks Congxian & Vino. Yes, the file does exist and I don't see any problem accessing it. Regarding Flink 1.9, we haven't migrated yet but we are planning to. Since we have to test it, that might take some time.
Thanks
On Fri, Jan 3, 2020 at 2:14 AM Congxian Qiu <[hidden email]> wrote:
|
Hi Navneeth,
Since the file still exists, this exception is very strange. Does it happen only occasionally, or frequently? Another concern is that version 1.4 is very old, so maintenance and responses for it are not as timely as for recent versions. I personally recommend upgrading as soon as possible. I can ping [hidden email] and see whether he can explain the cause of this problem.
Best, Vino
Navneeth Krishnan <[hidden email]> wrote on Sat, Jan 4, 2020 at 1:03 AM:
|
Hi,
Off the top of my head I don't remember anything in particular. However, release 1.4.0 came with quite a lot of deep changes, which had their fair share of bugs that were subsequently fixed in later releases. Because the 1.4.x tree is no longer supported, I would strongly recommend first upgrading to a more recent Flink version. If that's not possible, I would at least upgrade to the latest release from the 1.4.x tree (1.4.2).
Piotrek
|
Thanks Vino & Piotr. Sure, we will upgrade the Flink version and monitor it to see if the problem still exists.
Thanks
On Mon, Jan 6, 2020 at 12:39 AM Piotr Nowojski <[hidden email]> wrote:
|