Hi everyone,

I'm having an issue when restarting a job in Flink. I'm doing a simple stop with savepoint and then starting from the savepoint. Savepoints are stored in a separate folder, and nothing in my configuration points to a "/tmp" folder. There is only 1 task manager and parallelism is 1. I'm getting a FileNotFoundException:

31 Oct 2018 23:40:35,837 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - filter-business-metrics -> Sink: data_feed (1/1) (51ce53532932c33805291dc188d2f99e) switched from DEPLOYING to RUNNING.
31 Oct 2018 23:40:35,837 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - agents-working-on-interactions (1/1) (72a916158d07f2353fb270848d95ba2f) switched from DEPLOYING to RUNNING.
31 Oct 2018 23:40:35,929 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - interaction-details (1/1) (c004e64e90c0dbd3bc007459bc3d7420) switched from RUNNING to FAILED.
java.io.FileNotFoundException: /tmp/flink-io-7bfd6603-c115-463d-bcfc-b97e31be5a37/f7ce787242e6afd91c3cbeccc2f74bc4a7dd0e6e600ff83e51bc5be9a95750f9.0.buffer (No such file or directory)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at org.apache.flink.streaming.runtime.io.BufferSpiller.createSpillingChannel(BufferSpiller.java:259)
    at org.apache.flink.streaming.runtime.io.BufferSpiller.<init>(BufferSpiller.java:120)
    at org.apache.flink.streaming.runtime.io.BarrierBuffer.<init>(BarrierBuffer.java:149)
    at org.apache.flink.streaming.runtime.io.StreamInputProcessor.<init>(StreamInputProcessor.java:129)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.init(OneInputStreamTask.java:56)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:235)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
    at java.lang.Thread.run(Thread.java:748)

I've checked the logs and there are no errors prior to that.
The job stopped without any issues, and on restart it came up normally, with multiple operators switching to RUNNING. But several other operators fail with this FileNotFoundException. Any help is appreciated.

Regards,
Dmitry
My guess is that the tmp directory got cleaned on your host and Flink couldn't restore memory state from it on startup. Take a look at the "Configuring temporary I/O directories" section of the docs, I think it is relevant: https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#configuring-temporary-io-directories

On Thu, Nov 1, 2018 at 8:51 PM Dmitry Minaev <[hidden email]> wrote:
Thank you, Alex, much appreciated.
I'll check whether changing the temporary I/O folder resolves the issue.
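For anyone landing here with the same error: the stack trace above shows the failure happens while BufferSpiller creates a new spill file under /tmp/flink-io-<uuid>/, presumably because /tmp is the default temporary location when no I/O directory is configured. Opening a file for writing normally creates it, so "No such file or directory" points at the parent directory having disappeared, which is consistent with the guess that something cleaned /tmp. The Python sketch below is a minimal, Flink-independent illustration under that assumption: it deletes a stand-in directory (the name flink-io-example is made up) and then tries to create a file inside it. It also includes a hypothetical check_tmp_dirs helper, not part of Flink, for confirming that a candidate replacement directory exists and is writable.

```python
import errno
import os
import shutil
import tempfile


def reproduce_missing_parent():
    """Delete a stand-in spill directory, then try to create a file in it.

    Returns the errno of the resulting FileNotFoundError (None if no error).
    """
    base = tempfile.mkdtemp()
    # Illustrative stand-in for a per-process directory like /tmp/flink-io-<uuid>.
    spill_dir = os.path.join(base, "flink-io-example")
    os.mkdir(spill_dir)
    try:
        # Simulate a periodic /tmp cleaner removing the directory
        # while the process that created it is still running.
        os.rmdir(spill_dir)
        try:
            # "w+b" would normally create the file, but not when
            # the parent directory no longer exists.
            with open(os.path.join(spill_dir, "0.buffer"), "w+b"):
                pass
        except FileNotFoundError as exc:
            return exc.errno
        return None
    finally:
        shutil.rmtree(base, ignore_errors=True)


def check_tmp_dirs(dirs):
    """Hypothetical pre-flight check (not a Flink API).

    Maps each directory to "ok", "missing" or "not writable".
    """
    status = {}
    for d in dirs:
        if not os.path.isdir(d):
            status[d] = "missing"
            continue
        try:
            # Creating (and auto-deleting) a throwaway file proves writability.
            with tempfile.TemporaryFile(dir=d):
                pass
            status[d] = "ok"
        except OSError:
            status[d] = "not writable"
    return status


if __name__ == "__main__":
    print("errno:", errno.errorcode.get(reproduce_missing_parent()))
    print(check_tmp_dirs([tempfile.gettempdir()]))
```

If a directory in use turns out to be missing, the fix the linked documentation section covers is pointing Flink at a directory that is not cleaned automatically. If I remember correctly, the key in flink-conf.yaml is io.tmp.dirs (older releases used taskmanager.tmp.dirs), but check the docs for your exact version.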