This post was updated on .
I'm submitting a Flink job to a cluster that has three Task Managers via the
Flink dashboard. When I set `Parallelism` to 1 (which is default), everything runs as expected. But when I increase `Parallelism` to anything more than 1, the job fails with the exception, > /java.io.FileNotFoundException: /tmp/flink-io-f91d7812-a411-4b58-a891-c9be1cde91da/08caeac37d6b8351daf6a3eb123a473106c56381b101f3e5d9704df9f78406a2.0.buffer (No such file or directory) > at java.io.RandomAccessFile.open0(Native Method) > at java.io.RandomAccessFile.open(RandomAccessFile.java:316) > at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243) > at org.apache.flink.streaming.runtime.io.BufferSpiller.createSpillingChannel(BufferSpiller.java:259) > at org.apache.flink.streaming.runtime.io.BufferSpiller.<init>(BufferSpiller.java:120) > at org.apache.flink.streaming.runtime.io.BarrierBuffer.<init>(BarrierBuffer.java:149) > at org.apache.flink.streaming.runtime.io.StreamInputProcessor.<init>(StreamInputProcessor.java:129) > at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.init(OneInputStreamTask.java:56) > at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:235) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) > at java.lang.Thread.run(Thread.java:745)/ I've enabled checkpoints on my Flink job with "Exactly Once" retry strategy every 10 seconds. Here's my checkpoint configuration, > env.setStateBackend(new RocksDBStateBackend(props.getFlinkCheckpointDataUri(), true)); > env.enableCheckpointing(10000, EXACTLY_ONCE); //10 seconds > CheckpointConfig config = env.getCheckpointConfig(); > config.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);/ Should I configure something else other than simply entering `2` while submitting the job in the dashboard? EDIT: If I disable checkpoints and upload the job, it runs without any error. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Could you tell us which Flink version you are using?
On 28.05.2018 14:01, HarshithBolar wrote: > I'm submitting a Flink job to a cluster that has three Task Managers via the > Flink dashboard. When I set `Parallelism` to 1 (which is default), > everything runs as expected. But when I increase `Parallelism` to anything > more than 1, the job fails with the exception, > > /java.io.FileNotFoundException: > /tmp/flink-io-f91d7812-a411-4b58-a891-c9be1cde91da/08caeac37d6b8351daf6a3eb123a473106c56381b101f3e5d9704df9f78406a2.0.buffer > (No such file or directory) > at java.io.RandomAccessFile.open0(Native Method) > at java.io.RandomAccessFile.open(RandomAccessFile.java:316) > at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243) > at > org.apache.flink.streaming.runtime.io.BufferSpiller.createSpillingChannel(BufferSpiller.java:259) > at > org.apache.flink.streaming.runtime.io.BufferSpiller.<init>(BufferSpiller.java:120) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.<init>(BarrierBuffer.java:149) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.<init>(StreamInputProcessor.java:129) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.init(OneInputStreamTask.java:56) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:235) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) > at java.lang.Thread.run(Thread.java:745)/ > > I've enabled checkpoints on my Flink job with "Exactly Once" retry strategy > every 10 seconds. Here's my checkpoint configuration, > > / env.setStateBackend(new > RocksDBStateBackend(props.getFlinkCheckpointDataUri(), true)); > env.enableCheckpointing(10000, EXACTLY_ONCE); //10 seconds > CheckpointConfig config = env.getCheckpointConfig(); > > config.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);/ > > Should I configure something else other than simply entering `2` while > submitting the job in the dashboard? > > EDIT: If I disable checkpoints and upload the job, it runs without any > error. > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ > |
I'm using Flink 1.4.2
-- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Just a heads up. I haven't found the root cause for this issue yet but
restarting all the nodes seems to have solved this issue. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |