Job fails to start with S3 savepoint

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Job fails to start with S3 savepoint

Bajaj, Abhinav

Hi,

 

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

 

However, when I try to submit the same Job using the stored savepoint, it fails with below exception.

I am using Flink 1.2 and submitted the job from the UI dashboard.

 

Can anyone guide me through this issue?

 

Thanks,

Abhinav

 

Jobmanager logs with exception

 

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if no longer possible.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter Example).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for c965addb24f955a28400f89c2a41db57.

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (c965addb24f955a28400f89c2a41db57).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:18:15,728 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if no longer possible.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job c965addb24f955a28400f89c2a41db57

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  +12062092767

Mobile: +17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Job fails to start with S3 savepoint

Timo Walther
Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:

Hi,

 

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

 

However, when I try to submit the same Job using the stored savepoint, it fails with below exception.

I am using Flink 1.2 and submitted the job from the UI dashboard.

 

Can anyone guide me through this issue?

 

Thanks,

Abhinav

 

Jobmanager logs with exception

 

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if no longer possible.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter Example).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for c965addb24f955a28400f89c2a41db57.

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (c965addb24f955a28400f89c2a41db57).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:18:15,728 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if no longer possible.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job c965addb24f955a28400f89c2a41db57

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  +12062092767

Mobile: +17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 


Reply | Threaded
Open this post in threaded view
|

Re: Job fails to start with S3 savepoint

Ufuk Celebi
Hey Abhinav,

the Exception is thrown if the S3 object does not exist.

Can you double check that it actually does exist (no typos, etc.)?

Could this be related to accessing a different region than expected?

– Ufuk


On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther <[hidden email]> wrote:
Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:

Hi,

 

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

 

However, when I try to submit the same Job using the stored savepoint, it fails with below exception.

I am using Flink 1.2 and submitted the job from the UI dashboard.

 

Can anyone guide me through this issue?

 

Thanks,

Abhinav

 

Jobmanager logs with exception

 

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if no longer possible.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter Example).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for c965addb24f955a28400f89c2a41db57.

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (c965addb24f955a28400f89c2a41db57).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:18:15,728 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if no longer possible.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job c965addb24f955a28400f89c2a41db57

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:+1%20206-209-2767" value="+12062092767" target="_blank">+12062092767

Mobile: <a href="tel:+1%20708-329-9516" value="+17083299516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 



Reply | Threaded
Open this post in threaded view
|

Re: Job fails to start with S3 savepoint

Bajaj, Abhinav

Hi Ufuk,

 

Thanks for replying.

The savepoint path is correct and it exists.

FYI, I used the monitoring REST APIs to cancel the job with savepoint.

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  +12062092767

Mobile: +17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: Ufuk Celebi <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, March 20, 2017 at 2:41 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: Job fails to start with S3 savepoint

 

Hey Abhinav,

 

the Exception is thrown if the S3 object does not exist.

 

Can you double check that it actually does exist (no typos, etc.)?

 

Could this be related to accessing a different region than expected?

 

– Ufuk

 

 

On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther <[hidden email]> wrote:

Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:

Hi,

 

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

 

However, when I try to submit the same Job using the stored savepoint, it fails with below exception.

I am using Flink 1.2 and submitted the job from the UI dashboard.

 

Can anyone guide me through this issue?

 

Thanks,

Abhinav

 

Jobmanager logs with exception

 

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if no longer possible.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter Example).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for c965addb24f955a28400f89c2a41db57.

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (c965addb24f955a28400f89c2a41db57).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:18:15,728 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if no longer possible.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job c965addb24f955a28400f89c2a41db57

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:&#43;1%20206-209-2767" target="_blank">+12062092767

Mobile: <a href="tel:&#43;1%20708-329-9516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Job fails to start with S3 savepoint

Bajaj, Abhinav

Hi Ufuk,

 

Realized I sent an incomplete mail. Continuing my previous reply here.

 

Thanks for hint on the region. The bucket is in eu-west-1 region.

 

The flink configuration for checkpoints and savepoints is as below –

state.backend.fs.checkpointdir:     s3://flink-bucket/flink-checkpoints

state.savepoints.dir:   s3://flink-bucket/flink-savepoints

 

No region specified in the s3 urls above. But the savepoint was created successfully.

 

When using the monitoring REST API, the cancel-with-savepoint API returned the savepoint path – “s3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2”.

I used the same savepoint path while submitting a new job.

 

I am wondering why there would be a difference in behavior between creating and reading the savepoint.

 

I meantime, I will try updating the s3 urls to reflect the region and update here.

 

Thanks,

Abhinav

 

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  +12062092767

Mobile: +17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: "Bajaj, Abhinav" <[hidden email]>
Date: Monday, March 20, 2017 at 10:46 AM
To: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: Job fails to start with S3 savepoint

 

Hi Ufuk,

 

Thanks for replying.

The savepoint path is correct and it exists.

FYI, I used the monitoring REST APIs to cancel the job with savepoint.

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  +12062092767

Mobile: +17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: Ufuk Celebi <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, March 20, 2017 at 2:41 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: Job fails to start with S3 savepoint

 

Hey Abhinav,

 

the Exception is thrown if the S3 object does not exist.

 

Can you double check that it actually does exist (no typos, etc.)?

 

Could this be related to accessing a different region than expected?

 

– Ufuk

 

 

On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther <[hidden email]> wrote:

Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:

Hi,

 

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

 

However, when I try to submit the same Job using the stored savepoint, it fails with below exception.

I am using Flink 1.2 and submitted the job from the UI dashboard.

 

Can anyone guide me through this issue?

 

Thanks,

Abhinav

 

Jobmanager logs with exception

 

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if no longer possible.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter Example).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for c965addb24f955a28400f89c2a41db57.

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (c965addb24f955a28400f89c2a41db57).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:18:15,728 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if no longer possible.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job c965addb24f955a28400f89c2a41db57

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:&#43;1%20206-209-2767" target="_blank">+12062092767

Mobile: <a href="tel:&#43;1%20708-329-9516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Job fails to start with S3 savepoint

Stephan Ewen
Not sure if that is the issue here, but the code does a "fs.exists()" for the path. We should not do this anywhere in all checkpointing related code, because this fails frequently with s3 due to its consistency model.

This should eventually succeed though, so I think it is not the problem here.



On Mon, Mar 20, 2017 at 7:04 PM, Bajaj, Abhinav <[hidden email]> wrote:

Hi Ufuk,

 

Realized I sent an incomplete mail. Continuing my previous reply here.

 

Thanks for hint on the region. The bucket is in eu-west-1 region.

 

The flink configuration for checkpoints and savepoints is as below –

state.backend.fs.checkpointdir:     s3://flink-bucket/flink-checkpoints

state.savepoints.dir:   s3://flink-bucket/flink-savepoints

 

No region specified in the s3 urls above. But the savepoint was created successfully.

 

When using the monitoring REST API, the cancel-with-savepoint API returned the savepoint path – “s3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2”.

I used the same savepoint path while submitting a new job.

 

I am wondering why there would be a difference in behavior between creating and reading the savepoint.

 

I meantime, I will try updating the s3 urls to reflect the region and update here.

 

Thanks,

Abhinav

 

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:+1%20206-209-2767" value="+12062092767" target="_blank">+12062092767

Mobile: <a href="tel:+1%20708-329-9516" value="+17083299516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: "Bajaj, Abhinav" <[hidden email]>
Date: Monday, March 20, 2017 at 10:46 AM
To: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>


Subject: Re: Job fails to start with S3 savepoint

 

Hi Ufuk,

 

Thanks for replying.

The savepoint path is correct and it exists.

FYI, I used the monitoring REST APIs to cancel the job with savepoint.

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:+1%20206-209-2767" value="+12062092767" target="_blank">+12062092767

Mobile: <a href="tel:+1%20708-329-9516" value="+17083299516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: Ufuk Celebi <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, March 20, 2017 at 2:41 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: Job fails to start with S3 savepoint

 

Hey Abhinav,

 

the Exception is thrown if the S3 object does not exist.

 

Can you double check that it actually does exist (no typos, etc.)?

 

Could this be related to accessing a different region than expected?

 

– Ufuk

 

 

On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther <[hidden email]> wrote:

Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:

Hi,

 

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

 

However, when I try to submit the same Job using the stored savepoint, it fails with below exception.

I am using Flink 1.2 and submitted the job from the UI dashboard.

 

Can anyone guide me through this issue?

 

Thanks,

Abhinav

 

Jobmanager logs with exception

 

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if no longer possible.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter Example).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for c965addb24f955a28400f89c2a41db57.

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (c965addb24f955a28400f89c2a41db57).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:18:15,728 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if no longer possible.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job c965addb24f955a28400f89c2a41db57

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:+1%20206-209-2767" target="_blank">+12062092767

Mobile: <a href="tel:+1%20708-329-9516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

 


Reply | Threaded
Open this post in threaded view
|

Re: Job fails to start with S3 savepoint

Bajaj, Abhinav

Hi,

 

After doing some investigation, I found that this was a permission issue on S3 bucket.

May be the fs.exists() check also failed because of the permissions.

 

In my setup, the permissions on the bucket were fine. But the created savepoint & checkpoints inside the bucket, did not had the permissions set.

This made the troubleshooting more confusing for me.

After manually updating the permissions, I could start a new Job with the created savepoint.

 

It would have helped if the exception was more informative.

In this case, the error was “Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.”

“Invalid path…” is very generic in my opinion.

  

Thanks.

Abhinav

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  +12062092767

Mobile: +17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: Stephan Ewen <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Tuesday, March 21, 2017 at 6:59 AM
To: "[hidden email]" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: Job fails to start with S3 savepoint

 

Not sure if that is the issue here, but the code does a "fs.exists()" for the path. We should not do this anywhere in all checkpointing related code, because this fails frequently with s3 due to its consistency model.

 

This should eventually succeed though, so I think it is not the problem here.

 

 

 

On Mon, Mar 20, 2017 at 7:04 PM, Bajaj, Abhinav <[hidden email]> wrote:

Hi Ufuk,

 

Realized I sent an incomplete mail. Continuing my previous reply here.

 

Thanks for hint on the region. The bucket is in eu-west-1 region.

 

The flink configuration for checkpoints and savepoints is as below –

state.backend.fs.checkpointdir:     s3://flink-bucket/flink-checkpoints

state.savepoints.dir:   s3://flink-bucket/flink-savepoints

 

No region specified in the s3 urls above. But the savepoint was created successfully.

 

When using the monitoring REST API, the cancel-with-savepoint API returned the savepoint path – “s3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2”.

I used the same savepoint path while submitting a new job.

 

I am wondering why there would be a difference in behavior between creating and reading the savepoint.

 

I meantime, I will try updating the s3 urls to reflect the region and update here.

 

Thanks,

Abhinav

 

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:&#43;1%20206-209-2767" target="_blank">+12062092767

Mobile: <a href="tel:&#43;1%20708-329-9516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: "Bajaj, Abhinav" <[hidden email]>
Date: Monday, March 20, 2017 at 10:46 AM
To: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>


Subject: Re: Job fails to start with S3 savepoint

 

Hi Ufuk,

 

Thanks for replying.

The savepoint path is correct and it exists.

FYI, I used the monitoring REST APIs to cancel the job with savepoint.

 

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:&#43;1%20206-209-2767" target="_blank">+12062092767

Mobile: <a href="tel:&#43;1%20708-329-9516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps

 

 

 

From: Ufuk Celebi <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, March 20, 2017 at 2:41 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: Job fails to start with S3 savepoint

 

Hey Abhinav,

 

the Exception is thrown if the S3 object does not exist.

 

Can you double check that it actually does exist (no typos, etc.)?

 

Could this be related to accessing a different region than expected?

 

– Ufuk

 

 

On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther <[hidden email]> wrote:

Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:

Hi,

 

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

 

However, when I try to submit the same Job using the stored savepoint, it fails with below exception.

I am using Flink 1.2 and submitted the job from the UI dashboard.

 

Can anyone guide me through this issue?

 

Thanks,

Abhinav

 

Jobmanager logs with exception

 

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).

2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if no longer possible.

2017-03-18 00:10:09,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb

2017-03-18 00:10:09,640 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient                      - Checking and uploading JAR files

2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient                       - Blob client connecting to akka://flink/user/jobmanager

2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager                           - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter Example).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Using restart strategy NoRestartStrategy for c965addb24f955a28400f89c2a41db57.

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Running initialization on master for job Session Counter Example (c965addb24f955a28400f89c2a41db57).

2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager                           - Successfully ran initialization on master in 0 ms.

2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager                           - Starting job from savepoint 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

2017-03-18 00:18:15,728 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED to FAILING.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - TriggerWindow(TumblingProcessingTimeWindows(15000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to CANCELED.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if no longer possible.

2017-03-18 00:18:15,729 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING to FAILED.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) because a type of SuppressRestartsException was thrown and the restart strategy prevented it.

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)

                at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

                at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)

                at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

                at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

                at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

                at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: java.lang.IllegalArgumentException: Invalid path 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)

                at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)

                at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)

                ... 10 more

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping checkpoint coordinator for job c965addb24f955a28400f89c2a41db57

2017-03-18 00:18:15,730 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - Shutting down

 

Abhinav Bajaj

Lead Engineer

HERE Predictive Analytics

Office:  <a href="tel:&#43;1%20206-209-2767" target="_blank">+12062092767

Mobile: <a href="tel:&#43;1%20708-329-9516" target="_blank">+17083299516

HERE Seattle

701 Pike Street, #2000, Seattle, WA 98101, USA

47° 36' 41" N. 122° 19' 57" W

HERE Maps