Hi @all,

I have a YARN cluster with 5 nodes and a running Flink (0.10.2) instance. Today we shut down one of the YARN hosts for maintenance. After the restart, some Flink streaming routes are stuck in a restarting status (see stack trace below). Now I want to restart these routes so that they continue their work from the last checkpoint. What can I do?

Greets
Dominique

Stacktrace
===================================================================================
java.io.IOException: Cannot get library with hash 8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerReferenceToBlobKeyAndGetURL(BlobLibraryCacheManager.java:254)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:114)
    at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:710)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:471)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to fetch BLOB 8f15fe4a8137ca2f9fb348ec634f3703f4fd7317 from /10.24.20.14:60485 and store it under /tmp/blobStore-efdeddf9-d096-440f-a4cb-9c79334ff92c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
    at org.apache.flink.runtime.blob.BlobCache.getURL(BlobCache.java:177)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerReferenceToBlobKeyAndGetURL(BlobLibraryCacheManager.java:245)
    ... 4 more
Caused by: java.io.IOException: GET operation failed: Server side error: Cannot find required BLOB at /tmp/blobStore-0f9a63e3-5700-4d47-aea7-310506c1496c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
    at org.apache.flink.runtime.blob.BlobClient.get(BlobClient.java:165)
    at org.apache.flink.runtime.blob.BlobCache.getURL(BlobCache.java:125)
    ... 5 more
Caused by: java.io.IOException: Server side error: Cannot find required BLOB at /tmp/blobStore-0f9a63e3-5700-4d47-aea7-310506c1496c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
    at org.apache.flink.runtime.blob.BlobClient.receiveAndCheckResponse(BlobClient.java:213)
    at org.apache.flink.runtime.blob.BlobClient.get(BlobClient.java:159)
    ... 6 more
Caused by: java.io.IOException: Cannot find required BLOB at /tmp/blobStore-0f9a63e3-5700-4d47-aea7-310506c1496c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
    at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:202)
    at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:112)
Hi Dominique,

I'm sorry that you ran into this issue. What do you mean by "flink streaming routes"?

Regarding the second question: "Now I want to restart these routes to continue their work from the last checkpoint. What can i do?"
I think the feature you are looking for is savepoints: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html
However, savepoints were added in Flink 1.0, so they are not available in your 0.10 release.

I have to admit that I haven't seen the "Cannot find required BLOB at ..." exception before. Is there any chance that the files have been deleted from the /tmp directory by an external service (like a periodic cleanup script), or that the /tmp dir has been mounted to another disk in the meantime?

On Wed, May 4, 2016 at 6:27 PM, Dominique Rondé <[hidden email]> wrote:
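To follow up on the /tmp question above, here is a minimal sketch of a check that could be run on the host that serves the BLOB. It only tests whether the file named in the stack trace is still on disk. The class name `BlobCheck` and its methods are hypothetical helpers, not part of Flink, and the `<blobStoreDir>/cache/blob_<hash>` layout is an assumption read off the paths in the stack trace.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not a Flink API.
final class BlobCheck {

    // Assumed layout, taken from the stack trace: <blobStoreDir>/cache/blob_<hash>
    static Path blobPath(Path blobStoreDir, String hash) {
        return blobStoreDir.resolve("cache").resolve("blob_" + hash);
    }

    // True only if the BLOB is present as a readable regular file.
    static boolean blobExists(Path blobStoreDir, String hash) {
        Path p = blobPath(blobStoreDir, hash);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java BlobCheck <blobStoreDir> <hash>");
            System.exit(2);
        }
        Path dir = Paths.get(args[0]);
        boolean ok = blobExists(dir, args[1]);
        System.out.println(blobPath(dir, args[1]) + (ok ? " exists" : " is MISSING"));
        // Exit code 0 if found, 1 if missing, so the check can be scripted.
        System.exit(ok ? 0 : 1);
    }
}
```

For the trace above, this would be invoked with `/tmp/blobStore-0f9a63e3-5700-4d47-aea7-310506c1496c` and the hash `8f15fe4a8137ca2f9fb348ec634f3703f4fd7317`. If the file is reported missing, that would be consistent with something having cleaned up or remounted /tmp, but it does not by itself show what removed it.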
Hey Dominique!
Are you running the job in HA mode?

– Ufuk

On Thu, May 5, 2016 at 1:49 PM, Robert Metzger <[hidden email]> wrote: