Hello All,
I have a job which fails roughly every 14 days with an IOException: failed to fetch BLOB. I submitted the job from the command line using the java jar. Below is the exception I'm getting:

java.io.IOException: Failed to fetch BLOB d23d168655dd51efe4764f9b22b85a18/p-446f7e0137fd66af062de7a56c55528171d380db-baf0b6bce698d586a3b0d30c6e487d16 from flink-job-mamager/10.20.1.85:38147 and store it under /tmp/blobStore-e3e34fec-22d9-4b3c-b542-0c1e5cdcf896/incoming/temp-00000022
    at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:191)
    at org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:177)
    at org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:205)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:119)
    at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:878)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:589)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: GET operation failed: Server side error: /tmp/blobStore-5535a94c-5bdd-41f3-878d-8320e53ba7c5/incoming/temp-00182356
    at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:253)
    at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:166)
    ... 6 more
Caused by: java.io.IOException: Server side error: /tmp/blobStore-5535a94c-5bdd-41f3-878d-8320e53ba7c5/incoming/temp-00182356
    at org.apache.flink.runtime.blob.BlobClient.receiveAndCheckGetResponse(BlobClient.java:306)
    at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:247)
    ... 7 more
Caused by: java.nio.file.NoSuchFileException: /tmp/blobStore-5535a94c-5bdd-41f3-878d-8320e53ba7c5/incoming/temp-00182356
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.flink.runtime.blob.BlobUtils.moveTempFileToStore(BlobUtils.java:452)
    at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:521)
    at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:231)
    at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:117)

All the BLOB configurations are at their defaults; I didn't change anything. Can someone help me fix this issue?

Thanks,
Manjusha
How is your cluster configured? What is the checkpoint/savepoint directory configuration?

On Tue, Oct 23, 2018 at 8:00 AM Manjusha Vuyyuru <[hidden email]> wrote:
In reply to this post by Manjusha Vuyyuru
Hi Manjusha,

I am not sure what is wrong, but Nico or Till (cc'ed) might be able to help you.

Best,
Dawid

On 23/10/2018 06:58, Manjusha Vuyyuru wrote:
Hello,

Checkpointing to HDFS:

state.backend.fs.checkpointdir: hdfs://flink-hdfs:9000/flink-checkpoints
state.checkpoints.num-retained: 2

Thanks,
Manjusha

On Tue, Oct 23, 2018 at 1:05 PM Dawid Wysakowicz <[hidden email]> wrote:
Hi Manjusha,
If you are, for example, using one of Amazon's Linux AMIs on EMR, you may fall into a trap that Lasse described during his Flink Forward talk [1]: these images include a default cron job that cleans up files in /tmp which have not been accessed recently. By default, the BLOB server directory (blob.storage.directory) is placed under /tmp, and on the JobManager the files in it are only accessed during deployments, so they are picked up by this cleanup.

A solution is to change the BLOB storage directory.

Nico

[1] https://data-artisans.com/flink-forward-berlin/resources/our-successful-journey-with-flink
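To see whether a host is exposed to this kind of cleanup, one can list the files under the BLOB directory that have not been accessed for some number of days. The following is only a minimal Python sketch and not part of Flink: the function name find_stale_files and the 10-day default are assumptions for illustration, and the real threshold depends on whatever cleanup job the host runs.

```python
import os
import time


def find_stale_files(root, max_age_days=10.0, now=None):
    """Return sorted paths under `root` whose last access time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling find_stale_files("/tmp") on the JobManager host would then show which files, including any under a blobStore-* directory, look like cleanup candidates. Access times are only meaningful if the filesystem records them; with mount options such as noatime the result may be misleading.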
Thanks, Nico! I was using Google Compute Engine. I have changed the BLOB storage directory and will have to wait and see whether that solves the problem.

Thanks,
Manju
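After changing the directory, it may be worth confirming that the configuration really points away from /tmp. Below is a small Python sketch under stated assumptions: it treats the Flink configuration file (flink-conf.yaml in Flink releases of this period) as plain "key: value" lines, and the helper names blob_dir_from_conf and is_under_tmp are hypothetical, not Flink APIs.

```python
import os


def blob_dir_from_conf(conf_text):
    """Return the value of blob.storage.directory from the config text, or None if unset."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = stripped.partition(":")
        if sep and key.strip() == "blob.storage.directory":
            return value.strip() or None
    return None


def is_under_tmp(path):
    """True if `path` is /tmp itself or lies below it (purely lexical, POSIX paths)."""
    normalized = os.path.normpath(path)
    return normalized == "/tmp" or normalized.startswith("/tmp/")
```

If blob_dir_from_conf returns None, the default applies and the files end up under /tmp as described above; if it returns a path for which is_under_tmp is true, the cleanup job can still reach it. The check does not resolve symlinks, so a directory linked into /tmp would not be detected.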