job history server

job history server

Richard Moorhead
I see the following exception often:

2020-02-17 18:13:26,796 ERROR org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  - Failure while fetching/processing job archive for job eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException: /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts: No space left on device
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:767)
        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


However, the partition named in the exception does not appear to be full, or anywhere near full.

Is there a workaround for this?

Re: job history server

Benchao Li
Hi Richard,

Have you checked whether the partition has run out of inodes?

--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]

Re: job history server

Richard Moorhead
Yes, I did. I mentioned it at the end of my message, but I should have been clearer:

22526:~/ $ df -H    [18:15:20]
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg00-rootlv00
                      2.1G  777M  1.2G  41% /
tmpfs                 2.1G  753M  1.4G  37% /dev/shm

Re: job history server

Benchao Li
`df -H` only reports space usage, not inode usage. Could you also show us the output of `df -iH`?
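
For reference, the inode figures that `df -i` reports can also be read programmatically. The following is a minimal Python sketch and was not part of the original exchange. It assumes a POSIX system where `os.statvfs` is available; the names `InodeUsage` and `inode_usage` are made up for illustration. Some filesystems report zero total inodes, in which case no percentage is computed.

```python
import os
from typing import NamedTuple, Optional


class InodeUsage(NamedTuple):
    total: int                # inodes the filesystem can hold (f_files)
    free: int                 # inodes still unallocated (f_ffree)
    used: int                 # total - free
    percent: Optional[float]  # None when the filesystem reports no inode total


def inode_usage(path: str) -> InodeUsage:
    """Return inode statistics for the filesystem that contains `path`."""
    st = os.statvfs(path)
    used = st.f_files - st.f_ffree
    percent = 100.0 * used / st.f_files if st.f_files else None
    return InodeUsage(st.f_files, st.f_ffree, used, percent)


if __name__ == "__main__":
    # /dev/shm is the mount point named in the exception; fall back to "/"
    # on systems where it does not exist.
    target = "/dev/shm" if os.path.isdir("/dev/shm") else "/"
    print(target, inode_usage(target))
```

A filesystem can show plenty of free space in `df -H` and still refuse to create new files once `free` here reaches zero.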

Re: job history server

Richard Moorhead
I did not know that.

I have since wiped the directory. I will post when I see this error again.

Re: job history server

Richard Moorhead
2020-02-18 09:44:45,227 ERROR org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  - Failure while fetching/processing job archive for job eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException: /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/062e4d80ed1d4bdafd24e462245c5926/subtasks/86/attempts/0.json: No space left on device

And there it is:

42103b5b-5410-d2d8-6a0b-21757e4a0fbc ~
0 % df -iH
Filesystem           Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg00-rootlv00
                       132k   13k  119k   10% /
tmpfs                  508k  465k   43k   92% /dev/shm

Thanks for the tip.
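
The path in the exception suggests why inodes run out long before space does: the archive for a job appears to be unpacked into many small files, one per vertex, subtask, and attempt, and every file and directory occupies an inode. A rough Python sketch for seeing which job directories hold the most entries follows. The function name `entries_per_job` is hypothetical, and the directory layout is assumed from the path in the log above.

```python
import os
from collections import Counter


def entries_per_job(jobs_dir: str) -> Counter:
    """Count files and directories beneath each immediate child of `jobs_dir`.

    Each entry occupies one inode, so the counts approximate inode consumption
    per job. Hard links would be counted more than once.
    """
    counts: Counter = Counter()
    for job in os.scandir(jobs_dir):
        if not job.is_dir(follow_symlinks=False):
            counts[job.name] = 1  # a plain file directly under jobs_dir
            continue
        total = 1  # the job directory itself
        for _root, dirs, files in os.walk(job.path):
            total += len(dirs) + len(files)
        counts[job.name] = total
    return counts


if __name__ == "__main__":
    # Path assumed from the exception message in this thread.
    jobs_dir = "/dev/shm/flink-history-server/jobs"
    if os.path.isdir(jobs_dir):
        for name, n in entries_per_job(jobs_dir).most_common(10):
            print(f"{n:>8}  {name}")
```

Sorting by count shows whether a few jobs with high parallelism account for most of the 465k inodes in use, or whether the total comes from many archived jobs accumulating over time.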
