Jobmanager stopped because uncaught exception

Jobmanager stopped because uncaught exception

Lei Wang
I am running Flink 1.12.1 in standalone HA mode. The JobManager stopped with the following error:

2021-02-08 13:57:50,550 ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler      [] - FATAL: Thread 'cluster-io-thread-30' produced an uncaught exception. Stopping the process...
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3a4ab3cb rejected from java.util.concurrent.ScheduledThreadPoolExecutor@6222948[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 455]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_275]
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_275]
        at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) ~[?:1.8.0_275]
        at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) ~[?:1.8.0_275]
        at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622) ~[?:1.8.0_275]
        at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_275]
        at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:64) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.scheduleTriggerRequest(CheckpointCoordinator.java:1290) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:66) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_275]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_275]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]

I am using Aliyun OSS as the state backend storage.
Before the ERROR, there are many INFO messages like this:

2021-02-08 13:57:50,452 INFO  org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss          [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 6020D2DEA1E11430349E8323


Any insight on this?

Thanks,
Lei
Re: Jobmanager stopped because uncaught exception

Yang Wang
This may be a known issue [1] that has already been fixed in 1.12.2 (to be released soon).
By the way, I think it is unrelated to the Aliyun OSS INFO logs.

[1] FLINK-20992



Best,
Yang

Re: Jobmanager stopped because uncaught exception

Lei Wang
I see there's a related issue, https://issues.apache.org/jira/browse/FLINK-21053, which is still open.

Does that mean a similar issue will still exist even if I upgrade to 1.12.2?

Thanks,
Lei

Re: Jobmanager stopped because uncaught exception

r_khachatryan
Hi,

The open issue you mentioned (FLINK-21053) is about preventing potential issues in the future.
The issue you are experiencing is most likely FLINK-20992 as Yang Wang said.
So upgrading to 1.12.2 should solve the problem.

Regards,
Roman
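
For readers unfamiliar with the exception in the stack trace: it shows a task being handed to a ScheduledThreadPoolExecutor that is already in the Terminated state, which the JDK's default AbortPolicy answers with a RejectedExecutionException. The following is a minimal sketch of that JDK behaviour only, not Flink's code; the class name RejectedAfterShutdownDemo and the method rejectsAfterShutdown are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;

class RejectedAfterShutdownDemo {

    // Returns true if the executor rejected a task submitted after shutdown().
    static boolean rejectsAfterShutdown() {
        // Stand-in for a timer-style executor; the name "timer" is illustrative only.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // After shutdown(), the executor accepts no new tasks.
        timer.shutdown();

        try {
            // A late caller (for example, a callback running on another thread)
            // tries to hand over one more task.
            timer.execute(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy throws instead of silently dropping the task.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + rejectsAfterShutdown());
    }
}
```

If such an exception is not caught on the submitting thread, it reaches that thread's uncaught exception handler; in the log above, that handler is FatalExitExceptionHandler, which stops the process.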

