(DEPRECATED) Apache Flink User Mailing List archive.

Jobmanager stopped because uncaught exception

Lei Wang

Jobmanager stopped because uncaught exception

Flink standalone cluster in HA mode, Flink version 1.12.1.

2021-02-08 13:57:50,550 ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler [] - FATAL: Thread 'cluster-io-thread-30' produced an uncaught exception. Stopping the process...
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3a4ab3cb rejected from java.util.concurrent.ScheduledThreadPoolExecutor@6222948[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 455]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_275]
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) ~[?:1.8.0_275]
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) ~[?:1.8.0_275]
at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622) ~[?:1.8.0_275]
at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_275]
at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:64) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.scheduleTriggerRequest(CheckpointCoordinator.java:1290) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:66) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_275]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
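The stack trace shows a task on `cluster-io-thread-30` (started from `CheckpointsCleaner`) handing follow-up work to a `ScheduledThreadPoolExecutor` that is already `Terminated`. With the default `AbortPolicy`, that submission throws `RejectedExecutionException`. Nothing catches it, so the thread's uncaught exception handler (`FatalExitExceptionHandler` in Flink) stops the process. Below is a minimal sketch of the same pattern using only `java.util.concurrent`. It assumes a simplified setup: the class name `RejectedAfterShutdownDemo` is hypothetical and no Flink classes are involved. The handler records the error instead of exiting:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class RejectedAfterShutdownDemo {

    /**
     * Hypothetical reproduction: a task on an "io" pool submits work to a
     * scheduled executor that was already shut down. Returns the throwable
     * that reached the io thread's uncaught exception handler.
     */
    public static Throwable reproduce() {
        AtomicReference<Throwable> uncaught = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);

        // Stand-in for Flink's FatalExitExceptionHandler: record the error instead of exiting.
        ThreadFactory ioThreadFactory = runnable -> {
            Thread t = new Thread(runnable, "io-thread");
            t.setUncaughtExceptionHandler((thread, error) -> {
                uncaught.set(error);
                done.countDown();
            });
            return t;
        };

        ExecutorService ioExecutor = Executors.newSingleThreadExecutor(ioThreadFactory);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // The timer is shut down first (for example, because the job terminated)...
        timer.shutdown();

        // ...while an io task is still running and later tries to reuse it.
        // The default AbortPolicy throws RejectedExecutionException here.
        ioExecutor.execute(() -> timer.execute(() -> System.out.println("never runs")));

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ioExecutor.shutdown();
        }
        return uncaught.get();
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

Running `main` should print a `RejectedExecutionException` similar to the one in the log. The submitted task never runs.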

We are using Aliyun OSS as the state backend storage.

Before the ERROR, there are a lot of INFO messages like this:

2021-02-08 13:57:50,452 INFO org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 6020D2DEA1E11430349E8323

Any insight on this?

Thanks,

Lei

Yang Wang

Re: Jobmanager stopped because uncaught exception

This may be a known issue [1] that has already been resolved in 1.12.2 (to be released soon).

BTW, I think it is unrelated to the Aliyun OSS INFO logs.

[1]. https://issues.apache.org/jira/browse/FLINK-20992

Best,

Yang

Lei Wang

Re: Jobmanager stopped because uncaught exception

I see there's a related issue https://issues.apache.org/jira/browse/FLINK-21053 which is still open.

Does that mean a similar issue may still occur even if I upgrade to 1.12.2?

Thanks,
Lei

r_khachatryan

Re: Jobmanager stopped because uncaught exception

Hi,

The open issue you mentioned (FLINK-21053) is about preventing potential issues in the future.

The issue you are experiencing is most likely FLINK-20992 as Yang Wang said.

So upgrading to 1.12.2 should solve the problem.
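Below is a sketch of the kind of defensive guard that keeps a late submission from reaching the fatal exit handler. It is an illustrative pattern in plain `java.util.concurrent`, not the actual FLINK-20992 or FLINK-21053 change. The class `SafeScheduling` and method `tryExecute` are hypothetical names:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;

public class SafeScheduling {

    /**
     * Hypothetical helper: hands a task to the timer, but treats rejection
     * after shutdown as a no-op instead of letting it escape as an
     * uncaught exception. Returns true if the task was accepted.
     */
    public static boolean tryExecute(ScheduledExecutorService timer, Runnable task) {
        if (timer.isShutdown()) {
            return false; // owner already stopped; nothing left to trigger
        }
        try {
            timer.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            // shutdown raced with the isShutdown() check above
            return false;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        boolean before = tryExecute(timer, () -> {});
        timer.shutdown();
        boolean after = tryExecute(timer, () -> {});
        System.out.println(before + " " + after); // prints "true false"
    }
}
```

Checking `isShutdown()` alone is not enough, because the executor can be shut down between the check and the call to `execute`. The `catch` covers that race.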

Regards,
Roman
