Re: Flink 1.11.1 - job manager exits with exit code 0

Posted by rmetzger0 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-1-11-1-job-manager-exists-with-exit-code-0-tp36938p37029.html

Thanks for reporting back. Glad you found the issue. This reminds me of a ticket on this topic from some time ago :) https://issues.apache.org/jira/browse/FLINK-15156

On Wed, Jul 29, 2020 at 7:51 AM Alexey Trenikhun <[hidden email]> wrote:
Hi Robert,
I found the cause: it was a bug in the job itself. Code after streamEnv.execute(...) called System.exit(0). This went unnoticed before 1.11, but with 1.11 (I guess in Application Mode) main is called from the job manager directly, so System.exit(0) simply exits the whole JVM.
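The failure mode can be reproduced without Flink. The following is a minimal plain-Java sketch, not Flink code: `runJob()`, `BadJob`, `GoodJob` and `invokeMain(...)` are hypothetical names, `runJob()` stands in for building the pipeline and calling `streamEnv.execute(...)`, and `invokeMain(...)` only approximates a host process (such as the JobManager) calling a job's `main()` reflectively inside its own JVM. Because `System.exit` terminates the entire JVM no matter which class calls it, the fix is simply to return from `main` instead of exiting.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class EmbeddedMainDemo {
    // Counts how many times the hypothetical job body ran.
    static int jobsRun = 0;

    // Hypothetical placeholder for building the pipeline and calling
    // streamEnv.execute(...); here it only increments a counter.
    static void runJob() {
        jobsRun++;
    }

    /** The problematic pattern: exit the JVM after the job is submitted. */
    public static class BadJob {
        public static void main(String[] args) {
            runJob();
            // Harmless when main owns the JVM, but fatal when main runs
            // inside a long-lived host process such as the JobManager.
            System.exit(0);
        }
    }

    /** The fix: return normally and let the host decide when its JVM ends. */
    public static class GoodJob {
        public static void main(String[] args) {
            runJob();
        }
    }

    /** Mimics a host invoking a user's main() reflectively inside its own JVM. */
    static void invokeMain(Class<?> jobClass, String[] args) {
        try {
            Method main = jobClass.getMethod("main", String[].class);
            main.invoke(null, (Object) args);
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        invokeMain(GoodJob.class, new String[0]);
        System.out.println("host still alive, jobs run: " + jobsRun);
        // invokeMain(BadJob.class, new String[0]) would terminate this whole
        // JVM, including the host, so nothing after it would run.
    }
}
```

Running `EmbeddedMainDemo` prints `host still alive, jobs run: 1`; invoking `BadJob` instead would end the process before that line is reached.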

Thank you, and sorry for the unnecessary noise.
Alexey


From: Robert Metzger <[hidden email]>
Sent: Tuesday, July 28, 2020 10:38:42 PM
To: Alexey Trenikhun <[hidden email]>
Cc: Flink User Mail List <[hidden email]>
Subject: Re: Flink 1.11.1 - job manager exits with exit code 0
Hey Alexey,

What is the exit code of the JobManager? Can you check if it has been killed by the OOM killer?
You could also try running the job with the DEBUG log level; it might give us an additional indication of why the JVM dies.
What kind of job are you submitting? Is it complicated?

On Sat, Jul 25, 2020 at 6:43 AM Alexey Trenikhun <[hidden email]> wrote:
Hello,

I have a Flink 1.11.1 session cluster running via Docker Compose. I upload the job jar, and when I submit the job, the jobmanager exits without any errors in the log:

...
{"@timestamp":"2020-07-25T04:32:54.007Z","@version":"1","message":"Starting execution of job katana-fsp (64ff3943fdc5024c5beef1612518c627) under job master id 00000000000000000000000000000000.","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.011Z","@version":"1","message":"Stopped BLOB server at 0.0.0.0:6124","logger_name":"org.apache.flink.runtime.blob.BlobServer","thread_name":"BlobServer shutdown hook","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.015Z","@version":"1","message":"Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy]","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.016Z","@version":"1","message":"Job katana-fsp (64ff3943fdc5024c5beef1612518c627) switched from state CREATED to RUNNING.","logger_name":"org.apache.flink.runtime.executiongraph.ExecutionGraph","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}

Any ideas on how to diagnose it?

Thanks,
Alexey