Re: Flink 1.11.1 - job manager exits with exit code 0

Posted by rmetzger0 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-1-11-1-job-manager-exists-with-exit-code-0-tp36938p37025.html

Ah yeah, after sending the email, I saw that the exit code is in the subject line :)

Can you post the entire log? What I find confusing is this log statement: "Stopped BLOB server at 0.0.0.0:6124". The BLOB server is usually only stopped during shutdown. For some reason, the JobManager is in the process of shutting down.
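One way to spot this kind of entry in a larger log is to filter the JSON lines by thread name. The following is only a rough sketch: it assumes the log is one JSON object per line with a `thread_name` field, as in the excerpt quoted further down, and the function name `find_shutdown_hook_entries` is made up for illustration.

```python
import json


def find_shutdown_hook_entries(lines):
    """Return parsed log records that were emitted from a shutdown-hook thread."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as "..."
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        thread_name = str(record.get("thread_name") or "")
        if "shutdown hook" in thread_name.lower():
            hits.append(record)
    return hits


# Two lines taken from the log excerpt in this thread (fields trimmed for brevity).
sample = [
    '{"@timestamp":"2020-07-25T04:32:54.011Z","message":"Stopped BLOB server at 0.0.0.0:6124",'
    '"logger_name":"org.apache.flink.runtime.blob.BlobServer",'
    '"thread_name":"BlobServer shutdown hook","level":"INFO"}',
    '{"@timestamp":"2020-07-25T04:32:54.016Z",'
    '"message":"Job katana-fsp (64ff3943fdc5024c5beef1612518c627) switched from state CREATED to RUNNING.",'
    '"logger_name":"org.apache.flink.runtime.executiongraph.ExecutionGraph",'
    '"thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO"}',
]

for rec in find_shutdown_hook_entries(sample):
    print(rec["@timestamp"], rec["thread_name"], rec["message"])
```

Any record this returns was logged while JVM shutdown hooks were already running, so the lines just before its timestamp are the interesting ones.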

On Wed, Jul 29, 2020 at 7:38 AM Robert Metzger <[hidden email]> wrote:
Hey Alexey,

What is the exit code of the JobManager? Can you check if it has been killed by the OOM killer?
You could also try running the job with the DEBUG log level; it might give us an additional indication of why the JVM dies.
What kind of job are you submitting? Is it complicated?
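Since the cluster runs under Docker, the exit code and the OOM-killer status can both be read from the output of `docker inspect <container>`, which reports them as `State.ExitCode` and `State.OOMKilled`. Below is a small sketch that summarizes those two fields; the helper name `summarize_container_state` and the sample JSON are made up for illustration and are not output from the cluster in this thread.

```python
import json


def summarize_container_state(inspect_json):
    """Summarize exit code and OOM status from `docker inspect` JSON output."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    state = containers[0]["State"]
    exit_code = state["ExitCode"]
    # Exit codes above 128 conventionally mean "terminated by signal (code - 128)",
    # e.g. 137 = 128 + 9 (SIGKILL), which is what an OOM kill typically produces.
    signal = exit_code - 128 if exit_code > 128 else None
    return {
        "exit_code": exit_code,
        "oom_killed": state["OOMKilled"],
        "signal": signal,
    }


# Hypothetical, heavily trimmed `docker inspect` output for illustration only.
sample_inspect = '[{"State": {"Status": "exited", "ExitCode": 0, "OOMKilled": false}}]'

print(summarize_container_state(sample_inspect))
```

An exit code of 0 with `OOMKilled` set to false would suggest the JVM shut down on its own rather than being killed from outside.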

On Sat, Jul 25, 2020 at 6:43 AM Alexey Trenikhun <[hidden email]> wrote:
Hello,

I have a Flink 1.11.1 session cluster running via docker compose. I upload the job jar, and when I submit the job, the jobmanager exits without any errors in the log:

...
{"@timestamp":"2020-07-25T04:32:54.007Z","@version":"1","message":"Starting execution of job katana-fsp (64ff3943fdc5024c5beef1612518c627) under job master id 00000000000000000000000000000000.","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.011Z","@version":"1","message":"Stopped BLOB server at 0.0.0.0:6124","logger_name":"org.apache.flink.runtime.blob.BlobServer","thread_name":"BlobServer shutdown hook","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.015Z","@version":"1","message":"Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy]","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.016Z","@version":"1","message":"Job katana-fsp (64ff3943fdc5024c5beef1612518c627) switched from state CREATED to RUNNING.","logger_name":"org.apache.flink.runtime.executiongraph.ExecutionGraph","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}

Any ideas on how to diagnose it?

Thanks,
Alexey