Re: Flink 1.11.1 - job manager exits with exit code 0

Posted by rmetzger0 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-1-11-1-job-manager-exists-with-exit-code-0-tp36938p37024.html

Hey Alexey,

What is the exit code of the JobManager? Can you check if it has been killed by the OOM killer?
You could also try running the job with the DEBUG log level; it might give us an additional indication of why the JVM dies.
What kind of job are you submitting? Is it complicated?
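
Since the cluster runs under Docker, one way to check the exit code and the OOM question is to read the stopped container's state from `docker inspect`, which prints a JSON array whose `State` object carries `ExitCode` and `OOMKilled`. The following is a minimal Python sketch, not a definitive tool; the function names and the container name `jobmanager` are assumptions made for illustration.

```python
import json
import subprocess


def summarize_container_state(inspect_output: str) -> dict:
    """Extract the exit code and OOM flag from `docker inspect` JSON output."""
    # `docker inspect` prints a JSON array with one object per container.
    containers = json.loads(inspect_output)
    state = containers[0]["State"]
    return {
        "exit_code": state.get("ExitCode"),
        "oom_killed": state.get("OOMKilled"),
    }


def inspect_container(name: str) -> dict:
    """Run `docker inspect` for a (stopped) container and summarize its state.

    `name` is whatever the container is called locally; "jobmanager" in the
    usage note is a hypothetical name.
    """
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_container_state(result.stdout)
```

Calling `inspect_container("jobmanager")` after the container has stopped returns a small dict. An exit code of 137 (128 + SIGKILL) together with `oom_killed` set to true would usually point to the container's memory limit; an exit code of 0 with `oom_killed` false would suggest the JVM shut down on its own.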

On Sat, Jul 25, 2020 at 6:43 AM Alexey Trenikhun <[hidden email]> wrote:
Hello,

I have a Flink 1.11.1 session cluster running via Docker Compose. I upload the job jar, and when I submit the job, the jobmanager exits without any errors in the log:

...
{"@timestamp":"2020-07-25T04:32:54.007Z","@version":"1","message":"Starting execution of job katana-fsp (64ff3943fdc5024c5beef1612518c627) under job master id 00000000000000000000000000000000.","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.011Z","@version":"1","message":"Stopped BLOB server at 0.0.0.0:6124","logger_name":"org.apache.flink.runtime.blob.BlobServer","thread_name":"BlobServer shutdown hook","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.015Z","@version":"1","message":"Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy]","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.016Z","@version":"1","message":"Job katana-fsp (64ff3943fdc5024c5beef1612518c627) switched from state CREATED to RUNNING.","logger_name":"org.apache.flink.runtime.executiongraph.ExecutionGraph","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}

Any ideas on how to diagnose this?

Thanks,
Alexey
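
One detail in the log excerpt above may help narrow things down: the "Stopped BLOB server" entry was logged from a thread named "BlobServer shutdown hook". JVM shutdown hooks run on an orderly exit (for example `System.exit` or SIGTERM) but not when the process receives SIGKILL, so this hints that the JVM was already shutting down at 04:32:54.011Z. That is only a reading of the excerpt, not a confirmed diagnosis. A minimal Python sketch for scanning JSON-per-line logs for such entries follows; the function name is hypothetical, and the field names (`thread_name`, `@timestamp`, `message`) are taken from the excerpt.

```python
import json


def find_shutdown_events(log_lines):
    """Return (timestamp, thread, message) for entries logged by shutdown-hook threads."""
    events = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and elisions such as "..."
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        thread = entry.get("thread_name", "")
        # Assumption: hook threads carry "shutdown hook" in their name,
        # as "BlobServer shutdown hook" does in the excerpt.
        if "shutdown hook" in thread.lower():
            events.append((entry.get("@timestamp"), thread, entry.get("message")))
    return events
```

Running this over the full jobmanager log (for instance, the lines of a saved log file) shows the first moment a shutdown hook fired, and the entries just before that timestamp are the ones most worth reading at DEBUG level.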