Hello,
I have a Flink 1.11.1 session cluster running via Docker Compose. I upload the job jar, and when I submit the job, the jobmanager exits without any errors in the log:
...
{"@timestamp":"2020-07-25T04:32:54.007Z","@version":"1","message":"Starting execution of job katana-fsp (64ff3943fdc5024c5beef1612518c627) under job master id 00000000000000000000000000000000.","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.011Z","@version":"1","message":"Stopped BLOB server at 0.0.0.0:6124","logger_name":"org.apache.flink.runtime.blob.BlobServer","thread_name":"BlobServer shutdown hook","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.015Z","@version":"1","message":"Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy]","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
{"@timestamp":"2020-07-25T04:32:54.016Z","@version":"1","message":"Job katana-fsp (64ff3943fdc5024c5beef1612518c627) switched from state CREATED to RUNNING.","logger_name":"org.apache.flink.runtime.executiongraph.ExecutionGraph","thread_name":"flink-akka.actor.default-dispatcher-18","level":"INFO","level_value":20000}
Any ideas how to diagnose it?
Thanks,
Alexey
|
Hey Alexey,
What is the exit code of the JobManager? Can you check if it has been killed by the OOM killer?
You could also try to run the job with DEBUG log level, it might give us an additional indication why the JVM dies.
What kind of job are you submitting? Is it complicated?
On Sat, Jul 25, 2020 at 6:43 AM Alexey Trenikhun <[hidden email]> wrote:
|
Ah yeah, after sending the email, I saw that the exit code is in the subject line :)
On Wed, Jul 29, 2020 at 7:38 AM Robert Metzger <[hidden email]> wrote:
|
Hi Robert,
I found the cause: it was a bug in the job itself. The code after streamEnv.execute(...) called System.exit(0). This went unnoticed before 1.11, but with 1.11, I guess in Application Mode, main is called from the job manager directly, and System.exit(0) just exits the whole JVM.
Thank you, and sorry for the unnecessary noise.
Alexey
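For readers who hit the same symptom, here is a minimal sketch of the fix described above, in plain Java with no Flink dependency. The class name JobEntryPoint and the methods executeJob and run are hypothetical; executeJob only stands in for streamEnv.execute(...). The idea is that main returns normally on success and reports failure by throwing, so it never terminates a JVM it may be sharing with the job manager.

```java
final class JobEntryPoint {

    // Hypothetical stand-in for streamEnv.execute(...); this is not a Flink API.
    static void executeJob(String jobName) throws Exception {
        if (jobName == null || jobName.isEmpty()) {
            throw new IllegalArgumentException("job name must not be empty");
        }
        System.out.println("Submitted job " + jobName);
    }

    // Runs the job and signals failure with an exception instead of an exit code.
    static void run(String jobName) {
        try {
            executeJob(jobName);
        } catch (Exception e) {
            throw new IllegalStateException("Job " + jobName + " failed", e);
        }
        // The problematic version had System.exit(0) here. If main runs inside
        // the job manager's JVM, that call stops the job manager itself.
    }

    public static void main(String[] args) {
        // Returning from main lets the hosting process decide when to stop.
        run(args.length > 0 ? args[0] : "katana-fsp");
    }
}
```

If the program really needs a non-zero exit status when launched standalone, one option is to keep System.exit in a separate launcher class that is not the entry point of the submitted jar.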
From: Robert Metzger <[hidden email]>
Sent: Tuesday, July 28, 2020 10:38:42 PM
To: Alexey Trenikhun <[hidden email]>
Cc: Flink User Mail List <[hidden email]>
Subject: Re: Flink 1.11.1 - job manager exists with exit code 0
Hey Alexey,
What is the exit code of the JobManager? Can you check if it has been killed by the OOM killer?
You could also try to run the job with DEBUG log level, it might give us an additional indication why the JVM dies.
What kind of job are you submitting? Is it complicated?
On Sat, Jul 25, 2020 at 6:43 AM Alexey Trenikhun <[hidden email]> wrote:
|
Thanks for reporting back. Glad you found the issue.
This reminds me of a ticket about this topic some time ago :)
https://issues.apache.org/jira/browse/FLINK-15156
On Wed, Jul 29, 2020 at 7:51 AM Alexey Trenikhun <[hidden email]> wrote:
|