|
Hello Flink experts,

We are running Flink on Kubernetes and see that the Job Manager dies or restarts whenever a Task Manager dies or restarts, or when the two cannot connect to each other. Are there any specific configurations/parameters that we need to turn on to stop this? Or is this expected?

Thanks,
Cam
|
Hi Cam,

The Flink master should not die when it gets disconnected from task managers. It may exit in the following cases:

1. The job has terminated (FINISHED/FAILED/CANCELED). If your job is configured with no restart retries, a TM failure can cause the job to become FAILED.
2. The JM lost HA leadership, e.g. it lost its connection to ZK.
3. It encountered some other unexpected fatal error. In this case we need to check the log to see what happened.

Thanks,
Zhu Zhu

Cam Mach <[hidden email]> wrote on Mon, Aug 12, 2019 at 12:15 PM:
|
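For case 1 in the reply above, a restart strategy lets the job survive a TM failure instead of going to FAILED. A minimal sketch of a `flink-conf.yaml` fragment, assuming the fixed-delay strategy suits the job; the attempt count and delay are illustrative values, not recommendations from this thread:

```yaml
# flink-conf.yaml (fragment)
# Assumption: fixed-delay restarts are acceptable for this job.
restart-strategy: fixed-delay
# Illustrative values: retry up to 3 times, waiting 10 seconds between attempts.
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

Once the configured attempts are exhausted the job still ends up FAILED, so the values need to cover the time Kubernetes takes to bring a replacement TM back.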
|
Another possibility is that the JM is killed externally, e.g. K8s may kill the JM/TM if it exceeds its resource limit.

Thanks,
Zhu Zhu

Zhu Zhu <[hidden email]> wrote on Mon, Aug 12, 2019 at 1:45 PM:
|
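To rule out the external-kill case, it may help to compare the container memory limit with the JM heap setting. The fragment below is only a sketch: the sizes are made-up values, and the Flink key shown is the one used by releases from around the time of this thread, so the name may differ in other versions.

```yaml
# Kubernetes container spec for the JM pod (fragment); sizes are illustrative.
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "2Gi"   # exceeding this limit gets the container killed (OOMKilled)

# flink-conf.yaml (fragment); assumed key name, check the docs for your Flink version.
# Keep the heap well below the container limit to leave room for off-heap memory.
# jobmanager.heap.size: 1024m
```

If the limit was the cause, `kubectl describe pod` on the JM pod typically reports `OOMKilled` as the reason for the last container termination.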
|
Hi Zhu,

It looks like this is expected, then. Those are the cases that happened in our cluster.

Thanks for your response, Zhu.

Cam

On Sun, Aug 11, 2019 at 10:53 PM Zhu Zhu <[hidden email]> wrote:
|
