Hi All,
We're using Flink 1.4.2 and noticed many dangling connections to Kafka after job deletion and recreation. The trigger is a job cancellation/failure caused by a network-down event, followed by job recreation.

Our Flink job has checkpointing disabled. A network failure disrupted communication between the task managers, and between the task managers and the job manager, causing the job to fail. Our custom job controller detected this condition, cancelled the job, and then recreated it after a minute or so. Because the network failure persisted, these steps were repeated many times, and eventually the Flink Docker container's socket file descriptors were exhausted. Most of the leaked sockets appear to be Kafka connections from the Flink task manager to the local Kafka broker:

netstat -ntap | grep 9092 | grep java | wc -l
2235

Is this a known issue that has already been fixed in a later release? If yes, could someone point out the JIRA link? If this is a new issue, could someone let me know how to move forward and debug it? It looks like the Kafka consumers were not cleaned up properly upon job cancellation.

Thanks,
Fritz
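For anyone debugging a similar leak: the broker sockets belong to the underlying KafkaConsumer and are only released when close() is actually called on it. Below is a minimal, hypothetical custom source sketching the cleanup pattern (this is illustrative only, not Flink's actual FlinkKafkaConsumer internals; the topic name, group id, and broker address are placeholder assumptions). cancel() stops the poll loop, and close() releases the sockets:

import java.util.Collections;
import java.util.Properties;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical source for illustration: the TCP connections to the broker
// are owned by the KafkaConsumer, so they are released only when close()
// runs. If cancel()/close() skip this step, every job restart leaks the
// previous consumer's connections.
public class LeakFreeKafkaSource extends RichSourceFunction<String> {

    private transient KafkaConsumer<String, String> consumer;
    private volatile boolean running = true;

    @Override
    public void open(Configuration parameters) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.setProperty("group.id", "demo-group");              // placeholder
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder
    }

    @Override
    public void run(SourceContext<String> ctx) {
        while (running) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(record.value());
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false; // makes run() return; sockets are released in close()
    }

    @Override
    public void close() throws Exception {
        if (consumer != null) {
            consumer.close(); // releases the TCP connections to the broker
        }
        super.close();
    }
}

If close() never runs on the task side (for example because the task was lost mid-cancellation during the network outage), the orphaned consumer keeps its broker connections until the JVM exits, which would match the file-descriptor exhaustion described above.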
It might be related to this issue.

On Tue, Mar 26, 2019 at 4:35 PM Fritz Budiyanto <[hidden email]> wrote:
> Hi All,
> ...
Thank you!