Hi,
I have some trouble with my HA Kubernetes cluster. My Flink application currently processes an infinite stream (with a parallelism of 12). After a few days I start losing task managers, and they never reconnect to the job manager. Because of this, the application cannot be restored by the restart policy.

I did some searching and found the “akka.watch” configuration options, but they didn't work. I think this issue would solve the problem. Am I right? (https://issues.apache.org/jira/browse/FLINK-13883) Is there any workaround I can apply to solve this problem?

Thanks,
Eray

The akka.watch configuration options haven't been used for a while, irrespective of FLINK-13883 (though I can't tell right now since when exactly).

Let's start with which version of Flink you are using, and what the taskmanager/jobmanager logs say.

On 25/11/2019 12:05, Eray Arslan wrote:
> Hi,
>
> I have some trouble with my HA K8 cluster.
> Current my Flink application has infinite stream. (With 12 parallelism)
> After few days I am losing my task managers. And they never reconnect
> to job manager.
> Because of this, application cannot get restored with restart policy.
>
> I did few searches and I found “akka.watch” configurations. But they
> didn’t work.
> I think this issue will solve the problem. Am I right?
> (https://issues.apache.org/jira/browse/FLINK-13883). Is there any
> workaround I can apply to solve this problem?
>
> Thanks
>
> Eray

Hi Chesnay,

Thank you for the reply. I got around the issue by using a livenessProbe on the Task Manager deployment, but I think that is still only a workaround.

I am using Flink 1.9.1 (currently the latest version), and I am getting a "connection unexpectedly closed by remote task manager" error on the Task Manager. Because of that, the cluster loses the Task Manager and the job cannot restart, because there are not enough task managers left in the cluster.

Thanks,
Eray Arslan

On Thu, 28 Nov 2019 at 18:55, Chesnay Schepler <[hidden email]> wrote:
> The akka.watch configuration options haven't been used for a while

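The thread does not show how the livenessProbe was configured. A Kubernetes `tcpSocket` probe succeeds when a TCP connection to the given port can be opened, so as a rough sketch the same check can be run by hand against a Task Manager pod. The function name `tcp_alive`, the host name, and port 6122 below are illustrative assumptions, not values taken from this thread; use whatever RPC port your deployment actually exposes.

```python
import socket


def tcp_alive(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This mimics what a tcpSocket-style probe checks: only that the port
    accepts connections, not that the process behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    print(tcp_alive("flink-taskmanager.example.internal", 6122))
```

A check like this only detects a dead or unreachable process so that Kubernetes can restart the pod; it does not explain why the connection was closed in the first place, which is why the logs are still needed.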
Does this happen regularly? As in, the cluster initially runs fine and then runs into problems after roughly the same amount of time?

Can you provide the full logs for the taskmanager and jobmanager?

On 29/11/2019 08:42, Eray Arslan wrote: