Apache Flink - High Available K8 Setup Wrongly Marked Dead TaskManager

Eray Arslan
Hi,

I am having trouble with my HA Kubernetes cluster.
My Flink application currently processes an infinite stream (with a parallelism of 12).
After a few days I start losing task managers, and they never reconnect to the job manager.
Because of this, the application cannot be restored by the restart policy.

I searched a bit and found the “akka.watch” configuration options, but they did not help.
I think this issue would solve the problem. Am I right? (https://issues.apache.org/jira/browse/FLINK-13883) Is there any workaround I can apply to solve this problem?

Thanks

Eray


Re: Apache Flink - High Available K8 Setup Wrongly Marked Dead TaskManager

Chesnay Schepler
The akka.watch configuration options haven't been used for a while
irrespective of FLINK-13883 (but I can't quite tell atm since when).

Let's start with what version of Flink you are using, and what the
taskmanager/jobmanager logs say.

On 25/11/2019 12:05, Eray Arslan wrote:

> Hi,
>
> I am having trouble with my HA Kubernetes cluster.
> My Flink application currently processes an infinite stream (with a
> parallelism of 12).
> After a few days I start losing task managers, and they never reconnect
> to the job manager.
> Because of this, the application cannot be restored by the restart policy.
>
> I searched a bit and found the “akka.watch” configuration options, but
> they did not help.
> I think this issue would solve the problem. Am I right?
> (https://issues.apache.org/jira/browse/FLINK-13883) Is there any
> workaround I can apply to solve this problem?
>
> Thanks
>
> Eray
>
>

Re: Apache Flink - High Available K8 Setup Wrongly Marked Dead TaskManager

Eray Arslan
Hi Chesnay,
Thank you for the reply.
I worked around the issue by adding a livenessProbe to the Task Manager deployment, but I think that is still only a workaround.

I am using Flink 1.9.1 (currently the latest version).
I am getting a "connection unexpectedly closed by remote task manager" error on the Task Manager.
Because of that, the cluster loses the Task Manager and the job cannot restart, since there are not enough task managers in the cluster.
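
For readers who want to try the same workaround, a livenessProbe on the Task Manager container might look roughly like the fragment below. This is only a sketch: the thread does not show the probe that was actually used, so the container name, image tag, port, and timing values are all assumptions. In particular, it assumes `taskmanager.rpc.port` is pinned to 6122 in `flink-conf.yaml` so that the probe has a fixed port to check.

```yaml
# Hypothetical fragment of a Task Manager Deployment (container spec only).
# Assumes taskmanager.rpc.port is pinned to 6122 in flink-conf.yaml.
# The name, image tag, and timing values are illustrative, not from the thread.
containers:
  - name: taskmanager
    image: flink:1.9.1
    args: ["taskmanager"]
    ports:
      - containerPort: 6122
        name: rpc
    livenessProbe:
      tcpSocket:
        port: 6122              # probe succeeds if the RPC port accepts a TCP connection
      initialDelaySeconds: 30   # give the process time to start before the first check
      periodSeconds: 60         # check once a minute
```

A TCP probe like this only verifies that the port still accepts connections. When it fails, the kubelet restarts the container, which lets a fresh Task Manager register with the job manager, but it does not explain why the original connection was lost.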

Thanks

Chesnay Schepler <[hidden email]> wrote on Thu, 28 Nov 2019 at 18:55:
> The akka.watch configuration options haven't been used for a while
> irrespective of FLINK-13883 (but I can't quite tell atm since when).
>
> Let's start with what version of Flink you are using, and what the
> taskmanager/jobmanager logs say.

--

Eray Arslan 
Software Specialist
[hidden email]

+90 537 738 14 34
Trump Towers Mecidiyeköy Yolu No: 12 Kule 2, Mecidiyeköy - Şişli / İstanbul - Türkiye

Re: Apache Flink - High Available K8 Setup Wrongly Marked Dead TaskManager

Chesnay Schepler
Does this happen regularly? As in, the cluster initially runs fine and around the same time-frame runs into problems?

Can you provide the full logs for the task and jobmanager?

On 29/11/2019 08:42, Eray Arslan wrote:
> Hi Chesnay,
> Thank you for the reply.
> I worked around the issue by adding a livenessProbe to the Task Manager deployment, but I think that is still only a workaround.
>
> I am using Flink 1.9.1 (currently the latest version).
> I am getting a "connection unexpectedly closed by remote task manager" error on the Task Manager.
> Because of that, the cluster loses the Task Manager and the job cannot restart, since there are not enough task managers in the cluster.
>
> Thanks
