Flink is Unstable when TM > 1

Saliya Ekanayake
Hi,

I've been trying to run the provided KMeans example on a 16-node cluster. I was testing with 2 Task Managers (TMs) per node because each node has 2 sockets (CPUs). Each socket has 12 cores, so I've set the number of slots per TM to 12. The total parallelism is 384 (12 slots x 2 TMs x 16 nodes).

However, the Flink TMs keep failing from time to time, causing KMeans to fail. The only explanation I could find in the logs is that the TMs unregister from the Job Manager. I've also increased the Akka timeout to 1000s.
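
For reference, the setup described above would roughly correspond to the following flink-conf.yaml fragment. This is only a sketch: the post does not say which Akka timeout key was changed, so akka.ask.timeout is an assumption, and whether the parallelism was set in the config file or on the command line is also an assumption.

```yaml
# Sketch of the relevant flink-conf.yaml entries (assumed, not taken from the actual cluster).

# One slot per core of a 12-core socket; each node runs 2 TMs.
taskmanager.numberOfTaskSlots: 12

# 12 slots x 2 TMs x 16 nodes = 384
parallelism.default: 384

# Assumed key for the "Akka timeout" mentioned above.
akka.ask.timeout: 1000 s
```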

Any suggestions on this?

The data sizes I tried were 10k, 250k, and 1 million points, with the number of centers ranging from 100 to 1000. None of these runs completed.

Thank you,
Saliya

--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington

Re: Flink is Unstable when TM > 1

Ufuk Celebi
Can you please share all available logs?

On Fri, Jul 8, 2016 at 5:57 AM, Saliya Ekanayake <[hidden email]> wrote:

> Hi,
>
> I've been trying to run the provided KMeans example on a 16 node cluster. I
> was testing with 2 Task Managers (TM) per node because each node has 2
> sockets (CPUs). A socket contains 12 cores, so I've set the number of slots
> per TM as 12.The total parallelism is 384 (12 slots x 2 TMs x 16 nodes).
>
> However, Flink TMs keep failing time to time causing KMeans to fail. The
> only explanation I could find from logs is that TMs unregister from Job
> Manager. I've increased Akka timeout to 1000s as well.
>
> Any suggestions on this?
>
> The data sizes I tried were 10k points, 250k points, and 1mil points. Number
> of centers were 100 to 1000. None of these sizes completed.
>
> Thank you,
> Saliya
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
>