Re: same parallelism with different taskmanager and slots, skew occurs
Posted by Till Rohrmann
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/same-parallelism-with-different-taskmanager-and-slots-skew-occurs-tp25281p25339.html
Hi,
could you tell me how exactly you started the cluster and with which parameters (configured memory, maybe vcores, etc.)?
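For example, depending on your Flink version and deployment mode, I mean settings along these lines (the values below are purely illustrative, not taken from your setup):

    # flink-conf.yaml
    taskmanager.numberOfTaskSlots: 32
    taskmanager.heap.size: 32768m
    jobmanager.heap.size: 2048m
    parallelism.default: 96

    # or the equivalent flags when starting a YARN session
    ./bin/yarn-session.sh -n 3 -s 32 -tm 32768 -jm 2048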
Cheers,
Till
Hi Till,
Thanks a lot for your reply. I got your point; sorry for not making my issue clear.
I generated the data with the streaming benchmark's event generator, as in:
https://github.com/dataArtisans/databricks-benchmark/blob/master/src/main/scala/com/databricks/benchmark/flink/EventGenerator.scala
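Roughly, the job is wired up like the minimal sketch below; the SimpleGenerator here is a hypothetical stand-in for the benchmark's EventGenerator, not its real code:

    import org.apache.flink.streaming.api.functions.source.SourceFunction
    import org.apache.flink.streaming.api.scala._

    // Hypothetical stand-in for the benchmark's EventGenerator:
    // each parallel source instance emits an increasing counter as fast as it can.
    class SimpleGenerator extends SourceFunction[Long] {
      @volatile private var running = true

      override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
        var i = 0L
        while (running) {
          ctx.collect(i)
          i += 1
        }
      }

      override def cancel(): Unit = running = false
    }

    object GeneratorJobSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(96) // same total parallelism in both tests

        env
          .addSource(new SimpleGenerator)
          .map(_ % 1000)               // stand-in for the real transformations
          .print()

        env.execute("generator benchmark sketch")
      }
    }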
What I want to say is: keep the parallelism the same, say 96, and only change the number of TaskManagers and slots per TaskManager. In the first test I configured 3 TaskManagers with 32 slots each; no data skew occurred, the three machines received the same amount of data and each partition processed approximately the same amount. In the second test I configured 6 TaskManagers with 16 slots each; each partition again processed the same amount of data, but one machine processed more data than the other two.
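If it helps, assuming a standalone deployment, the difference between the two tests would look roughly like this (illustrative sketch only; listing a worker twice in conf/slaves is just one way to start two TaskManagers on the same machine):

    # Test 1: 3 TaskManagers (one per worker), 32 slots each -> 96 slots total
    # conf/slaves: worker1, worker2, worker3
    taskmanager.numberOfTaskSlots: 32

    # Test 2: 6 TaskManagers (two per worker), 16 slots each -> 96 slots total
    # conf/slaves: worker1, worker1, worker2, worker2, worker3, worker3
    taskmanager.numberOfTaskSlots: 16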
I wonder whether the TaskManager JVMs compete with each other when several of them run on one machine? Also, how does the streaming benchmark handle backpressure? I tested on a cluster of 4 nodes, one master and three workers, each node with an Intel Xeon E5-2699 v4 @ 2.20GHz/3.60GHz, 256 GB memory, 88 cores, and a 10 Gbps network, and I could not find the bottleneck. It confuses me!
Best Regards & Thanks
Rui
-----
stay hungry, stay foolish.