Effect of increasing parallelism on throughput

HarshithBolar

Hi all,

I ran a job first with parallelism 1 and then with parallelism 3. With parallelism 1, the Kafka source read about 500 records per second. With parallelism 3, the throughput was split across the three parallel source subtasks, each reading about 150 records per second. Note that records are being published to the Kafka topic at a much higher rate (~1000 records per second).

Is this expected? I would expect throughput to increase with parallelism, but it stays the same. I checked the backpressure status on the source, and it was High.

[Screenshots for reference: Parallelism 1; Parallelism 3]

Thank you,
Harshith

Re: Effect of increasing parallelism on throughput

Zhijiang(wangzhijiang999)
Hi Harshith,

I guess the throughput is limited by the slowest vertex in the topology, which is what causes the backpressure. The downstream task can only consume at that rate, and that rate is shared roughly evenly among all the upstream source tasks. Under backpressure, the source tasks are blocked from producing records any faster. Without backpressure, I think throughput would likely increase with parallelism.
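
To make the arithmetic concrete, here is a rough sketch of that reasoning in Python. It is only a toy model, not how Flink actually schedules or measures anything. The function name and the capacity numbers are assumptions for illustration: I assume each source subtask could read ~1000 records per second on its own, and that the slowest downstream vertex can absorb ~500 records per second in total.

```python
def per_subtask_rates(source_parallelism, source_capacity_per_subtask, bottleneck_capacity):
    """Toy model: return (total_rate, rate_per_source_subtask) in records/second.

    All three inputs are hypothetical capacities, not values read from Flink.
    """
    # What the sources could read in aggregate if nothing slowed them down.
    unconstrained_total = source_parallelism * source_capacity_per_subtask
    # Backpressure caps the whole job at what the slowest vertex can absorb.
    total = min(unconstrained_total, bottleneck_capacity)
    # The capped rate is assumed to be shared evenly among the source subtasks.
    return total, total / source_parallelism


if __name__ == "__main__":
    for parallelism in (1, 3):
        total, per_subtask = per_subtask_rates(
            source_parallelism=parallelism,
            source_capacity_per_subtask=1000,  # assumed
            bottleneck_capacity=500,           # assumed
        )
        print(f"parallelism={parallelism}: total={total} rec/s, "
              f"per source subtask={per_subtask:.0f} rec/s")
```

Under these assumed numbers the total stays at 500 records per second in both runs, and the per-subtask rate drops to roughly a third at parallelism 3, which is close to the ~150 records per second you observed. In this model the total only goes up if the bottleneck capacity itself goes up.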

Best,
Zhijiang
------------------------------------------------------------------
From:Kumar Bolar, Harshith <[hidden email]>
Send Time: Saturday, June 15, 2019, 00:20
To:user <[hidden email]>
Subject:Effect of increasing parallelism on throughput
