Increasing parallelism skews/increases overall job processing time linearly

Increasing parallelism skews/increases overall job processing time linearly

Chakravarthy varaga

Hi All,

I have a job as attached.

I have a 16-core blade running RHEL 7. The TaskManager's default number of slots is set to 1. The sources are Kafka streams, and each of the 2 source topics has 2 partitions.
What I notice is that when I deploy the job with parallelism=2, the total processing time is double what it took when the same job was deployed with parallelism=1. It increases linearly with the parallelism.

Since the number of slots is set to 1 per TM, I would assume that the job is processed in parallel on 2 different TMs, and that each consumer in each TM is connected to 1 partition of the topic. This should therefore have kept the overall processing time the same or reduced it!

The co-flatmap connects the 2 streams and uses ValueState (checkpointed to the filesystem). I think this state is distributed among the TMs, and my understanding is that looking up value state across TMs could be costly. Do you sense something wrong here?
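For reference, the job roughly follows this shape. This is a simplified, self-contained sketch with placeholder types, topics, and state names, not the actual code:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class ConnectedStreamsSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keyed state is snapshotted to a filesystem backend on every checkpoint.
        env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));
        env.enableCheckpointing(60000);

        // In the real job these two streams come from the Kafka sources (one per topic);
        // fromElements is used here only to keep the sketch self-contained.
        DataStream<Tuple2<String, Long>> events =
                env.fromElements(Tuple2.of("key-1", 1L), Tuple2.of("key-2", 2L));
        DataStream<Tuple2<String, String>> rules =
                env.fromElements(Tuple2.of("key-1", "rule-A"));

        events.keyBy(0)                    // both inputs keyed on the same field, so
              .connect(rules.keyBy(0))     // matching keys end up on the same subtask
              .flatMap(new RichCoFlatMapFunction<Tuple2<String, Long>, Tuple2<String, String>, String>() {

                  private transient ValueState<String> latestRule;

                  @Override
                  public void open(Configuration parameters) {
                      latestRule = getRuntimeContext().getState(
                              new ValueStateDescriptor<>("latest-rule", String.class));
                  }

                  @Override
                  public void flatMap1(Tuple2<String, Long> event, Collector<String> out) throws Exception {
                      // per-key state; Flink scopes it to the key of the current record
                      String rule = latestRule.value();
                      if (rule != null) {
                          out.collect(event.f0 + " matched " + rule);
                      }
                  }

                  @Override
                  public void flatMap2(Tuple2<String, String> rule, Collector<String> out) throws Exception {
                      latestRule.update(rule.f1);   // remember the most recent rule per key
                  }
              })
              .print();

        env.execute("connected-streams-sketch");
    }
}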

Best Regards
CVP





Attachment: plan.jpg (15K)

Re: Increasing parallelism skews/increases overall job processing time linearly

Chakravarthy varaga
Hi All,

    Any updates on this?

Best Regards
CVP


Re: Increasing parallelism skews/increases overall job processing time linearly

Chen Qin
Just noticed there are only two partitions per topic. Regardless of how large the parallelism is set, at most two of the source subtasks will ever get a partition assigned.
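For example (a minimal sketch with made-up topic/broker names), the source can be capped at the partition count while the rest of the job runs wider:

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class SourceParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);  // default parallelism for the rest of the job

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092");
        props.setProperty("group.id", "my-group");

        // The topic has only 2 partitions, so at most 2 source subtasks ever receive data;
        // cap the source explicitly instead of letting it inherit the job-wide parallelism.
        DataStream<String> events = env
                .addSource(new FlinkKafkaConsumer09<String>("my-topic", new SimpleStringSchema(), props))
                .setParallelism(2);

        events.print();
        env.execute("source-parallelism-sketch");
    }
}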

Sent from my iPhone


Re: Increasing parallelism skews/increases overall job processing time linearly

Stephan Ewen
Hi!

You are right, parallelism 2 should be faster than parallelism 1 ;-) As Chen Qin pointed out, having only 2 Kafka partitions may prevent further scale-out.

A few things to check:
  - How are you connecting the FlatMap and CoFlatMap? Default, keyBy, broadcast?
  - Broadcast, for example, would multiply the data based on the parallelism, which can lead to a slowdown when the network is saturated.
  - Are you using the standard Kafka Source (which Kafka version)?
  - Is there any part in the program that multiplies data/effort with higher parallelism (does the FlatMap explode data based on parallelism)? 
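For reference, a rough sketch of the difference, reusing the placeholder stream names from the sketch in the first message ('coFlatMap' stands for whatever CoFlatMapFunction the job uses):

// Keyed connect: both inputs are hash-partitioned on the key, so each subtask
// only sees its share of the data and per-key ValueState is available.
events.keyBy(0)
      .connect(rules.keyBy(0))
      .flatMap(coFlatMap);

// Broadcast connect: every parallel subtask of the co-flatmap receives a full
// copy of the 'rules' stream, so the replicated volume grows with the
// parallelism (and keyed ValueState is not available on a non-keyed connect).
events.connect(rules.broadcast())
      .flatMap(coFlatMap);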

Stephan


Re: Increasing parallelism skews/increases overall job processing time linearly

Chakravarthy varaga
Hi Stephan,

- The Kafka version is 0.9.0.1; the connector is FlinkKafkaConsumer09.
- The flatmap and co-flatmap are connected by keyBy.
- No data is broadcast, and the data does not explode based on the parallelism.

CVP


Re: Increasing parallelism skews/increases overall job processing time linearly

Chakravarthy varaga
Is there anything that I could check or collect for you to investigate?


Re: Increasing parallelism skews/increases overall job processing time linearly

Chakravarthy varaga
Hi Guys,

    I understand that you are extremely busy, but any pointers here would be highly appreciated so that I can proceed towards concluding this activity!

Best Regards
CVP


Re: Increasing parallelism skews/increases overall job processing time linearly

Till Rohrmann
Hi CVP,

changing the parallelism from 1 to 2, with every TM having only one slot, will inevitably introduce a network shuffle between the sources and the keyed co-flatmap. This might be the source of your slowdown, because before, everything was running on one machine without any network communication (apart from reading from Kafka).

Do you also observe a further degradation when increasing the parallelism from 2 to 4, for example (given that you've increased the number of topic partitions to at least the maximum parallelism in your topology)?

Cheers,
Till


Re: Increasing parallelism skews/increases overall job processing time linearly

Chakravarthy varaga
Hi Till,

    Thanks for your response.
    The results are the same.
        4 CPUs (8 cores in total)
        Kafka partitions = 4 per topic
        parallelism for the job = 3
        task slots per TM = 4

    Basically, this Flink application consumes (Kafka source) from 2 topics and produces (Kafka sink) onto 1 topic. On one consumed topic the event load is 100K events/sec, while the other source has 1 event per hour...
    I'm wondering whether parallelism is applied to both sources irrespective of the number of partitions.

    What I did was to configure 1 partition for the 2nd topic (1 event/hour) and 4 partitions for the 100K events/sec topic, then deploy the job with parallelism 3, and the results are the same...
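    For illustration, the source wiring now looks roughly like this (simplified sketch reusing the env/Properties setup style from the earlier sketch; topic names are placeholders, not the actual job):

// Job-wide parallelism is 3.
env.setParallelism(3);

// High-volume topic (100K events/sec) with 4 partitions: 3 source subtasks share them.
DataStream<String> events = env
        .addSource(new FlinkKafkaConsumer09<String>("events-topic", new SimpleStringSchema(), props))
        .setParallelism(3);

// Low-volume control topic (1 event/hour) with 1 partition: more than 1 source subtask would sit idle.
DataStream<String> control = env
        .addSource(new FlinkKafkaConsumer09<String>("control-topic", new SimpleStringSchema(), props))
        .setParallelism(1);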

Best Regards
CVP
