Using Custom Partitioner in Streaming with parallelism 1 adding latency

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Using Custom Partitioner in Streaming with parallelism 1 adding latency

sohimankotia
Hi,

I have streaming job which is running with parallelism 1 as of now .  (This job will run with parallelism > 1 in future )

So I have added custom partitioner to partition the data based on one tuple field .

The flow is  :

source -> map -> partitioner -> flatmap -> sink

The partitioner is adding around 40 sec or more delay for 40% of data (from map to flatmap ).

I am not able to understand if parallelism is one how this delay is getting added .
Reply | Threaded
Open this post in threaded view
|

Re: Using Custom Partitioner in Streaming with parallelism 1 adding latency

Aljoscha Krettek
Hi,
Before adding the partitioner all your functions are chained together. That is, everything is executed in one Thread and sending elements from one function to the next is essentially just a method call. By introducing a partitioner you break this chain and therefore your job now has to send data (at least) across between different Threads. I think you should see this if you look at the execution graph.

Best,
Aljoscha

> On 15. Jun 2017, at 14:35, sohimankotia <[hidden email]> wrote:
>
> Hi,
>
> I have streaming job which is running with parallelism 1 as of now .  (This
> job will run with parallelism > 1 in future )
>
> So I have added custom partitioner to partition the data based on one tuple
> field .
>
> The flow is  :
>
> source -> map -> partitioner -> flatmap -> sink
>
> The partitioner is adding around 40 sec or more delay for 40% of data (from
> map to flatmap ).
>
> I am not able to understand if parallelism is one how this delay is getting
> added .
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Using-Custom-Partitioner-in-Streaming-with-parallelism-1-adding-latency-tp13766.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: Using Custom Partitioner in Streaming with parallelism 1 adding latency

sohimankotia
This post was updated on .
You are right Aljoscha . Jog graph is splitted after introducing partitioner .

I was under impression that If parallelism is set to 1 everything will be chained together . (Just clarification)
Can you explain how data will flow for  map -> partitioner -> flatmap  if parallelism or It would be great if point me to right documentation ?

Can is this possible it can add more than 100 seconds delay if partitioner is added ?

Sorry for dumb question but I did not find any detailed documentation on flink website . Most of links https://cwiki.apache.org/confluence/display/FLINK/Parallelism+and+Scheduling are also empty as of now .

Reply | Threaded
Open this post in threaded view
|

Re: Using Custom Partitioner in Streaming with parallelism 1 adding latency

Aljoscha Krettek
Hi,

These two documentation pages might be interesting:

Flink will chain operations together whenever this is possible and the basic prerequisites for this are: the parallelism is the same and the connection pattern is “forwarding”, i.e. there is no broadcast, shuffle, or custom partitioning scheme.

Best,
Aljoscha 
On 16. Jun 2017, at 06:42, sohimankotia <[hidden email]> wrote:

You are right Aljoscha . Jog graph is splitted after introducing partitioner
.

I was under impression that If parallelism is set everything will be chained
together .
Can you explain how data will flow for  map -> partitioner -> flatmap  if
parallelism or It would be great if point me to right documentation ?

Can is this possible it can add more than 100 seconds delay if partitioner
is added ?

Sorry for dumb question but I did not find any detailed documentation on
flink website . Most of links
https://cwiki.apache.org/confluence/display/FLINK/Parallelism+and+Scheduling
are also empty as of now .





--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Using-Custom-Partitioner-in-Streaming-with-parallelism-1-adding-latency-tp13766p13774.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: Using Custom Partitioner in Streaming with parallelism 1 adding latency

sohimankotia
Thanks for pointers Aljoscha.

I was just wondering, Since Custom partition will run in separate thread . Is it possible that from map -> custom partition -> flat map can take more than 200 seconds if parallelism is still 1 .
Reply | Threaded
Open this post in threaded view
|

Re: Using Custom Partitioner in Streaming with parallelism 1 adding latency

Aljoscha Krettek
Hi,

That depends, how are you measuring and what are your results?

Best,
Aljoscha

> On 19. Jun 2017, at 06:23, sohimankotia <[hidden email]> wrote:
>
> Thanks for pointers Aljoscha.
>
> I was just wondering, Since Custom partition will run in separate thread .
> Is it possible that from map -> custom partition -> flat map can take more
> than 200 seconds if parallelism is still 1 .
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Using-Custom-Partitioner-in-Streaming-with-parallelism-1-adding-latency-tp13766p13822.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.