Hi,
I have streaming job which is running with parallelism 1 as of now . (This job will run with parallelism > 1 in future ) So I have added custom partitioner to partition the data based on one tuple field . The flow is : source -> map -> partitioner -> flatmap -> sink The partitioner is adding around 40 sec or more delay for 40% of data (from map to flatmap ). I am not able to understand if parallelism is one how this delay is getting added . |
Hi,
Before adding the partitioner all your functions are chained together. That is, everything is executed in one Thread and sending elements from one function to the next is essentially just a method call. By introducing a partitioner you break this chain and therefore your job now has to send data (at least) across between different Threads. I think you should see this if you look at the execution graph. Best, Aljoscha > On 15. Jun 2017, at 14:35, sohimankotia <[hidden email]> wrote: > > Hi, > > I have streaming job which is running with parallelism 1 as of now . (This > job will run with parallelism > 1 in future ) > > So I have added custom partitioner to partition the data based on one tuple > field . > > The flow is : > > source -> map -> partitioner -> flatmap -> sink > > The partitioner is adding around 40 sec or more delay for 40% of data (from > map to flatmap ). > > I am not able to understand if parallelism is one how this delay is getting > added . > > > > -- > View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Using-Custom-Partitioner-in-Streaming-with-parallelism-1-adding-latency-tp13766.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com. |
This post was updated on .
You are right Aljoscha . Jog graph is splitted after introducing partitioner .
I was under impression that If parallelism is set to 1 everything will be chained together . (Just clarification) Can you explain how data will flow for map -> partitioner -> flatmap if parallelism or It would be great if point me to right documentation ? Can is this possible it can add more than 100 seconds delay if partitioner is added ? Sorry for dumb question but I did not find any detailed documentation on flink website . Most of links https://cwiki.apache.org/confluence/display/FLINK/Parallelism+and+Scheduling are also empty as of now . |
Hi,
These two documentation pages might be interesting: Flink will chain operations together whenever this is possible and the basic prerequisites for this are: the parallelism is the same and the connection pattern is “forwarding”, i.e. there is no broadcast, shuffle, or custom partitioning scheme. Best, Aljoscha
|
Thanks for pointers Aljoscha.
I was just wondering, Since Custom partition will run in separate thread . Is it possible that from map -> custom partition -> flat map can take more than 200 seconds if parallelism is still 1 . |
Hi,
That depends, how are you measuring and what are your results? Best, Aljoscha > On 19. Jun 2017, at 06:23, sohimankotia <[hidden email]> wrote: > > Thanks for pointers Aljoscha. > > I was just wondering, Since Custom partition will run in separate thread . > Is it possible that from map -> custom partition -> flat map can take more > than 200 seconds if parallelism is still 1 . > > > > -- > View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Using-Custom-Partitioner-in-Streaming-with-parallelism-1-adding-latency-tp13766p13822.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com. |
Free forum by Nabble | Edit this page |