how is data partitoned and distributed for connected stream

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

how is data partitoned and distributed for connected stream

xie wei
Hello Flink,

assume there are two finite streams, stream1(s1)has only one event, stream2(s2)have 100 events, the parallelism is 2.
Then doing stream1.connect(stream2).map().
How is the data partitioned and distributed to the CoMap instances? Is the event from s1 only available in one of the CoMap instance?
Thank you!

Best regards
Wei


Reply | Threaded
Open this post in threaded view
|

Re: how is data partitoned and distributed for connected stream

Till Rohrmann
Hi,

if all operators have the same parallelism, then there will be a pointwise connection. This means all elements arriving at s1_x and s2_x will be forwarded to s3_x with _x denoting the parallel subtask. Thus, to answer your second question, the single s1 element will only be present at one subtask of the CoMap operator, depending from which s1 parallel subtask it comes.

Cheers,
Till

On Tue, Aug 22, 2017 at 8:31 AM, xie wei <[hidden email]> wrote:
Hello Flink,

assume there are two finite streams, stream1(s1)has only one event, stream2(s2)have 100 events, the parallelism is 2.
Then doing stream1.connect(stream2).map().
How is the data partitioned and distributed to the CoMap instances? Is the event from s1 only available in one of the CoMap instance?
Thank you!

Best regards
Wei