Question on stream joins

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Question on stream joins

Sudan S
Hi ,

I have two usecases

1. I have two streams which `leftSource` and `rightSource` which i want to join without partitioning over a window and find the difference of count of elements of leftSource and rightSource and emit the result of difference. Which is the appropriate join function ican use ?

join/cogroup/connect.

2. I want to replicate the same behaviour over a keyed source. Basically leftSource and rightSource are joined by a partition key.

Plz let me know which is the appropriate join operator for the usecase


"The information contained in this e-mail and any accompanying documents may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient of this message, or if this message has been addressed to you in error, please immediately alert the sender by replying to this e-mail and then delete this message, including any attachments. Any dissemination, distribution or other use of the contents of this message by anyone other than the intended recipient is strictly prohibited. All messages sent to and from this e-mail address may be monitored as permitted by applicable law and regulations to ensure compliance with our internal policies and to protect our business."
Reply | Threaded
Open this post in threaded view
|

Re: Question on stream joins

Yun Gao
Hi Sudan,

   As far as I know, both join and cogroup requires keys (namely partitioning), thus for the non-keyed scenario, you may have to use low-level connect operator to achieve it. In my opinion it should be something like

  leftSource.connect(rightSource)
       .process(new TagCoprocessFunction()) // In this function,  tag the left source with "0" and the right source with "1"
    ​  .window(xx) 
    ​  .process(new XX()) // In this function, you could get all the left and right elements in this window, and you could distinguish them with the tag added in the previous step.

It should be pointed out that without key (partitioning) the paralellism of the window operator will have to be 1.


For the keyed scenarios, You may use high-level operators join/cogroup to achieve that. The join could be seen as a special example as cogroup that in cogroup, you could access all the left and right elements directly, and in join function, the framework will iterate the elements for you and you can only specify the logic for each (left, right) pair. 

Best,
 Yun

------------------Original Mail ------------------
Sender:Sudan S <[hidden email]>
Send Date:Fri May 29 01:40:59 2020
Recipients:User-Flink <[hidden email]>
Subject:Question on stream joins
Hi ,

I have two usecases

1. I have two streams which `leftSource` and `rightSource` which i want to join without partitioning over a window and find the difference of count of elements of leftSource and rightSource and emit the result of difference. Which is the appropriate join function ican use ?

join/cogroup/connect.

2. I want to replicate the same behaviour over a keyed source. Basically leftSource and rightSource are joined by a partition key.

Plz let me know which is the appropriate join operator for the usecase


"The information contained in this e-mail and any accompanying documents may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient of this message, or if this message has been addressed to you in error, please immediately alert the sender by replying to this e-mail and then delete this message, including any attachments. Any dissemination, distribution or other use of the contents of this message by anyone other than the intended recipient is strictly prohibited. All messages sent to and from this e-mail address may be monitored as permitted by applicable law and regulations to ensure compliance with our internal policies and to protect our business."
Reply | Threaded
Open this post in threaded view
|

Re: Question on stream joins

Sudan S
Thanks Yun. Was thinking a similar way.  I had one more question.

leftSource.connect(rightSource)
       .process(new TagCoprocessFunction()) // In this function,  tag the left source with "0" and the right source with "1"
      .window(xx) 
      .process(new XX()) 

In this when will the window be applied ? since the window operator is after process(new TagCoprocessFunction()).

On Fri, May 29, 2020 at 11:35 AM Yun Gao <[hidden email]> wrote:
Hi Sudan,

   As far as I know, both join and cogroup requires keys (namely partitioning), thus for the non-keyed scenario, you may have to use low-level connect operator to achieve it. In my opinion it should be something like

  leftSource.connect(rightSource)
       .process(new TagCoprocessFunction()) // In this function,  tag the left source with "0" and the right source with "1"
    ​  .window(xx) 
    ​  .process(new XX()) // In this function, you could get all the left and right elements in this window, and you could distinguish them with the tag added in the previous step.

It should be pointed out that without key (partitioning) the paralellism of the window operator will have to be 1.


For the keyed scenarios, You may use high-level operators join/cogroup to achieve that. The join could be seen as a special example as cogroup that in cogroup, you could access all the left and right elements directly, and in join function, the framework will iterate the elements for you and you can only specify the logic for each (left, right) pair. 

Best,
 Yun

------------------Original Mail ------------------
Sender:Sudan S <[hidden email]>
Send Date:Fri May 29 01:40:59 2020
Recipients:User-Flink <[hidden email]>
Subject:Question on stream joins
Hi ,

I have two usecases

1. I have two streams which `leftSource` and `rightSource` which i want to join without partitioning over a window and find the difference of count of elements of leftSource and rightSource and emit the result of difference. Which is the appropriate join function ican use ?

join/cogroup/connect.

2. I want to replicate the same behaviour over a keyed source. Basically leftSource and rightSource are joined by a partition key.

Plz let me know which is the appropriate join operator for the usecase


"The information contained in this e-mail and any accompanying documents may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient of this message, or if this message has been addressed to you in error, please immediately alert the sender by replying to this e-mail and then delete this message, including any attachments. Any dissemination, distribution or other use of the contents of this message by anyone other than the intended recipient is strictly prohibited. All messages sent to and from this e-mail address may be monitored as permitted by applicable law and regulations to ensure compliance with our internal policies and to protect our business."


"The information contained in this e-mail and any accompanying documents may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient of this message, or if this message has been addressed to you in error, please immediately alert the sender by replying to this e-mail and then delete this message, including any attachments. Any dissemination, distribution or other use of the contents of this message by anyone other than the intended recipient is strictly prohibited. All messages sent to and from this e-mail address may be monitored as permitted by applicable law and regulations to ensure compliance with our internal policies and to protect our business."
Reply | Threaded
Open this post in threaded view
|

Re: Re: Question on stream joins

Yun Gao
Hi Sudan,

   The first process is used to tag the elements from the left and right windows, so next they could be merged into the same stream and then they could be assigned to the same window. Then the next window(xxx).process(new WindowProcessFunction) defines the window operator to process the windowed elements, thus the second process defines the window process logic. Without the tagging we may not be able to assign the elements from both the left and right stream to the same window.

Best,
 Yun


------------------Original Mail ------------------
Sender:Sudan S <[hidden email]>
Send Date:Fri May 29 14:39:31 2020
Recipients:Yun Gao <[hidden email]>
CC:User-Flink <[hidden email]>
Subject:Re: Question on stream joins
Thanks Yun. Was thinking a similar way.  I had one more question.

leftSource.connect(rightSource)
       .process(new TagCoprocessFunction()) // In this function,  tag the left source with "0" and the right source with "1"
      .window(xx) 
      .process(new XX()) 

In this when will the window be applied ? since the window operator is after process(new TagCoprocessFunction()).

On Fri, May 29, 2020 at 11:35 AM Yun Gao <[hidden email]> wrote:
Hi Sudan,

   As far as I know, both join and cogroup requires keys (namely partitioning), thus for the non-keyed scenario, you may have to use low-level connect operator to achieve it. In my opinion it should be something like

  leftSource.connect(rightSource)
       .process(new TagCoprocessFunction()) // In this function,  tag the left source with "0" and the right source with "1"
    ​  .window(xx) 
    ​  .process(new XX()) // In this function, you could get all the left and right elements in this window, and you could distinguish them with the tag added in the previous step.

It should be pointed out that without key (partitioning) the paralellism of the window operator will have to be 1.


For the keyed scenarios, You may use high-level operators join/cogroup to achieve that. The join could be seen as a special example as cogroup that in cogroup, you could access all the left and right elements directly, and in join function, the framework will iterate the elements for you and you can only specify the logic for each (left, right) pair. 

Best,
 Yun

------------------Original Mail ------------------
Sender:Sudan S <[hidden email]>
Send Date:Fri May 29 01:40:59 2020
Recipients:User-Flink <[hidden email]>
Subject:Question on stream joins
Hi ,

I have two usecases

1. I have two streams which `leftSource` and `rightSource` which i want to join without partitioning over a window and find the difference of count of elements of leftSource and rightSource and emit the result of difference. Which is the appropriate join function ican use ?

join/cogroup/connect.

2. I want to replicate the same behaviour over a keyed source. Basically leftSource and rightSource are joined by a partition key.

Plz let me know which is the appropriate join operator for the usecase


"The information contained in this e-mail and any accompanying documents may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient of this message, or if this message has been addressed to you in error, please immediately alert the sender by replying to this e-mail and then delete this message, including any attachments. Any dissemination, distribution or other use of the contents of this message by anyone other than the intended recipient is strictly prohibited. All messages sent to and from this e-mail address may be monitored as permitted by applicable law and regulations to ensure compliance with our internal policies and to protect our business."


"The information contained in this e-mail and any accompanying documents may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient of this message, or if this message has been addressed to you in error, please immediately alert the sender by replying to this e-mail and then delete this message, including any attachments. Any dissemination, distribution or other use of the contents of this message by anyone other than the intended recipient is strictly prohibited. All messages sent to and from this e-mail address may be monitored as permitted by applicable law and regulations to ensure compliance with our internal policies and to protect our business."