Flink stream processing issue

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink stream processing issue

Qihua Yang
Hi, 

I have a question. We have two data streams that may contain duplicate data. We are using keyedCoProcessFunction to process stream data. I defined the same keySelector for both streams. Our flink application has multiple replicas. We want the same data to be processed by the same replica. Can flink ensure that?

Thanks,
Qihua
Reply | Threaded
Open this post in threaded view
|

Re: Flink stream processing issue

JING ZHANG
Hi Qihua,

I’m sorry I didn’t understand what you mean by ‘replica’. Would you please explain a little more?
If you means you job has multiple parallelism, and whether same data from different two inputs would be send to the same downstream subtask after `keyedCoProcessFunction`. 
Yes, Flink could do this, if you keyBy the same field for two inputs.

Best regards,
JING ZHANG

Qihua Yang <[hidden email]> 于2021年6月3日周四 下午12:25写道:
Hi, 

I have a question. We have two data streams that may contain duplicate data. We are using keyedCoProcessFunction to process stream data. I defined the same keySelector for both streams. Our flink application has multiple replicas. We want the same data to be processed by the same replica. Can flink ensure that?

Thanks,
Qihua
Reply | Threaded
Open this post in threaded view
|

Re: Flink stream processing issue

Qihua Yang
Sorry for the confusion.... Yes, I mean multiple parallelism. Really thanks for your help.

Thanks,
Qihua

On Thu, Jun 3, 2021 at 12:03 AM JING ZHANG <[hidden email]> wrote:
Hi Qihua,

I’m sorry I didn’t understand what you mean by ‘replica’. Would you please explain a little more?
If you means you job has multiple parallelism, and whether same data from different two inputs would be send to the same downstream subtask after `keyedCoProcessFunction`. 
Yes, Flink could do this, if you keyBy the same field for two inputs.

Best regards,
JING ZHANG

Qihua Yang <[hidden email]> 于2021年6月3日周四 下午12:25写道:
Hi, 

I have a question. We have two data streams that may contain duplicate data. We are using keyedCoProcessFunction to process stream data. I defined the same keySelector for both streams. Our flink application has multiple replicas. We want the same data to be processed by the same replica. Can flink ensure that?

Thanks,
Qihua
Reply | Threaded
Open this post in threaded view
|

Re: Flink stream processing issue

yidan zhao
Yes, if you use KeyedCoProcess, flink will ensure that.

Qihua Yang <[hidden email]> 于2021年6月4日周五 上午12:32写道:

>
> Sorry for the confusion.... Yes, I mean multiple parallelism. Really thanks for your help.
>
> Thanks,
> Qihua
>
> On Thu, Jun 3, 2021 at 12:03 AM JING ZHANG <[hidden email]> wrote:
>>
>> Hi Qihua,
>>
>> I’m sorry I didn’t understand what you mean by ‘replica’. Would you please explain a little more?
>> If you means you job has multiple parallelism, and whether same data from different two inputs would be send to the same downstream subtask after `keyedCoProcessFunction`.
>> Yes, Flink could do this, if you keyBy the same field for two inputs.
>>
>> Best regards,
>> JING ZHANG
>>
>> Qihua Yang <[hidden email]> 于2021年6月3日周四 下午12:25写道:
>>>
>>> Hi,
>>>
>>> I have a question. We have two data streams that may contain duplicate data. We are using keyedCoProcessFunction to process stream data. I defined the same keySelector for both streams. Our flink application has multiple replicas. We want the same data to be processed by the same replica. Can flink ensure that?
>>>
>>> Thanks,
>>> Qihua