How to join/group 2 streams by key?

John Tipper

Hi All,


I have 2 streams of events that relate to a common base event, where one stream is the result of a flatmap. I want to join all events that share a common identifier.

Thus I have something that looks like:

DataStream<TypeA> streamA = ...
DataStream<TypeB> streamB = someDataStream.flatMap(...) // produces stream of TypeB for each item in someDataStream

Both TypeA and TypeB share an identifier and I know how many TypeB objects there are in the parent object. I want to perform some processing when all of the events associated with a particular identifier have arrived, i.e. when I basically can create a Tuple3<id, TypeA, List<TypeB>> object.

Is this best done with a WindowJoin and a GlobalWindow, a Window CoGroup and a GlobalWindow or by connecting the 2 streams into a ConnectedStream then performing the joining inside a CoProcessFunction?


Many thanks,

John

Re: How to join/group 2 streams by key?

Congxian Qiu
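Hi John,

I've seen other people solve the same problem as follows: union the two DataStreams (after mapping both to a common type, since union requires streams of the same type), key the result by the shared identifier, and do the join in a ProcessFunction[1]. The function also registers timers to clean up state for identifiers whose events never all arrive.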
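
To make the buffering idea concrete, here is a minimal sketch in plain Java of the per-identifier logic that such a function might hold. It is not Flink code and not necessarily how the people mentioned above did it. The names JoinBuffer, Joined, expectedBs and ttlMillis are hypothetical, and the sketch assumes the expected number of TypeB events is carried on TypeA. In a real job the map would presumably be keyed state and expire() would run from a timer callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for the TypeA and TypeB of the question.
final class TypeA {
    final String id;
    final int expectedBs; // assumption: the known TypeB count travels with TypeA
    TypeA(String id, int expectedBs) { this.id = id; this.expectedBs = expectedBs; }
}

final class TypeB {
    final String id;
    final String payload;
    TypeB(String id, String payload) { this.id = id; this.payload = payload; }
}

// Plays the role of Tuple3<id, TypeA, List<TypeB>>.
final class Joined {
    final String id;
    final TypeA a;
    final List<TypeB> bs;
    Joined(String id, TypeA a, List<TypeB> bs) { this.id = id; this.a = a; this.bs = bs; }
}

// Buffers events per identifier and emits once the group is complete.
final class JoinBuffer {
    private static final class Pending {
        TypeA a;                                  // null until the TypeA event arrives
        final List<TypeB> bs = new ArrayList<>();
        final long deadline;                      // when an incomplete group is dropped
        Pending(long deadline) { this.deadline = deadline; }
    }

    private final Map<String, Pending> pending = new HashMap<>();
    private final long ttlMillis;

    JoinBuffer(long ttlMillis) { this.ttlMillis = ttlMillis; }

    Optional<Joined> onA(TypeA a, long now) {
        Pending p = pending.computeIfAbsent(a.id, k -> new Pending(now + ttlMillis));
        p.a = a;
        return emitIfComplete(a.id, p);
    }

    Optional<Joined> onB(TypeB b, long now) {
        Pending p = pending.computeIfAbsent(b.id, k -> new Pending(now + ttlMillis));
        p.bs.add(b);
        return emitIfComplete(b.id, p);
    }

    // Emits and clears the group once TypeA and all expected TypeB events are present.
    private Optional<Joined> emitIfComplete(String id, Pending p) {
        if (p.a == null || p.bs.size() < p.a.expectedBs) {
            return Optional.empty();
        }
        pending.remove(id);
        return Optional.of(new Joined(id, p.a, p.bs));
    }

    // Drops incomplete groups whose deadline has passed; returns how many were dropped.
    int expire(long now) {
        int before = pending.size();
        pending.values().removeIf(p -> p.deadline <= now);
        return before - pending.size();
    }

    int pendingCount() { return pending.size(); }
}
```

Because events may arrive in either order, both onA and onB create the pending entry if it is missing. The deadline is set when the first event for an identifier arrives, so a group that never completes is dropped ttlMillis later instead of staying in memory forever.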


On Fri, Jun 14, 2019 at 3:24 PM, John Tipper <[hidden email]> wrote:

Hi All,


I have 2 streams of events that relate to a common base event, where one stream is the result of a flatmap. I want to join all events that share a common identifier.

Thus I have something that looks like:

DataStream<TypeA> streamA = ...
DataStream<TypeB> streamB = someDataStream.flatMap(...) // produces stream of TypeB for each item in someDataStream

Both TypeA and TypeB share an identifier and I know how many TypeB objects there are in the parent object. I want to perform some processing when all of the events associated with a particular identifier have arrived, i.e. when I basically can create a Tuple3<id, TypeA, List<TypeB>> object.

Is this best done with a WindowJoin and a GlobalWindow, a Window CoGroup and a GlobalWindow or by connecting the 2 streams into a ConnectedStream then performing the joining inside a CoProcessFunction?


Many thanks,

John