Keyby connect for a one to many relationship - DataStream API - Ride Enrichment (CoProcessFunction)

Dulce Morim
Hello,

Following this exercise:
http://training.data-artisans.com/exercises/rideEnrichment-processfunction.html

I need to do something similar, but my data structure is something like:

A
Primary_key
other fields

B
Primary_key
Relation_Key
other fields

The relationship between A and B is one-to-many, joined on B.Relation_Key = A.Primary_key.

When I use keyBy on both streams, with "A.Primary_key" as the key on the A stream and "B.Relation_Key" as the key on the B stream, the output only contains the last B record among those that share the same "B.Relation_Key".

Is it possible to connect these two streams? The reference solution seems to assume a one-to-one relationship, but we need a one-to-many relationship. Should this be solved some other way?

Thanks,
Dulce Morim
Re: Keyby connect for a one to many relationship - DataStream API - Ride Enrichment (CoProcessFunction)

Chesnay Schepler
You can still connect the streams, but it will be more complex than the
reference solution.

You will have to store the events from B in a ListState rather than in a
single-value state.
When an A arrives, store it in the value state, emit a tuple (A, B_x) for
every stored B, and then clear the B list state.
From that point on, emit a new tuple (A, B) directly for every B that
arrives; the B state is no longer needed for that key.
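
The steps above can be modelled in plain Java as follows. This is only a minimal sketch of the buffering logic, not Flink code: the two maps stand in for keyed state (in a real job these would presumably be a ValueState for A and a ListState for B inside a CoProcessFunction on the connected, keyed streams). The class names A, B, Joined and OneToManyJoin and the methods onA/onB are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the one-to-many buffering logic; not Flink API. */
public class OneToManyJoin {

    /** Hypothetical "one" side, identified by primaryKey. */
    public static final class A {
        public final String primaryKey;
        public A(String primaryKey) { this.primaryKey = primaryKey; }
    }

    /** Hypothetical "many" side; relationKey refers to A.primaryKey. */
    public static final class B {
        public final String primaryKey;
        public final String relationKey;
        public B(String primaryKey, String relationKey) {
            this.primaryKey = primaryKey;
            this.relationKey = relationKey;
        }
    }

    /** One emitted (A, B) pair. */
    public static final class Joined {
        public final A a;
        public final B b;
        public Joined(A a, B b) { this.a = a; this.b = b; }
    }

    // Stand-in for the per-key value state holding the A record.
    private final Map<String, A> aByKey = new HashMap<>();
    // Stand-in for the per-key list state holding Bs that arrived before their A.
    private final Map<String, List<B>> pendingBs = new HashMap<>();

    /** Called for each A: store it, flush every buffered B, clear the buffer. */
    public List<Joined> onA(A a) {
        aByKey.put(a.primaryKey, a);
        List<Joined> out = new ArrayList<>();
        List<B> buffered = pendingBs.remove(a.primaryKey); // read and clear
        if (buffered != null) {
            for (B b : buffered) {
                out.add(new Joined(a, b));
            }
        }
        return out;
    }

    /** Called for each B: emit at once if A is known, otherwise buffer it. */
    public List<Joined> onB(B b) {
        List<Joined> out = new ArrayList<>();
        A a = aByKey.get(b.relationKey);
        if (a == null) {
            pendingBs.computeIfAbsent(b.relationKey, k -> new ArrayList<>()).add(b);
        } else {
            out.add(new Joined(a, b));
        }
        return out;
    }
}
```

In this model, two Bs that arrive before their A are held back and emitted together once onA is called for that key, and any later B for the same key is emitted immediately. Note that the sketch keeps buffered Bs forever if the matching A never arrives; a real job would likely need some cleanup strategy for that case.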

On 26.03.2018 17:18, Dulce Morim wrote:

> [original message, quoted in full above]