Related datastream

nragon
Hi,

I have two datastreams, dataStreamA and dataStreamB.
Is there any way to generate a dataStreamC with fields from dataStreamA and dataStreamB?

P.S.: I'm trying to simulate a relational database model and generate data. dataStreamC has foreign keys referencing dataStreamA and dataStreamB.

Thanks

Re: Related datastream

Jonas Gröger
Hey nragon!

Do the two streams A and B have some sort of id or key? If not, how do you plan on joining them?
Do you just want to join A and B with elements a and b as they arrive (keeping one in state and joining it with the next element arriving from the other stream)?

From what you are asking, this should be no problem but we need a little bit more clarification here.
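
To make the second option concrete, here is a minimal sketch in plain Python (not the Flink API) of joining elements as they arrive: an element waits in per-key state until an element with the same key shows up on the other stream. The function name `pairing_join` and the `(side, key, value)` event layout are assumptions made for this illustration only.

```python
from collections import defaultdict, deque


def pairing_join(events):
    """Join elements of two keyed streams as they arrive.

    `events` is an iterable of (side, key, value) tuples, where side is
    "A" or "B". An element is buffered until an element with the same key
    arrives on the other side; the pair is then emitted as
    (key, a_value, b_value).
    """
    # Per-side, per-key buffers of elements that have not been matched yet.
    waiting = {"A": defaultdict(deque), "B": defaultdict(deque)}
    for side, key, value in events:
        other = "B" if side == "A" else "A"
        pending = waiting[other][key]
        if pending:
            # The other side already has an unmatched element for this key.
            match = pending.popleft()
            a_val, b_val = (value, match) if side == "A" else (match, value)
            yield (key, a_val, b_val)
        else:
            # Nothing to join with yet: keep this element in state.
            waiting[side][key].append(value)


# Interleaved arrival order, as it might look on a real stream.
events = [("A", 1, "a1"), ("B", 2, "b2"), ("B", 1, "b1"), ("A", 2, "a2")]
joined = list(pairing_join(events))
```

The state only holds elements that are still waiting for a partner, so its size depends on how far apart matching elements arrive.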

-- Jonas

Re: Related datastream

Jonas Gröger
In reply to this post by nragon
Hi nragon,

Apparently I didn't read the P.S. because I assumed it wasn't important. Silly me.

So you are trying to join streams A and B into stream C, with A and B both being keyed. Alright. How often do matching elements (matched by primary key) from A and B arrive at the operator you want to implement?

This question can't be universally answered without having some more constraints on the streams A and B. To me this sounds more like a batch job because it needs to have the whole stream A or B in memory in order to join every element.
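
As an illustration of why one side has to be held in memory, here is a minimal sketch of a batch-style hash join in plain Python (not Flink code). The name `hash_join` and the `(key, value)` record layout are assumptions for this example.

```python
def hash_join(stream_a, stream_b):
    """Join two iterables of (key, value) pairs on their key."""
    # Build phase: every element of A is stored, so A must fit in memory.
    table = {}
    for key, a_val in stream_a:
        table.setdefault(key, []).append(a_val)
    # Probe phase: B is consumed one element at a time and never stored.
    for key, b_val in stream_b:
        for a_val in table.get(key, []):
            yield (key, a_val, b_val)


a = [(1, "a1"), (2, "a2")]
b = [(2, "b2"), (1, "b1"), (3, "b3")]  # key 3 has no partner in A
result = list(hash_join(a, b))
```

Only the build side is materialized, so in a sketch like this the smaller of the two inputs would normally be chosen as A.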

Re: Related datastream

nragon
The reason I'm doing this on a stream is that there can be many records to hold in memory and I want to run this on an ordinary laptop. With streaming I can achieve this. So I set the links between A and C to 0..4, meaning each record from A can have between 0 and 4 records in C; the same goes for B. But for now let's consider that each record from A and B produces one record in C.
With the DataSet API this would be easy, but I don't know about memory issues.
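
For the simplified case (one record in C per record from A and B), a minimal sketch in plain Python could look like the following. It uses lazy generators, so only one record per stream is alive at a time. The field names `a_id`, `b_id`, `c_id` and the functions `gen_a`, `gen_b`, `gen_c` are hypothetical, and this is not the Flink API.

```python
from itertools import count, islice


def gen_a(n):
    """Lazily produce n records for table A."""
    for i in range(n):
        yield {"a_id": i}


def gen_b(n):
    """Lazily produce n records for table B."""
    for i in range(n):
        yield {"b_id": i}


def gen_c(stream_a, stream_b):
    """Produce one C record per (A, B) pair, carrying both foreign keys."""
    c_ids = count()
    # zip pulls one element from each stream at a time; nothing is buffered.
    for a, b in zip(stream_a, stream_b):
        yield {"c_id": next(c_ids), "a_id": a["a_id"], "b_id": b["b_id"]}


# Only the first three records are ever created, even though the
# generators are configured for one million records each.
first_three = list(islice(gen_c(gen_a(1_000_000), gen_b(1_000_000)), 3))
```

The 0..4 links are left out here; supporting them would mean emitting a variable number of C records per parent instead of exactly one.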

Re: Related datastream

nragon
I believe I could try a micro-batch approach in order to release some memory.
Meaning, if I have to generate 1M records, I would split the work and generate 100m in each iteration.
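
A minimal sketch of that micro-batch idea in plain Python: records are generated in fixed-size batches, and each batch can be released before the next one is built. The batch size of 100,000 is an assumed value chosen for illustration, and `micro_batches` is a hypothetical helper, not a Flink function.

```python
def micro_batches(total, batch_size):
    """Yield lists of at most batch_size records until total is reached."""
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        # Only this batch is held in memory at any one time.
        yield [{"id": i} for i in range(start, end)]


written = 0
batch_count = 0
for batch in micro_batches(total=1_000_000, batch_size=100_000):
    # A real job would write the batch out here before moving on.
    written += len(batch)
    batch_count += 1
```

Peak memory is then bounded by the batch size rather than by the total number of records.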