Join two streams from Kafka

Join two streams from Kafka

Shamit
Hello Flink Users,

I am a newbie and have a question about joining two streams (stream1 and stream2)
read from Kafka, based on some key.

In my use case, I need to join with stream2 data that might be a year old or
more.

If data arrives on stream1 today and I need to join it with stream2 based on
some key, how can I do that efficiently?

stream2 might have a lot of records (in the millions).

Please help.

Regards,
Shamit Jain



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Join two streams from Kafka

Arvid Heise-4
Hi Shamit,

Unless there is some temporal relationship between the records to be joined, you have to use a regular join over stream 1 and stream 2.
Since you cannot define a window, all data will be held in Flink's state. That is not an issue for a few million records, but it probably means you have to use the RocksDB state backend [1], or else you may run out of main memory.
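
To make the state requirement concrete, here is a minimal plain-Java model of what a regular (non-windowed) join has to remember. It is only a sketch of the semantics, not Flink API: the class name RegularJoinSketch, its methods, and the string records are all made up for illustration. Each side is buffered per key, and nothing is ever evicted, which is why the stored data grows with both streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a regular streaming join; hypothetical names, not Flink API. */
class RegularJoinSketch {
    // Per-key buffers for each side. In a real job these would live in keyed state.
    private final Map<String, List<String>> leftState = new HashMap<>();
    private final Map<String, List<String>> rightState = new HashMap<>();

    /** A stream1 record arrives: store it, then pair it with every stored stream2 record. */
    List<String> onLeft(String key, String value) {
        leftState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String right : rightState.getOrDefault(key, List.of())) {
            out.add(key + ":" + value + "|" + right);
        }
        return out;
    }

    /** A stream2 record arrives: store it, then pair it with every stored stream1 record. */
    List<String> onRight(String key, String value) {
        rightState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String left : leftState.getOrDefault(key, List.of())) {
            out.add(key + ":" + left + "|" + value);
        }
        return out;
    }

    /** Total number of buffered records; without a window this never shrinks. */
    int storedRecords() {
        int total = 0;
        for (List<String> values : leftState.values()) total += values.size();
        for (List<String> values : rightState.values()) total += values.size();
        return total;
    }

    public static void main(String[] args) {
        RegularJoinSketch join = new RegularJoinSketch();
        System.out.println(join.onRight("k1", "year-old-record")); // no stream1 record yet
        System.out.println(join.onLeft("k1", "todays-record"));    // joins with the stored record
        System.out.println(join.storedRecords());
    }
}
```

Because the year-old stream2 record must still be available when the stream1 record arrives, both buffers have to be kept indefinitely. That is the data that would sit in the state backend.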

I recommend using Flink SQL or the Table API, which will also prune all unnecessary columns from your data. If you want to use the DataStream API instead, I recommend dropping all unrelated columns prior to the join.
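
As a rough illustration of dropping unrelated columns, the plain-Java sketch below projects a wide row down to the join key plus the fields needed downstream. The class ProjectBeforeJoin, the project helper, and the column names are hypothetical. In a DataStream job, logic like this could run in a map() step on each stream before the join, so that only the slim rows end up in state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper showing column pruning ahead of a join; not Flink API. */
class ProjectBeforeJoin {
    /** Returns a copy of the row that contains only the listed columns. */
    static Map<String, String> project(Map<String, String> row, List<String> keep) {
        Map<String, String> slim = new LinkedHashMap<>();
        for (String column : keep) {
            if (row.containsKey(column)) {
                slim.put(column, row.get(column));
            }
        }
        return slim;
    }

    public static void main(String[] args) {
        // A wide record as it might arrive from Kafka (made-up columns).
        Map<String, String> wide = new LinkedHashMap<>();
        wide.put("customer_id", "42");
        wide.put("name", "Ada");
        wide.put("address", "a long free-text address");
        wide.put("notes", "a long free-text note");

        // Keep only the join key and the one field the join result needs.
        System.out.println(project(wide, List.of("customer_id", "name")));
    }
}
```

The smaller each buffered row is, the less state the join has to hold for the same number of records.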


On Tue, Feb 9, 2021 at 10:47 PM Shamit <[hidden email]> wrote:
Hello Flink Users,
[...]