streaming join implementation

Posted by Henry Cai on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/streaming-join-implementation-tp6095.html

Hi,

We are evaluating different streaming platforms.  For a typical join between two streams

select a.*, b.*
FROM a, b
ON a.id == b.id

How does flink implement the join?  The matching record from either stream can come late, we consider it's a valid join as long as the event time for record a and b are in the same day.

I think some streaming platform (e.g. google data flow) will store the records from both streams in a K/V lookup store and later do the lookup.  Is this how flink implement the streaming join?

If we need to store all the records in a state store, that's going to be a lots of records for a day.