http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/How-to-perform-this-join-operation-tp6088p7167.html
Hi Elias!
I think you brought up a couple of good issues. Let me try and summarize what we have so far:
1) Joining in a more flexible fashion
=> The problem you are solving with the trailing / sliding window combination: Is the right way to phrase the join problem "join records where key is equal and timestamps are within X seconds (millis/minutes/...) of each other"?
=> That should definitely have an API abstraction. The first version could be implemented exactly with a combination of sliding and trailing windows.
=> For joins between windowed and non-windowed streams in the long run: Aljoscha posted the design doc on side inputs. Would that cover the use case as a long-term solution?
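To make the join semantics from 1) concrete, here is a minimal plain-Python sketch of "key is equal and timestamps are within X of each other". It only illustrates the intended result, not a Flink API: the function name `interval_join`, the `(key, timestamp, value)` record layout, and the `max_gap` parameter are assumptions made up for this example, and it works on finite lists rather than streams.

```python
from collections import defaultdict


def interval_join(left, right, max_gap):
    """Join (key, timestamp, value) records whose keys are equal and whose
    timestamps differ by at most max_gap (same unit as the timestamps)."""
    # Index the right side by key so only matching keys are compared.
    right_by_key = defaultdict(list)
    for key, ts, value in right:
        right_by_key[key].append((ts, value))

    joined = []
    for key, l_ts, l_value in left:
        for r_ts, r_value in right_by_key.get(key, ()):
            if abs(l_ts - r_ts) <= max_gap:
                joined.append((key, l_value, r_value))
    return joined


# Hypothetical sample data: timestamps are in seconds.
left = [("a", 10, "L1"), ("a", 50, "L2"), ("b", 12, "L3")]
right = [("a", 13, "R1"), ("b", 40, "R2"), ("a", 48, "R3")]
result = interval_join(left, right, max_gap=5)
```

A streaming version would additionally have to bound how long each side is buffered, which is what the sliding / trailing window combination provides.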
2) Lists that are larger than the memory
=> The ListState returns an Iterable, but it is eagerly materialized from RocksDB. Is there a way to "stream" the bytes from RocksDB? Flink could then deserialize them in a streamed fashion as well.
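The difference between eager materialization and streamed deserialization can be sketched as follows. This is plain Python under stated assumptions: the length-prefixed record format and the helper names `encode_list` and `iter_decode` are hypothetical, and do not reflect how RocksDB or Flink actually lay out list state.

```python
import io
import struct


def encode_list(items):
    """Serialize strings as length-prefixed UTF-8 records (hypothetical format)."""
    out = io.BytesIO()
    for item in items:
        data = item.encode("utf-8")
        out.write(struct.pack(">I", len(data)))  # 4-byte big-endian length
        out.write(data)
    return out.getvalue()


def iter_decode(stream):
    """Yield one element at a time from a byte stream, so that only the
    element currently being decoded is held in memory."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        data = stream.read(length)
        if len(data) < length:
            raise ValueError("truncated record")
        yield data.decode("utf-8")


payload = encode_list(["a", "bb", "ccc"])
lazy = iter_decode(io.BytesIO(payload))
first = next(lazy)   # decodes only the first record
rest = list(lazy)    # decodes the remaining records on demand
```

The eager equivalent would be `list(iter_decode(...))` up front, which is what makes lists larger than memory a problem.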
3) Can you elaborate a bit on the OrderedListState? Are you thinking of multiple (ordered) values per key, or of a sequence of key/value pairs, ordered by key?
=> Currently, Flink limits the scope of key accesses to the current value's key (as defined in the keyBy() function). That way, the system can transparently redistribute keys when changing the parallelism.
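Why key-scoped state makes redistribution transparent can be shown with a small plain-Python sketch. The hash-modulo assignment and the names `owner` and `partition` are assumptions chosen for illustration, not Flink's actual key assignment scheme.

```python
import zlib


def owner(key, parallelism):
    """Pick the parallel instance for a key using a stable hash (illustrative)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def partition(state, parallelism):
    """Split per-key state across instances; each key lives on exactly one."""
    parts = [dict() for _ in range(parallelism)]
    for key, value in state.items():
        parts[owner(key, parallelism)][key] = value
    return parts


# Hypothetical per-key state, e.g. a counter per user.
state = {"user-1": 3, "user-2": 7, "user-3": 1, "user-4": 9}
two = partition(state, 2)    # parallelism 2
three = partition(state, 3)  # rescaled to parallelism 3
```

Because every entry is addressed only through its own key, rescaling is just a matter of moving whole entries between instances. State that could read or write other keys would break this property.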
Greetings,
Stephan