Hi,
I have a use case where I need to join two Kafka topics on some fields. In general, I could copy the content of one topic into the other and partition both by the same key, but I can't touch those two topics (i.e., other consumers read from them). On the other hand, it's essential to process records with the same key on the same "thread", both to achieve locality and to avoid races when the same key is handled by different machines/threads.

My idea is to take the union of the two streams and then key by the field. Is there a better approach to achieve "locality"?

Any input will be appreciated.

Igor
Hi Igor!

What you can actually do is let a single FlinkKafkaConsumer consume from both topics, producing a single DataStream which you can keyBy afterwards. All versions of the FlinkKafkaConsumer support consuming multiple Kafka topics simultaneously. This is logically the same as a union followed by a keyBy, as you described.

Note that this approach requires that the records in both of your Kafka topics are of the same type when consumed into Flink (e.g., the same POJO class, or simply both as Strings). If that isn’t possible and you have different data types / schemas for the topics, you’d probably need to use “connect” and then a keyBy.

If you’re applying a window directly after joining the two topic streams, you could also use a window join of the form stream.join(otherStream).where(...).equalTo(...).window(...).apply(...).
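To make the semantics of such a keyed window join concrete, here is a minimal plain-Java sketch. It is not Flink code, and it is not necessarily what the snippet in the original message showed: the class WindowJoinSketch, the Event type, and the windowJoin helper are hypothetical names made up for illustration, and the fixed-size (tumbling) window is an assumption. Only the parameter names where and equalTo are borrowed from the Flink methods mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Plain-Java illustration only; none of these types are part of Flink.
class WindowJoinSketch {

    // Hypothetical timestamped record used by this sketch.
    static final class Event<T> {
        final long timestampMillis;
        final T value;

        Event(long timestampMillis, T value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    // Pairs elements of the two inputs that have the same key and fall
    // into the same fixed-size window. "where" extracts the key from the
    // first input, "equalTo" extracts it from the second input.
    static <A, B, K, R> List<R> windowJoin(
            List<Event<A>> first,
            List<Event<B>> second,
            Function<A, K> where,
            Function<B, K> equalTo,
            long windowSizeMillis,
            BiFunction<A, B, R> joinFunction) {

        // Bucket the second input by window index, then by key.
        Map<Long, Map<K, List<B>>> buckets = new HashMap<>();
        for (Event<B> e : second) {
            long window = Math.floorDiv(e.timestampMillis, windowSizeMillis);
            buckets.computeIfAbsent(window, w -> new HashMap<>())
                   .computeIfAbsent(equalTo.apply(e.value), k -> new ArrayList<>())
                   .add(e.value);
        }

        // Probe the buckets with each element of the first input.
        List<R> out = new ArrayList<>();
        for (Event<A> e : first) {
            long window = Math.floorDiv(e.timestampMillis, windowSizeMillis);
            Map<K, List<B>> byKey = buckets.get(window);
            if (byKey == null) {
                continue; // nothing from the second input in this window
            }
            List<B> matches = byKey.get(where.apply(e.value));
            if (matches == null) {
                continue; // no element with the same key in this window
            }
            for (B b : matches) {
                out.add(joinFunction.apply(e.value, b));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Records are "key:payload" strings; the key is the part before ':'.
        Function<String, String> keyOf = s -> s.split(":")[0];

        List<Event<String>> topicA = Arrays.asList(
                new Event<String>(1000L, "k1:a"),
                new Event<String>(6000L, "k1:c"));
        List<Event<String>> topicB = Arrays.asList(
                new Event<String>(2000L, "k1:x"));

        List<String> joined = windowJoin(
                topicA, topicB, keyOf, keyOf, 5000L, (a, b) -> a + "|" + b);
        System.out.println(joined);
    }
}
```

In this sketch a record from the first input only pairs with records from the second input that carry the same key and land in the same window, so "k1:c" at 6000 ms finds no partner for "k1:x" at 2000 ms with a 5-second window. In Flink itself the window assigner and the join function are supplied through the window(...) and apply(...) calls rather than as method parameters.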
The “where” specifies how to select the key from the first stream, and “equalTo” does the same for the second one.

Hope this helps, let me know if you have other questions!

Cheers,
Gordon

On January 9, 2017 at 4:06:34 AM, igor.berman ([hidden email]) wrote:
Hi Tzu-Li,

Huge thanks for the input. I'll try to implement a prototype of your idea and see if it meets my requirements.

On 9 January 2017 at 08:02, Tzu-Li (Gordon) Tai <[hidden email]> wrote: