MapWithState for two keyed stream

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

MapWithState for two keyed stream

Peter Zende
Hi all,

Is it possible to define two DataStream sources - one which reads from Kafka, the other reads from HDFS -  and apply mapWithState with CoFlatMapFunction? The idea would be to read historical data from HDFS along with the live stream from Kafka and based on some business  write the output to the sink in correct update order?

Or is it easier to just union those two streams? In mapWithState we should tell from which stream the record originates from to be able to correctly build up the state store.

Many thanks.
Peter
Reply | Threaded
Open this post in threaded view
|

Re: MapWithState for two keyed stream

Michael Latta
CoRichFlatMap or union will work.  If you need to know which is historical the flatmap will be better as you can tell which stream it cam from.  But, be careful about reading historical data and trying to process it all before processing the new data.  That can lead to buffering a lot of incoming data while processing the historical data.

Michael

> On May 9, 2018, at 9:49 AM, Peter Zende <[hidden email]> wrote:
>
> Hi all,
>
> Is it possible to define two DataStream sources - one which reads from Kafka, the other reads from HDFS -  and apply mapWithState with CoFlatMapFunction? The idea would be to read historical data from HDFS along with the live stream from Kafka and based on some business  write the output to the sink in correct update order?
>
> Or is it easier to just union those two streams? In mapWithState we should tell from which stream the record originates from to be able to correctly build up the state store.
>
> Many thanks.
> Peter