Hi All I am writing a Flink streaming program in which I need to enrich a DataStream of user events using some static data set (information base, IB). For E.g. Let's say we have a static data set of buyers and we have an incoming clickstream of events, for each event we want to add a boolean flag indicating whether the doer of the event is a buyer or not. An ideal way to achieve this would be to partition the incoming stream by user id, have the buyers set available in a DataSet partitioned again by user id and then do a look up for each event in the stream into this DataSet. Since Flink does not allow using DataSets in a streaming program, how can I achieve the above ? Another option could be to use Managed Operator State to store buyers set, but how can I keep this state distributed by user id so as to avoid network i/o in individual event look ups ? In case of memory state backend, does state remain distributed by some key, or is it replicated across all operator subtasks ? What is the right design pattern to achieve the above enriching requirement in a Flink streaming program ? Thanks Vijay Kansal Software Development Engineer |
Hi, This type of applications are not super well supported by Flink, yet. The missing feature is on the roadmap and called Side Inputs [1].2018-04-04 9:41 GMT+02:00 vijay kansal <[hidden email]>:
|
Free forum by Nabble | Edit this page |