Hi there,
I have a use case to check for active ID, there are two streams and I connect them: one has actual data (Stream A) and the other one is for lookup purpose (Stream B), I am getting Stream B as a file which includes all active ID, so inactive ID would not be show up on this list. I tried to use watermark to clean up the state of inactivate ID, but the Stream B updates is unpredictable so I want to keep everything in state until I found the item is not in that file any more. Please suggest what is the best way to implement it in flink. Thanks in advance for your help. Regards, Chengzhi |
Hi Chengzhi,
you said the Stream B which comes from a file will be updated unpredictably. I wonder if you could share more about how to judge an item (from Stream A I suppose) is not in the file and what watermark generation strategy did you choose? Best, Xingcan > On May 12, 2018, at 12:48 AM, Chengzhi Zhao <[hidden email]> wrote: > > Hi there, > > I have a use case to check for active ID, there are two streams and I connect them: one has actual data (Stream A) and the other one is for lookup purpose (Stream B), I am getting Stream B as a file which includes all active ID, so inactive ID would not be show up on this list. I tried to use watermark to clean up the state of inactivate ID, but the Stream B updates is unpredictable so I want to keep everything in state until I found the item is not in that file any more. > > Please suggest what is the best way to implement it in flink. Thanks in advance for your help. > > Regards, > Chengzhi > > |
Hi Xingcan, Thanks for your response, to give your more background about my use case, I have Stream B with some split test name, and Stream A will be the actual test. I want to have Stream A connect to Stream B to figure out whether this test is still active or not. I am not sure this is the right way to do: My watermark is based on event time for 15 mins, OnTimer will be emit that records after 15 mins. I was wondering if there is way to purge the state of entire Stream B so I can get all the active test, since the file will include all the updated split testing name so I can refresh the lookup. Also, I am not sure if I am using the right operator here, or if there is a way to share variable globally so I can just perform filter on stream A. Please let me know your thoughts and thanks for you suggestions again. Regards, Chengzhi On Sat, May 12, 2018 at 8:55 PM, Xingcan Cui <[hidden email]> wrote: Hi Chengzhi, |
Hi Chengzhi, currently, it's impossible to process both a stream and a (dynamically updated) dataset in a single job. I'll provide you with some workarounds, all of which are based on that the file for active test names is not so large. (1) You may define your own stream source[1] which should be aware of the file update, and keep the input file as a stream (the Stream B as you described). Some special records can be inserted to indicate the start and end of an update. Note that instead of using the `keyBy()` method, the Stream B should be broadcasted, while the Stream A can be partitioned arbitrarily. With this method, you can clean and rebuild the states according to the start/end indicators. (2) You may also take the file of active test names as external states and set processing time timers[2] to update them regularly (e.g., with 1 min interval) in a ProcessFunction[3]. IMO, the watermark may not work as expected for your use case. Besides, since the file will be updated unpredictably, it's hard to guarantee the precision of results. Hope that helps, Xingcan
|
Hi Xingcan, Thanks a lot for providing your inputs on the possible solutions here. Can you please clarify on how to broadcasted in Flink? Appreciate your help!! Best, Chengzhi On Tue, May 15, 2018 at 10:22 AM, Xingcan Cui <[hidden email]> wrote:
|
Hi Chengzhi,
more details about partitioning mechanisms can be found at https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#physical-partitioning. Best, Xingcan
|
Thanks again Xingcan! Appreciate your help! On Tue, May 15, 2018, 9:31 PM Xingcan Cui <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |