Hello,
I'm working on a rules engine project with Flink 1.3. In this project I want to update some keyed operator state when an external event occurs. It is like this use case:
Full article: https://data-artisans.com/blog/real-time-fraud-detection-ing-bank-apache-flink
Image: https://data-artisans.com/wp-content/uploads/2017/10/streaming-in-definitions.png
I found broadcast variables in the DataSet API but not in the DataStream API:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html#broadcast-variables
Can someone explain to me how to solve this problem?
Thanks a lot.
Flinkly regards,
Sadok
Hi Sadok,
Since you want to broadcast the Rule Stream to all subtasks, it does not seem necessary to use a KeyedStream at that point. How about using the broadcast partitioner, connecting the two streams to attach the rule to each record (or to apply the rule to the records directly), and doing the keyed operation after that? If you need both a keyed operator and rule application, changing the order this way should work.
The code might look something like this, and you can change the rules' state in the CoFlatMapFunction:
    DataStream<Rule> rules = ...;
    DataStream<Record> records = ...;
    DataStream<Tuple2<Rule, Record>> recordWithRule = rules.broadcast().connect(records).flatMap(...);
    recordWithRule.keyBy(...).process(...);
Hope this makes sense to you.
Best Regards,
Tony Wei
2017-11-09 6:25 GMT+08:00 Ladhari Sadok <[hidden email]>:
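To make the broadcast-then-connect idea above concrete, here is a minimal plain-Java model of it. It deliberately has no Flink dependency: the class BroadcastConnectSketch, its Subtask type, and the sendRule/sendRecord helpers are hypothetical names invented for this illustration, and only the method names flatMap1/flatMap2 mirror Flink's CoFlatMapFunction. Rules and records are plain strings, and records are assumed to be spread round-robin, which in a real job depends on the upstream partitioning.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model (not Flink code) of rules.broadcast().connect(records).flatMap(...). */
final class BroadcastConnectSketch {

    /** One parallel instance of the two-input operator; it keeps its own copy of the rule. */
    static final class Subtask {
        private String currentRule;                    // null until the first rule arrives
        final List<String> output = new ArrayList<>(); // stands in for the Collector

        // Mirrors CoFlatMapFunction.flatMap1: a rule event only updates local state.
        void flatMap1(String rule) {
            currentRule = rule;
        }

        // Mirrors CoFlatMapFunction.flatMap2: attach the current rule to the record.
        void flatMap2(String record) {
            output.add("(" + currentRule + ", " + record + ")");
        }
    }

    private final List<Subtask> subtasks = new ArrayList<>();
    private int next = 0; // round-robin cursor for records (an assumption of this sketch)

    BroadcastConnectSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new Subtask());
        }
    }

    /** Broadcast partitioning: every subtask receives the rule. */
    void sendRule(String rule) {
        for (Subtask s : subtasks) {
            s.flatMap1(rule);
        }
    }

    /** Records are not broadcast: each one reaches exactly one subtask. */
    void sendRecord(String record) {
        subtasks.get(next).flatMap2(record);
        next = (next + 1) % subtasks.size();
    }

    List<Subtask> subtasks() {
        return subtasks;
    }

    public static void main(String[] args) {
        BroadcastConnectSketch job = new BroadcastConnectSketch(2);
        job.sendRule("amount > 100");
        job.sendRecord("r1");
        job.sendRecord("r2");
        job.sendRule("amount > 500"); // reaches both subtasks
        job.sendRecord("r3");
        for (Subtask s : job.subtasks()) {
            System.out.println(s.output);
        }
    }
}
```

The point of the model is that a rule only needs to be sent once per change: after it has been broadcast, every subtask holds it locally and can pair it with whatever records arrive later.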
Thank you for the answer. I know that solution, but I don't want to stream the rules all the time. In my case the rules are stored in Redis and are loaded when Flink starts. I want to broadcast changes only when they occur.
Thanks.
On 9 Nov 2017 7:51 AM, "Tony Wei" <[hidden email]> wrote:
Hi Sadok,
What I mean is to keep the rules in the operator state. The events in the Rule Stream are just the change log for the rules. More specifically, you can fetch the rules from Redis in the open step of the CoFlatMap and keep them in the operator state, then use the Rule Stream to notify the CoFlatMap to 1. update some rules or 2. refetch all rules from Redis.
Is that what you want?
Best Regards,
Tony Wei
2017-11-09 15:52 GMT+08:00 Ladhari Sadok <[hidden email]>:
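The "load once in open, then follow the change log" pattern described above could be sketched as follows. This is again plain Java rather than Flink code: RuleCacheSketch, RuleStore (a stand-in for the Redis lookup, not a real client API) and RuleChange are hypothetical names, rules are modeled as strings keyed by an id, and the rule map is an ordinary field. In a real job the map would presumably need to be registered as checkpointed operator state to survive failures, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model (not Flink code) of an operator that caches rules and follows a change log. */
final class RuleCacheSketch {

    /** Hypothetical stand-in for the Redis lookup; not a real client API. */
    interface RuleStore {
        Map<String, String> fetchAll();
    }

    /** Hypothetical change-log event; a null ruleId means "refetch everything". */
    static final class RuleChange {
        final String ruleId;
        final String definition;

        RuleChange(String ruleId, String definition) {
            this.ruleId = ruleId;
            this.definition = definition;
        }

        static RuleChange refetchAll() {
            return new RuleChange(null, null);
        }
    }

    private final RuleStore store;
    private final Map<String, String> rules = new HashMap<>();

    RuleCacheSketch(RuleStore store) {
        this.store = store;
    }

    // Counterpart of RichCoFlatMapFunction.open(): load the full rule set once at startup.
    void open() {
        rules.putAll(store.fetchAll());
    }

    // Counterpart of the flatMap method that receives the Rule Stream.
    void onRuleChange(RuleChange change) {
        if (change.ruleId == null) {
            // Option 2: drop the local copy and refetch all rules from the store.
            rules.clear();
            rules.putAll(store.fetchAll());
        } else {
            // Option 1: update (or add) a single rule in place.
            rules.put(change.ruleId, change.definition);
        }
    }

    Map<String, String> rules() {
        return rules;
    }

    public static void main(String[] args) {
        Map<String, String> redisLike = new HashMap<>();
        redisLike.put("r1", "amount > 100");
        RuleCacheSketch op = new RuleCacheSketch(() -> new HashMap<>(redisLike));
        op.open();
        op.onRuleChange(new RuleChange("r2", "country != home"));
        System.out.println(op.rules());
    }
}
```

With this shape the Rule Stream stays silent while nothing changes, so the rules are not streamed continuously; only a change event triggers an update or a refetch.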
---------- Forwarded message ----------
From: Ladhari Sadok <[hidden email]>
Date: 2017-11-09 10:26 GMT+01:00
Subject: Re: Broadcast to all the other operators
To: Tony Wei <[hidden email]>
Thanks, Tony, for your very fast answer.
Yes, it resolves my problem that way, but with flatMap I will always get a Tuple2<Rule, Record> in the processing function (<NULL, Record> when no rule update is available and <newRule, Record> otherwise). Is there no way to optimize this solution?
Do you think it is the same solution as in this picture: https://data-artisans.com/wp-c
Best regards,
Sadok
On 9 Nov 2017 9:21 AM, "Tony Wei" <[hidden email]> wrote:
Hi Sadok,
The sample code is just an example to show you how to broadcast the rules to all subtasks; the output of the CoFlatMap does not have to be Tuple2<Rule, Record>. It depends on what you actually need in your rules engine project. For example, if you can apply the rule to each record directly, you can emit the processed records to the keyed operator.
IMHO, the scenario in the article you mentioned has several well-prepared rules to enrich the data and uses DSL files to decide which rules an incoming event needs. After enrichment, the features for a particular event are grouped by its random id and evaluated by the models.
I think this approach might be close to the solution in that article, but there could be some differences depending on the use case.
Best Regards,
Tony Wei
2017-11-09 17:27 GMT+08:00 Ladhari Sadok <[hidden email]>:
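As a rough illustration of emitting processed records instead of Tuple2<Rule, Record>, the sketch below applies the current rule inside the two-input operator and forwards only the result. It is plain Java, not Flink code: ApplyRuleSketch and InputRecord are hypothetical names, the rule is modeled as a java.util.function.Predicate, a List stands in for the Collector, and the default of flagging nothing before the first rule arrives is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model (not Flink code) of a CoFlatMap that applies the rule itself. */
final class ApplyRuleSketch {

    /** Hypothetical record type used only for this illustration. */
    static final class InputRecord {
        final String id;
        final double amount;

        InputRecord(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }
    }

    // Assumed default: flag nothing until the first rule has arrived.
    private Predicate<InputRecord> rule = r -> false;

    // Rule side: replace the local rule and emit nothing downstream.
    void flatMap1(Predicate<InputRecord> newRule, List<String> out) {
        rule = newRule;
    }

    // Record side: apply the current rule and emit only the processed result.
    void flatMap2(InputRecord record, List<String> out) {
        out.add(record.id + (rule.test(record) ? ":FLAGGED" : ":OK"));
    }

    public static void main(String[] args) {
        ApplyRuleSketch op = new ApplyRuleSketch();
        List<String> out = new ArrayList<>();
        op.flatMap1(r -> r.amount > 100.0, out);
        op.flatMap2(new InputRecord("a", 50.0), out);
        op.flatMap2(new InputRecord("b", 250.0), out);
        System.out.println(out);
    }
}
```

Because the rule never leaves the operator, the downstream keyed operator sees one plain value per record and never has to handle a null rule.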
OK, thanks Tony, your answer is very helpful.
2017-11-09 11:09 GMT+01:00 Tony Wei <[hidden email]>: