Broadcast to all the other operators


Broadcast to all the other operators

Ladhari Sadok
Hello,

I'm working on a Rules Engine project with Flink 1.3. In this project I want to update some keyed operator state when an external event occurs.

I have a DataStream of updates (from Kafka), and I want to broadcast the data contained in this stream to all keyed operators so I can change the state in all of them.

Full article: https://data-artisans.com/blog/real-time-fraud-detection-ing-bank-apache-flink

I found this in the DataSet API but not in the DataStream API!


Can someone explain to me how to solve this problem?

Thanks a lot.

Flinkly regards,
Sadok

Re: Broadcast to all the other operators

Tony Wei
Hi Sadok,

Since you want to broadcast the Rule Stream to all subtasks, it seems that it is not necessary to use a KeyedStream.
How about using the broadcast partitioner: connect the two streams to attach the rule to each record (or apply the rule to it directly), and do the keyed operation after that?
If you need to key the stream first and then apply the rules, it should work by changing the order.

The code might be something like this; you can change the rules' state in the CoFlatMapFunction.

DataStream<Rule> rules = ...;
DataStream<Record> records = ...;
DataStream<Tuple2<Rule, Record>> recordWithRule = rules.broadcast().connect(records).flatMap(...);
recordWithRule.keyBy(...).process(...);
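To make the CoFlatMapFunction part concrete, here is a minimal, self-contained Java sketch of the logic such a function might hold. The `Rule` and `Record` classes and the `Collector` interface below are simplified stand-ins so the example compiles on its own; in a real job you would implement Flink's `org.apache.flink.streaming.api.functions.co.CoFlatMapFunction` and use Flink's own `Collector`. `flatMap1` receives the broadcast rules and `flatMap2` the data records.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the domain types; placeholders, not Flink classes.
class Rule {
    final String name;
    final String expression;
    Rule(String name, String expression) { this.name = name; this.expression = expression; }
}

class Record {
    final String key;
    final long value;
    Record(String key, long value) { this.key = key; this.value = value; }
}

// Stand-in for Flink's Collector interface.
interface Collector<T> { void collect(T element); }

// The two-input function behind flatMap(...): flatMap1 is called for each
// element of the broadcast rule stream, flatMap2 for each data record.
// Because the rule stream was broadcast(), every parallel instance of this
// function sees every rule and keeps its own copy in local state.
class RuleEnricher {
    private final Map<String, Rule> rules = new HashMap<>();

    // Broadcast side: update the locally held rule set.
    public void flatMap1(Rule rule, Collector<String> out) {
        rules.put(rule.name, rule);
    }

    // Data side: evaluate the record against the current rules and emit results.
    public void flatMap2(Record record, Collector<String> out) {
        for (Rule rule : rules.values()) {
            out.collect(record.key + " matched-against " + rule.name);
        }
    }
}
```

The keyed operator downstream then only sees the enriched output, while every parallel `RuleEnricher` instance holds the full rule set.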

Hope this will make sense to you.

Best Regards,
Tony Wei

2017-11-09 6:25 GMT+08:00 Ladhari Sadok <[hidden email]>:


Re: Broadcast to all the other operators

Ladhari Sadok
Thank you for the answer. I know that solution, but I don't want to stream the rules all the time.
In my case I have the rules in Redis, and they are loaded at Flink startup.

I want to broadcast changes only when they occur.

Thanks.

On 9 Nov 2017 at 7:51 AM, "Tony Wei" <[hidden email]> wrote:


Re: Broadcast to all the other operators

Tony Wei
Hi Sadok,

What I mean is to keep the rules in the operator state. The events in the Rule Stream are just the change log about the rules.
To be more specific: you can fetch the rules from Redis in the open() step of the CoFlatMap and keep them in the operator state, then use the Rule Stream to notify the CoFlatMap to either 1. update some rules or 2. re-fetch all rules from Redis.
Is that what you want?
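The open()-plus-change-log pattern described above can be sketched as plain Java, with the Redis client simulated by a `Supplier` so the example is self-contained. `RuleEvent` and `RuleState` are hypothetical names for illustration; in a real job this state would live inside a `RichCoFlatMapFunction`, with `open()` mapping to the function's open() lifecycle method and `onRuleEvent` to `flatMap1`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Change-log event carried by the broadcast rule stream: either update one
// rule or tell the operator to reload the full rule set from Redis.
class RuleEvent {
    enum Kind { UPDATE, REFETCH_ALL }
    final Kind kind;
    final String name;
    final String expression;
    RuleEvent(Kind kind, String name, String expression) {
        this.kind = kind; this.name = name; this.expression = expression;
    }
}

class RuleState {
    private final Map<String, String> rules = new HashMap<>();
    // Stand-in for a Redis client: returns the full rule set on each call.
    private final Supplier<Map<String, String>> redis;

    RuleState(Supplier<Map<String, String>> redis) { this.redis = redis; }

    // Analogous to RichCoFlatMapFunction.open(): load the rules once at startup.
    void open() { rules.putAll(redis.get()); }

    // Analogous to flatMap1(): apply a change-log event from the broadcast stream.
    void onRuleEvent(RuleEvent event) {
        if (event.kind == RuleEvent.Kind.UPDATE) {
            rules.put(event.name, event.expression);
        } else {
            rules.clear();
            rules.putAll(redis.get());
        }
    }

    String lookup(String name) { return rules.get(name); }
}
```

This way Redis is only hit at startup and on explicit REFETCH_ALL events, and ordinary rule changes travel as small broadcast messages.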

Best Regards,
Tony Wei

2017-11-09 15:52 GMT+08:00 Ladhari Sadok <[hidden email]>:



Fwd: Broadcast to all the other operators

Ladhari Sadok

---------- Forwarded message ----------
From: Ladhari Sadok <[hidden email]>
Date: 2017-11-09 10:26 GMT+01:00
Subject: Re: Broadcast to all the other operators
To: Tony Wei <[hidden email]>


Thanks Tony for your very fast answer.

Yes, it resolves my problem that way, but with flatMap I will always get a Tuple2<Rule, Record> in the processing function (<null, Record> when no rule update is available and <newRule, Record> otherwise).
Is there no way to optimize this solution? Do you think it is the same solution as in this picture: https://data-artisans.com/wp-content/uploads/2017/10/streaming-in-definitions.png ?

Best regards,
Sadok


On 9 Nov 2017 at 9:21 AM, "Tony Wei" <[hidden email]> wrote:





Re: Broadcast to all the other operators

Tony Wei
Hi Sadok,

The sample code is just an example to show you how to broadcast the rules to all subtasks; the output from the CoFlatMap does not need to be Tuple2<Rule, Record>. It depends on what you actually need in your Rules Engine project.
For example, if you can apply a rule to each record directly, you can emit the processed records to the keyed operator.
IMHO, the scenario in the article you mentioned is having several well-prepared rules to enrich data, and using DSL files to decide which rules each incoming event needs. After enrichment, the features for a particular event are grouped by its random id and evaluated by the models.
I think this approach might be close to the solution in that article, but it could differ depending on the use case.
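Applying the rules inside the CoFlatMap and emitting an already-processed record, as suggested above, avoids the <null, Record> pairs entirely: each record is emitted exactly once, evaluated against the current rule set. A small self-contained sketch, with hypothetical names (`DirectRuleApplier`, `EnrichedRecord`) and a toy threshold-based rule representation chosen purely for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical enriched output: the record key plus the rules it triggered.
// Emitting this instead of Tuple2<Rule, Record> means downstream keyed
// operators never see null-rule pairs.
class EnrichedRecord {
    final String key;
    final List<String> firedRules;
    EnrichedRecord(String key, List<String> firedRules) {
        this.key = key; this.firedRules = firedRules;
    }
}

class DirectRuleApplier {
    // Toy rule representation: rule name -> minimum value threshold.
    private final Map<String, Long> thresholds = new HashMap<>();

    // Broadcast side: install or replace a rule.
    void updateRule(String name, long threshold) { thresholds.put(name, threshold); }

    // Data side: evaluate all current rules and emit one enriched record.
    EnrichedRecord apply(String key, long value) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : thresholds.entrySet()) {
            if (value > entry.getValue()) fired.add(entry.getKey());
        }
        return new EnrichedRecord(key, fired);
    }
}
```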

Best Regards,
Tony Wei


2017-11-09 17:27 GMT+08:00 Ladhari Sadok <[hidden email]>:

---------- Forwarded message ----------
From: Ladhari Sadok <[hidden email]>
Date: 2017-11-09 10:26 GMT+01:00
Subject: Re: Broadcast to all the other operators
To: Tony Wei <[hidden email]>


Thanks Tony for your very fast answer ,

Yes it resolves my problem that way, but with flatMap I will get Tuple2<Rule, Record> always in the processing function (<NULL ,Record> in case of no rules update available and <newRule,Record> in the other case ).
There is no optimization of this solution ? Do you think it is the same solution in this picture : https://data-artisans.com/wp-content/uploads/2017/10/streaming-in-definitions.png ?

Best regards,
Sadok


Le 9 nov. 2017 9:21 AM, "Tony Wei" <[hidden email]> a écrit :
Hi Sadok,

What I mean is to keep the rules in the operator state. The event in Rule Stream is just the change log about rules.
For more specific, you can fetch the rules from Redis in the open step of CoFlatMap and keep them in the operator state, then use Rule Stream to notify the CoFlatMap to 1. update some rules or 2. refetch all rules from Redis.
Is that what you want?

Best Regards,
Tony Wei

2017-11-09 15:52 GMT+08:00 Ladhari Sadok <[hidden email]>:
Thank you for the answer, I know that solution, but I don't want to stream the rules all time.
In my case I have the rules in Redis and at startup of flink they are loaded.

I want to broadcast changes just when it occurs.

Thanks.

Le 9 nov. 2017 7:51 AM, "Tony Wei" <[hidden email]> a écrit :
Hi Sadok,

Since you want to broadcast Rule Stream to all subtasks, it seems that it is not necessary to use KeyedStream.
How about use broadcast partitioner, connect two streams to attach the rule on each record or imply rule on them directly, and do the key operator after that?
If you need to do key operator and apply the rules, it should work by changing the order.

The code might be something like this, and you can change the rules' state in the CoFlatMapFunction.

DataStream<Rule> rules = ...;
DataStream<Record> records = ...;
DataStream<Tuple2<Rule, Record>> recordWithRule = rules.broadcast().connect(records).flatMap(...);
dataWithRule.keyBy(...).process(...);

Hope this will make sense to you.

Best Regards,
Tony Wei

2017-11-09 6:25 GMT+08:00 Ladhari Sadok <[hidden email]>:
Hello,

I'm working on Rules Engine project with Flink 1.3, in this project I want to update some keyed operator state when external event occurred.

I have a Datastream of updates (from kafka) I want to broadcast the data contained in this stream to all keyed operator so I can change the state in all operators.

All article : https://data-artisans.com/blog/real-time-fraud-detection-ing-bank-apache-flink

I founded it in the DataSet API but not in the DataStream API !


Can some one explain to me who to solve this problem ?

Thanks a lot.

Flinkly regards,
Sadok






Re: Broadcast to all the other operators

Ladhari Sadok
Ok thanks Tony, your answer is very helpful.

2017-11-09 11:09 GMT+01:00 Tony Wei <[hidden email]>: