How to use 'dynamic' state


How to use 'dynamic' state

Steve Jerman
I’ve been reading the code/user group/SO and haven’t really found a great answer to this… so I thought I’d ask.

I have a UI that allows the user to edit rules which include specific criteria, for example: trigger an event if at least X people are present for over a minute.

I would like to have a Flink job that processes an event stream and triggers on these rules.

The catch is that I don’t want to have to restart the job if the rules change… (it would be easy otherwise :))

So I found four ways to proceed:

* API based stop and restart of job … ugly.

* Use a co-map function with the rules as one stream and the events as the other. This seems better; however, I would like to have several ‘trigger’ functions chained together, e.g. a tumbling window for one type of criteria and a flat map for a different sort, so I’m not sure how to hook this up for more than a simple co-map/flatmap. I did see this suggested in one answer.

* Use broadcast state: this seems reasonable; however, I couldn’t tell if the broadcast state would be available to all of the processing functions. Is it generally available?

* Implement my own operators… seems complicated ;)

Are there other approaches?

Thanks for any advice
Steve

Re: How to use 'dynamic' state

Tzu-Li (Gordon) Tai
Hi Steve,

I’ll try to provide some input for the approaches you’re currently looking into (I’ll comment on your email below):

* API based stop and restart of job … ugly. 

Yes, indeed ;) I think this is absolutely not necessary.

* Use a co-map function with the rules alone stream and the events as the other. This seems better however, I would like to have several ‘trigger’ functions changed together .. e.g. a tumbling window for one type of criteria and a flat map for a different sort … So I’m not sure how to hook this up for more than a simple co-map/flatmap. I did see this suggested in one answer and 

Do you mean that operators listen only to certain rules / criteria settings changes? You could either have separate stream sources for each kind of criteria rule trigger event, or have a single source and split them afterwards. Then, you broadcast each of them to the corresponding co-maps / flat-maps.
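[Editor's sketch] The connect-and-broadcast pattern Gordon describes can be modeled in a few lines of plain Python (standing in for Flink's `ConnectedStreams`/`CoFlatMapFunction`; the `Rule`/`Event` shapes here are hypothetical): one input carries rule updates that replace operator state, the other carries events that are evaluated against whatever rules are currently installed, so no job restart is needed when a rule changes.

```python
# Plain-Python model of a Flink CoFlatMap: input 1 carries rule
# updates, input 2 carries events. The latest rules are held as
# operator state and applied to each incoming event.
class RuleCoFlatMap:
    def __init__(self):
        self.rules = {}  # rule_id -> predicate, replaceable at runtime

    def flat_map1(self, rule_id, predicate):
        """Rules input: install or replace a rule without restarting."""
        self.rules[rule_id] = predicate

    def flat_map2(self, event):
        """Events input: emit (rule_id, event) for every matching rule."""
        return [(rid, event) for rid, pred in self.rules.items() if pred(event)]

op = RuleCoFlatMap()
op.flat_map1("crowd", lambda e: e["people"] >= 5)
print(op.flat_map2({"people": 7}))   # -> [('crowd', {'people': 7})]
op.flat_map1("crowd", lambda e: e["people"] >= 10)  # rule changed in flight
print(op.flat_map2({"people": 7}))   # -> []
```

In Flink itself this corresponds to `events.connect(rules.broadcast()).flatMap(...)`, so every parallel instance of the operator sees every rule update.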

* Use broadcast state : this seems reasonable however I couldn’t tell if the broadcast state would be available to all of the processing functions. Is it generally available? 

From the context of your description, I think what you want is that the injected rules stream can be seen by all operators (instead of “broadcast state”, which in Flink streaming refers to something else).

Aljoscha recently consolidated a FLIP for Side Inputs [1], which I think is targeted exactly for what you have in mind here. Perhaps you can take a look at that and see if it makes sense for your use case? But of course, this isn’t yet available as it is still under discussion. I think Side Inputs may be an ideal solution for what you have in mind here, as the rule triggers I assume should be slowly changing.

I’ve CCed Aljoscha so that he can probably provide more insights, as he has worked a lot on the stuff mentioned here.

Cheers,
Gordon

[1] FLIP-17: https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API

On March 7, 2017 at 5:05:04 AM, Steve Jerman ([hidden email]) wrote:


Re: How to use 'dynamic' state

Aljoscha Krettek
Hi Steve,
I think Gordon already gave a pretty good answer, I'll just try and go into the specifics a bit.

You mentioned that there would be different kinds of operators required for the rules (you hinted at FlatMap and maybe a tumbling window). Do you know each of those types before starting your program? If yes, you could have several of these "primitive" operations in your pipeline, where each of them only listens to rule changes (on a second input) that are relevant to its operation.

Side inputs would be very good for this but I think you can also get the same level of functionality by using a CoFlatMap (for the window case you would use a CoFlatMap chained to a window operation).
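[Editor's sketch] The CoFlatMap-chained-to-a-window case can be modeled like this (again in plain Python rather than the DataStream API; the one-minute tumbling window and the people-count threshold are taken from the original question, and all names here are hypothetical): the co-flatmap tags each event with the threshold in force at that moment, and a downstream tumbling window fires when its event count reaches that threshold.

```python
# Plain-Python model: a CoFlatMap attaches the current (mutable)
# threshold to each event; a downstream tumbling window counts
# events per window and fires when the threshold is met.
WINDOW_MS = 60_000  # one-minute tumbling window, as in the question

class ThresholdTagger:
    def __init__(self, threshold):
        self.threshold = threshold
    def on_rule(self, new_threshold):   # rules input: update in place
        self.threshold = new_threshold
    def on_event(self, ts):             # events input: tag with threshold
        return (ts, self.threshold)

def tumbling_trigger(tagged_events):
    """Group (ts, threshold) pairs into one-minute windows; a window
    fires if its event count reaches the last threshold it saw."""
    windows = {}
    for ts, threshold in tagged_events:
        key = ts // WINDOW_MS
        count, _ = windows.get(key, (0, threshold))
        windows[key] = (count + 1, threshold)
    return [k for k, (count, threshold) in windows.items() if count >= threshold]

tagger = ThresholdTagger(threshold=3)
events = [tagger.on_event(t) for t in (1_000, 2_000, 3_000, 61_000)]
print(tumbling_trigger(events))  # window 0 has 3 events and fires; window 1 has 1 and does not
```

Because the threshold travels with each event, a rule change via `on_rule` takes effect for subsequent events without touching the window operator itself.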

Does that help? I'm sure we can figure something out together.

Best,
Aljoscha


On Tue, Mar 7, 2017, at 07:44, Tzu-Li (Gordon) Tai wrote:



Re: How to use 'dynamic' state

Steve Jerman
Thanks for your reply. It makes things much clearer for me. I think you are right - Side Inputs are probably the right way long term (I looked at the FLIP definition), but I think I can construct something in the meantime.

Steve

On Mar 7, 2017, at 6:11 AM, Aljoscha Krettek <[hidden email]> wrote:
