AsyncDataStream on key of KeyedStream

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

AsyncDataStream on key of KeyedStream

Flavio Pompermaier
Hi to all,
I'm trying to exploit async IO in my Flink job.
In my use case I use keyed tumbling windows and I'd like to execute the async action only once per key and window (while the AsyncDataStream.unorderedWait execute the async call for every element of my stream) ..is there an easy way to do that apart from using a process function (that basically will lose the asynchronicity)?

Best,
Flavio
Reply | Threaded
Open this post in threaded view
|

Re: AsyncDataStream on key of KeyedStream

Fabian Hueske-2
Hi Flavio,

Not sure I understood the requirements correctly.
Couldn't you just collect and bundle all records with a regular window operator and forward one record for each key-window to an AsyncIO operator?

Best, Fabian

Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <[hidden email]>:
Hi to all,
I'm trying to exploit async IO in my Flink job.
In my use case I use keyed tumbling windows and I'd like to execute the async action only once per key and window (while the AsyncDataStream.unorderedWait execute the async call for every element of my stream) ..is there an easy way to do that apart from using a process function (that basically will lose the asynchronicity)?

Best,
Flavio
Reply | Threaded
Open this post in threaded view
|

Re: AsyncDataStream on key of KeyedStream

Flavio Pompermaier
The problem of bundling all records together within a window is that this solution doesn't scale (in the case of large time windows and number of events)..my requirement could be fulfilled by a keyed ProcessFunction but I think AsyncDataStream should provide a first-class support to keyed streams (and thus perform a single call per key and window..). What do you think?

On Tue, Jul 23, 2019 at 12:56 PM Fabian Hueske <[hidden email]> wrote:
Hi Flavio,

Not sure I understood the requirements correctly.
Couldn't you just collect and bundle all records with a regular window operator and forward one record for each key-window to an AsyncIO operator?

Best, Fabian

Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <[hidden email]>:
Hi to all,
I'm trying to exploit async IO in my Flink job.
In my use case I use keyed tumbling windows and I'd like to execute the async action only once per key and window (while the AsyncDataStream.unorderedWait execute the async call for every element of my stream) ..is there an easy way to do that apart from using a process function (that basically will lose the asynchronicity)?

Best,
Flavio


Reply | Threaded
Open this post in threaded view
|

Re: AsyncDataStream on key of KeyedStream

Fabian Hueske-2
OK, I see. What information will be send out via the async request?
Maybe you can fork of a separate stream with the info that needs to be send to the external service and later union the result with the main stream before the window operator?



Am Di., 23. Juli 2019 um 14:12 Uhr schrieb Flavio Pompermaier <[hidden email]>:
The problem of bundling all records together within a window is that this solution doesn't scale (in the case of large time windows and number of events)..my requirement could be fulfilled by a keyed ProcessFunction but I think AsyncDataStream should provide a first-class support to keyed streams (and thus perform a single call per key and window..). What do you think?

On Tue, Jul 23, 2019 at 12:56 PM Fabian Hueske <[hidden email]> wrote:
Hi Flavio,

Not sure I understood the requirements correctly.
Couldn't you just collect and bundle all records with a regular window operator and forward one record for each key-window to an AsyncIO operator?

Best, Fabian

Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <[hidden email]>:
Hi to all,
I'm trying to exploit async IO in my Flink job.
In my use case I use keyed tumbling windows and I'd like to execute the async action only once per key and window (while the AsyncDataStream.unorderedWait execute the async call for every element of my stream) ..is there an easy way to do that apart from using a process function (that basically will lose the asynchronicity)?

Best,
Flavio


Reply | Threaded
Open this post in threaded view
|

Re: AsyncDataStream on key of KeyedStream

Flavio Pompermaier
For each key I need to call an external REST service to get the current status and this is why I'd like to use Async IO. At the moment I do this in a process function but I'd like a cleaner solution (if possible).
Do you think your proposal of forking could be a better option?
Could you provide a simple snippet/peudo-code of it? I'm not sure I've fully undestand your suggestion..
Reply | Threaded
Open this post in threaded view
|

Re: AsyncDataStream on key of KeyedStream

Fabian Hueske-2
Sure:

                                                   /--> AsyncIO --\
STREAM --> ProcessFunc  --                          -- Union -- WindowFunc
                                                  \------------------/

ProcessFunc keeps track of the unique keys per window duration and emits each distinct key just once to the AsyncIO function via a side output. Through the main output it sends all values it receives.
AsyncIO queries the external store for each key it receives.
Union just unions both streams (possibly using an Either type).
WindowFunction compute the window and includes the information that was fetched by the AsyncIO function.

Cheers,
Fabian

Am Di., 23. Juli 2019 um 17:25 Uhr schrieb Flavio Pompermaier <[hidden email]>:
For each key I need to call an external REST service to get the current status and this is why I'd like to use Async IO. At the moment I do this in a process function but I'd like a cleaner solution (if possible).
Do you think your proposal of forking could be a better option?
Could you provide a simple snippet/peudo-code of it? I'm not sure I've fully undestand your suggestion..