Enriching events with data from external http resources


Enriching events with data from external http resources

Maciek Próchniak
Hi,

Our data streams do some filtering based on data from external HTTP
resources (not maintained by us; they're really fast, with Redis as storage).

So far we have done that by simply invoking an HTTP client synchronously
in map/flatMap operations. It works without errors, but it seems
inefficient to have to raise the degree of parallelism just to wait on
blocking HTTP calls, especially given all the recent developments in
fast async clients and so on. I was wondering if there is some way of
invoking an HTTP (or other external) service in a non-blocking way.

I know it's not really the desired way of using Flink, and that it would
be better to keep the data as state inside the stream and have it updated
by some join operator, but for us that's a bit of overkill. What's more,
we have many (not so large) streams, so it would not really be feasible
to keep all of the state (which is the same) in each of them.

Are there any patterns/ways, existing or planned, of dealing with such
a situation?

thanks,

maciek


Re: Enriching events with data from external http resources

Ufuk Celebi
On Mon, Aug 15, 2016 at 8:52 PM, Maciek Próchniak <[hidden email]> wrote:
> I know it's not really desired way of using flink and that it would be
> better to keep data as state inside stream and have it updated by some join
> operator, but for us it's a bit of overkill - what's more, we have many (not
> so large) streams, it would be not really feasible to keep all of state
> (which is the same) in each of them.

Hey Maciek! The points you raise all make sense and there is work in
progress to provide better support for these use cases:
https://issues.apache.org/jira/browse/FLINK-4391

The general idea would be to have something like a multi-threaded flat
map function that dispatches the requests to a thread pool (it's like
"virtually" increasing the parallelism as you do now). This is pretty
straightforward to implement if you don't need to worry about fault
tolerance for now. Integrating this with checkpointing is a little
more involved and will be addressed as part of the linked issue.
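
To make the thread pool idea a bit more concrete, here is a minimal
sketch in plain Java. It deliberately does not use any Flink API: the
class name ThreadPoolEnricher and the lookup function are hypothetical,
and the lookup stands in for whatever blocking HTTP client call you make
today. There is no fault tolerance here, so requests that are in flight
when a failure happens are simply lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, not part of Flink: runs a blocking lookup on a
// fixed thread pool so the calling thread does not wait on each request.
class ThreadPoolEnricher<IN, OUT> implements AutoCloseable {
    private final ExecutorService pool;
    private final Function<IN, OUT> lookup; // assumed blocking call, e.g. an HTTP GET

    ThreadPoolEnricher(int threads, Function<IN, OUT> lookup) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.lookup = lookup;
    }

    // Dispatches a single lookup to the pool and returns immediately.
    CompletableFuture<OUT> enrichAsync(IN event) {
        return CompletableFuture.supplyAsync(() -> lookup.apply(event), pool);
    }

    // Submits every event first, then collects the results in input order,
    // so up to `threads` lookups are in flight at the same time.
    List<OUT> enrichAll(List<IN> events) {
        List<CompletableFuture<OUT>> pending = new ArrayList<>();
        for (IN event : events) {
            pending.add(enrichAsync(event));
        }
        List<OUT> results = new ArrayList<>();
        for (CompletableFuture<OUT> future : pending) {
            results.add(future.join()); // rethrows lookup failures, wrapped in CompletionException
        }
        return results;
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Inside a flat map function you would create such a pool once per
operator instance and emit results as the futures complete. The pool
size then plays the role of the extra parallelism you use now.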

Re: Enriching events with data from external http resources

Maciek Próchniak
Hi Ufuk,

thanks for the info - this is good news :)

maciek


(In reply to Ufuk Celebi's message of 16/08/2016 12:16.)