Hi,
Our data streams do some filtering based on data from external HTTP resources (not maintained by us; they're really fast, with Redis as storage). So far we've done that by synchronously invoking an HTTP client in map/flatMap operations. It works without errors, but it seems inefficient to have to use up parallelism just waiting on blocking HTTP calls - especially when you think about all the recent developments in fast async clients and so on.

I was wondering if there is some way of invoking an HTTP (or other external) service in a non-blocking way.

I know it's not really the desired way of using Flink and that it would be better to keep the data as state inside the stream and have it updated by some join operator, but for us that's a bit of overkill - what's more, we have many (not so large) streams, so it would not really be feasible to keep all of the state (which is the same) in each of them.

Are there any patterns/ways, existing or planned, of dealing with such a situation?

thanks,
maciek
On Mon, Aug 15, 2016 at 8:52 PM, Maciek Próchniak <[hidden email]> wrote:
> I know it's not really the desired way of using Flink and that it would be
> better to keep the data as state inside the stream and have it updated by some
> join operator, but for us that's a bit of overkill - what's more, we have many
> (not so large) streams, so it would not really be feasible to keep all of the
> state (which is the same) in each of them.

Hey Maciek!

The points you raise all make sense and there is work in progress to provide better support for these use cases: https://issues.apache.org/jira/browse/FLINK-4391

The general idea would be to have something like a multi-threaded flat map function that dispatches the requests to a thread pool (it's like "virtually" increasing the parallelism, as you do now). This is pretty straightforward to implement if you don't need to worry about fault tolerance for now. Integrating this with checkpointing is a little more involved and will be addressed as part of the linked issue.
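For illustration, here is a minimal sketch of the thread-pool idea described above, written in plain Java with no Flink dependency. Everything in it is an assumption made for the example: the class name `PooledLookupFilter`, the `filter` helper, and `slowIsAllowed`, which only sleeps to stand in for the external HTTP service. It is not the design from FLINK-4391 and, as noted above, it ignores fault tolerance and checkpointing entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

/** Hypothetical helper: filters a batch by running a blocking lookup on a thread pool. */
public class PooledLookupFilter {

    /**
     * Keeps the items for which the blocking lookup returns true.
     * Up to poolSize lookups run concurrently; input order is preserved.
     */
    public static <T> List<T> filter(List<T> items, Predicate<T> lookup, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Dispatch every lookup first so the waits overlap instead of adding up.
            List<CompletableFuture<Boolean>> pending = new ArrayList<>();
            for (T item : items) {
                pending.add(CompletableFuture.supplyAsync(() -> lookup.test(item), pool));
            }
            // Collect in input order; join() blocks only until that one lookup is done.
            List<T> kept = new ArrayList<>();
            for (int i = 0; i < items.size(); i++) {
                if (pending.get(i).join()) {
                    kept.add(items.get(i));
                }
            }
            return kept;
        } finally {
            pool.shutdown();
        }
    }

    /** Stand-in for the external HTTP service (assumption): sleeps to mimic latency. */
    static boolean slowIsAllowed(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !key.startsWith("blocked");
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "blocked-b", "c", "d");
        // The four lookups overlap on the pool rather than running back to back.
        System.out.println(filter(input, PooledLookupFilter::slowIsAllowed, 4));
    }
}
```

In a real job the pool would presumably be created once per operator instance rather than per batch, and results that are still in flight when a checkpoint is taken would be lost on failure, which is the part the linked issue is meant to address.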
Hi Ufuk,
thanks for info - this is good news :)

maciek

On 16/08/2016 12:16, Ufuk Celebi wrote:
> On Mon, Aug 15, 2016 at 8:52 PM, Maciek Próchniak <[hidden email]> wrote:
>> I know it's not really the desired way of using Flink and that it would be
>> better to keep the data as state inside the stream and have it updated by some
>> join operator, but for us that's a bit of overkill - what's more, we have many
>> (not so large) streams, so it would not really be feasible to keep all of the
>> state (which is the same) in each of them.
>
> Hey Maciek!
>
> The points you raise all make sense and there is work in progress to
> provide better support for these use cases:
> https://issues.apache.org/jira/browse/FLINK-4391
>
> The general idea would be to have something like a multi-threaded flat
> map function that dispatches the requests to a thread pool (it's like
> "virtually" increasing the parallelism, as you do now). This is pretty
> straightforward to implement if you don't need to worry about fault
> tolerance for now. Integrating this with checkpointing is a little
> more involved and will be addressed as part of the linked issue.