Async Source Function in Flink

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Async Source Function in Flink

Federico D'Ambrosio-2
Hello everyone,

just wanted to ask a quick question: I have to retrieve data from 2 web services via REST calls, use them as sources and push these data to Kafka. So far, I implemented a SourceFunction which deals with making the calls with the respective clients.

Now, the function does use, for each REST call, Await.result(....). Do I need to use Flink's AsyncFunction instead? What are the best practices when it comes to AsyncSources?

Thank you,
--
Federico D'Ambrosio
Reply | Threaded
Open this post in threaded view
|

Re: Async Source Function in Flink

Timo Walther
Hi Frederico,

Flink's AsyncFunction is meant for enriching a record with information that needs to be queried externally. So I guess you can't use it for your use case because an async call is initiated by the input. However, your custom SourceFunction could implement a similar asynchronous logic. By having a pool of open connections that request asynchronously and emit the response to the stream, once available, you can improve your throughput (see [0]).

Depending on your use case maybe the SourceFunction can only be responsible for determining e.g. ids and the AsyncFunction is requesting these ids via REST. This way you could leverage the available async capabilities.

I hope this helps.

Regards,
Timo

[0] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/asyncio.html#the-need-for-asynchronous-io-operations


Am 14.05.18 um 14:51 schrieb Federico D'Ambrosio:
Hello everyone,

just wanted to ask a quick question: I have to retrieve data from 2 web services via REST calls, use them as sources and push these data to Kafka. So far, I implemented a SourceFunction which deals with making the calls with the respective clients.

Now, the function does use, for each REST call, Await.result(....). Do I need to use Flink's AsyncFunction instead? What are the best practices when it comes to AsyncSources?

Thank you,
--
Federico D'Ambrosio


Reply | Threaded
Open this post in threaded view
|

Re: Async Source Function in Flink

Federico D'Ambrosio-2
I see, thank you very much for your answer! I'll look into pool connection handling.

Alternatively, I suppose that since it is a SourceFunction, even synchronous calls may be used without side effects in Flink?

Thank you,
Federico

Il giorno mar 15 mag 2018 alle ore 16:16 Timo Walther <[hidden email]> ha scritto:
Hi Frederico,

Flink's AsyncFunction is meant for enriching a record with information that needs to be queried externally. So I guess you can't use it for your use case because an async call is initiated by the input. However, your custom SourceFunction could implement a similar asynchronous logic. By having a pool of open connections that request asynchronously and emit the response to the stream, once available, you can improve your throughput (see [0]).

Depending on your use case maybe the SourceFunction can only be responsible for determining e.g. ids and the AsyncFunction is requesting these ids via REST. This way you could leverage the available async capabilities.

I hope this helps.

Regards,
Timo

[0] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/asyncio.html#the-need-for-asynchronous-io-operations


Am 14.05.18 um 14:51 schrieb Federico D'Ambrosio:
Hello everyone,

just wanted to ask a quick question: I have to retrieve data from 2 web services via REST calls, use them as sources and push these data to Kafka. So far, I implemented a SourceFunction which deals with making the calls with the respective clients.

Now, the function does use, for each REST call, Await.result(....). Do I need to use Flink's AsyncFunction instead? What are the best practices when it comes to AsyncSources?

Thank you,
--
Federico D'Ambrosio




--
Federico D'Ambrosio