(DEPRECATED) Apache Flink User Mailing List archive.

Current alternatives for async I/O

Ken Krugler

Current alternatives for async I/O

Hi all,

I’ve been watching the FLIP-12 design discussion, and it looks like a promising solution for the issues we’ve got with needing to make asynchronous multi-threaded requests in a Flink operator.

What’s the best workaround with current releases of Flink?

One option is to have a special tickler source that broadcasts a Tuple0 every X milliseconds, which gets connected to the real stream that feeds a CoFlatMap. Inside of this I’ve got queues for incoming and generated tuples, with a thread pool to pull from the incoming and write to the generated queues. When I get one of the “tickle” Tuple0s, I emit all of the generated tuples.

There are issues with needing to bound the size of the queues, and all of the usual fun with thread pools, but it seems to work.
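A minimal sketch of the queue-and-thread-pool part of this workaround, in plain Java with no Flink dependency. The class name `TicklerBuffer` and its methods `onElement` and `onTickle` are hypothetical names chosen for illustration; they are not part of Flink and not necessarily how the original code is structured. Both queues are bounded, and `onElement` refuses a record instead of blocking when the incoming queue is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper (not a Flink class): holds async results until a tickle arrives. */
class TicklerBuffer<IN, OUT> {
    private final BlockingQueue<IN> incoming;   // records waiting for a worker thread
    private final BlockingQueue<OUT> generated; // results waiting for the next tickle
    private final ExecutorService pool;

    TicklerBuffer(int threads, int capacity, Function<IN, OUT> request) {
        incoming = new ArrayBlockingQueue<>(capacity);
        generated = new ArrayBlockingQueue<>(capacity);
        pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        IN in = incoming.take();          // wait for work
                        generated.put(request.apply(in)); // blocks while the result queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // exit the worker on shutdown
                }
            });
        }
    }

    /** Called for each record of the real stream; returns false if the queue is full. */
    boolean onElement(IN value) {
        return incoming.offer(value);
    }

    /** Called for each tickle; emits every finished result and returns how many were emitted. */
    int onTickle(Consumer<OUT> collector) {
        List<OUT> ready = new ArrayList<>();
        generated.drainTo(ready);
        ready.forEach(collector);
        return ready.size();
    }

    void close() {
        pool.shutdownNow();
    }
}
```

In a CoFlatMap, the method handling the real stream would call `onElement` and the method handling the tickle stream would call `onTickle` with a wrapper around the output collector. What to do when `onElement` returns false is left open here; blocking in the operator at that point would also stop tickles from being processed, which is one face of the queue-bounding problem.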

Is there a better/simpler approach?

Thanks,

— Ken

--------------------------

Ken Krugler

+1 530-210-6378

http://www.scaleunlimited.com

custom big data solutions & training

Hadoop, Cascading, Cassandra & Solr

Fabian Hueske-2

Re: Current alternatives for async I/O

Hi Ken,

I think your solution should work.
You need to make sure, though, that you properly manage the state of your function, i.e., remember all records which have been received but haven't been emitted yet.

Otherwise records might get lost in case of a failure.
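One way to picture this bookkeeping is the sketch below, in plain Java with no Flink dependency. `PendingRecords` and its method names are hypothetical; wiring `snapshot` and `restore` into Flink's checkpointing hooks is assumed and not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker (not a Flink class) for records received but not yet emitted. */
class PendingRecords<T> {
    private final Map<Long, T> pending = new LinkedHashMap<>();
    private long nextId = 0;

    /** Remember a record on arrival; the returned id is used to mark it as emitted later. */
    synchronized long received(T record) {
        long id = nextId++;
        pending.put(id, record);
        return id;
    }

    /** Forget a record once its result has been emitted downstream. */
    synchronized void emitted(long id) {
        pending.remove(id);
    }

    /** Copy of everything still in flight, in arrival order; this is what a checkpoint would store. */
    synchronized List<T> snapshot() {
        return new ArrayList<>(pending.values());
    }

    /** After a failure, reload the stored records so they can be processed again. */
    synchronized void restore(List<T> saved) {
        pending.clear();
        for (T record : saved) {
            pending.put(nextId++, record);
        }
    }
}
```

After a restore, the function would resubmit every record returned by `snapshot()` to its thread pool, so a record that was in a queue at the time of the failure is requested again rather than lost.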

Alternatively, you can implement this as a custom operator. This would give you full access, but you would need to take care of organizing checkpoints and other low-level issues yourself. This would also be basically the same as implementing FLIP-12 (or a subset of it).

Best, Fabian

2016-10-09 3:31 GMT+02:00 Ken Krugler <[hidden email]>:


Fabian Hueske-2

Re: Current alternatives for async I/O

Hi Ken,

FYI: we just received a pull request for FLIP-12 [1].

Best, Fabian

[1] https://github.com/apache/flink/pull/2629

2016-10-11 9:35 GMT+02:00 Fabian Hueske <[hidden email]>:
