Async IO Question

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Async IO Question

Frank Xue
Hi,

I have a question related to async io for Flink. I found that when running unordered (AsyncDataStream.unorderedWait) failures within each individual asyncInvoke is added back to be retried, but when I run it ordered (AsyncDataStream.orderedWait) and an exception is thrown within asyncInvoke, it just stops the whole process. Is this expected behavior?

Thanks,
Frank

--
Frank Xue | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
Reply | Threaded
Open this post in threaded view
|

Re: Async IO Question

Till Rohrmann

Hi Frank,

which version of Flink are you using? There was a problem with correctly recognizing failed asynchronous operations, see FLINK-6435 [1].

In general, if an exception occurs within AsyncFunction#asyncInvoke, then the job should fail. Depending on which restart strategy you have chosen, the job is retried or not. What Flink should not do is to retry for the unordered case and not retry for the ordered case.

Maybe you could share the exact code you’re running in order to see whether you meant exception occurring within AsyncFunction#asyncInvoke or within a future which completes AsyncCollector.

[1] https://issues.apache.org/jira/browse/FLINK-6435

Cheers,
Till


On Mon, May 22, 2017 at 9:36 PM, Frank Xue <[hidden email]> wrote:
Hi,

I have a question related to async io for Flink. I found that when running unordered (AsyncDataStream.unorderedWait) failures within each individual asyncInvoke is added back to be retried, but when I run it ordered (AsyncDataStream.orderedWait) and an exception is thrown within asyncInvoke, it just stops the whole process. Is this expected behavior?

Thanks,
Frank

--
Frank Xue | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305

Reply | Threaded
Open this post in threaded view
|

Re: Async IO Question

Frank Xue
Thanks for the reply Till! I am using Flink 1.2.1 so it could be an issue with the bug you mentioned that looks to be fixed in 1.3. The restart strategy is a fixed delay restart and I have tried various checkpoint and restart intervals and the behavior remains the same. Pretty much inside the asyncInvoke it does a query call on ElasticSearch and puts it in a future. On complete, if the future is a failure it throws the exception. I speculate it does not retry properly on only ordered because when one instance fails, other events are waiting on that one since the output has to remain in order which eventually clogs up the capacity so when the job for the failed event is restarted there is no room for it to be run again. I should mention that I am running load testing when this happens (tens or hundreds of thousands of events coming through at about the same time). Does this help shed more light on the behavior I am seeing?

Thanks,
Frank

On Tue, May 23, 2017 at 10:07 AM, Till Rohrmann <[hidden email]> wrote:

Hi Frank,

which version of Flink are you using? There was a problem with correctly recognizing failed asynchronous operations, see FLINK-6435 [1].

In general, if an exception occurs within AsyncFunction#asyncInvoke, then the job should fail. Depending on which restart strategy you have chosen, the job is retried or not. What Flink should not do is to retry for the unordered case and not retry for the ordered case.

Maybe you could share the exact code you’re running in order to see whether you meant exception occurring within AsyncFunction#asyncInvoke or within a future which completes AsyncCollector.

[1] https://issues.apache.org/jira/browse/FLINK-6435

Cheers,
Till


On Mon, May 22, 2017 at 9:36 PM, Frank Xue <[hidden email]> wrote:
Hi,

I have a question related to async io for Flink. I found that when running unordered (AsyncDataStream.unorderedWait) failures within each individual asyncInvoke is added back to be retried, but when I run it ordered (AsyncDataStream.orderedWait) and an exception is thrown within asyncInvoke, it just stops the whole process. Is this expected behavior?

Thanks,
Frank

--
Frank Xue | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305




--
Frank Xue | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
Reply | Threaded
Open this post in threaded view
|

Re: Async IO Question

Till Rohrmann
Hi Frank,

yes it could be related to the bug we have fixed in 1.3. Could you try it out with Flink 1.3 to see if it fixes your problem? If not, then I would like to take a look at your code to exactly see what is happening there.

Cheers,
Till

On Tue, May 23, 2017 at 4:28 PM, Frank Xue <[hidden email]> wrote:
Thanks for the reply Till! I am using Flink 1.2.1 so it could be an issue with the bug you mentioned that looks to be fixed in 1.3. The restart strategy is a fixed delay restart and I have tried various checkpoint and restart intervals and the behavior remains the same. Pretty much inside the asyncInvoke it does a query call on ElasticSearch and puts it in a future. On complete, if the future is a failure it throws the exception. I speculate it does not retry properly on only ordered because when one instance fails, other events are waiting on that one since the output has to remain in order which eventually clogs up the capacity so when the job for the failed event is restarted there is no room for it to be run again. I should mention that I am running load testing when this happens (tens or hundreds of thousands of events coming through at about the same time). Does this help shed more light on the behavior I am seeing?

Thanks,
Frank

On Tue, May 23, 2017 at 10:07 AM, Till Rohrmann <[hidden email]> wrote:

Hi Frank,

which version of Flink are you using? There was a problem with correctly recognizing failed asynchronous operations, see FLINK-6435 [1].

In general, if an exception occurs within AsyncFunction#asyncInvoke, then the job should fail. Depending on which restart strategy you have chosen, the job is retried or not. What Flink should not do is to retry for the unordered case and not retry for the ordered case.

Maybe you could share the exact code you’re running in order to see whether you meant exception occurring within AsyncFunction#asyncInvoke or within a future which completes AsyncCollector.

[1] https://issues.apache.org/jira/browse/FLINK-6435

Cheers,
Till


On Mon, May 22, 2017 at 9:36 PM, Frank Xue <[hidden email]> wrote:
Hi,

I have a question related to async io for Flink. I found that when running unordered (AsyncDataStream.unorderedWait) failures within each individual asyncInvoke is added back to be retried, but when I run it ordered (AsyncDataStream.orderedWait) and an exception is thrown within asyncInvoke, it just stops the whole process. Is this expected behavior?

Thanks,
Frank

--
Frank Xue | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305




--
Frank Xue | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305