Calling close() on Failure

Calling close() on Failure

Gregory Fee
Hello! I had a program lose a task manager the other day. The failover back to a checkpoint and the recovery worked like a charm. However, I defined a close() method on one of my ProcessFunctions and noticed that it did not get called. To be clear, the task manager that failed was running that ProcessFunction, and it makes sense to me that close() might not be callable in that case. But I had parallelism at 24, and I know that other instances of that ProcessFunction were running on machines that were shut down gracefully, yet zero close() methods were invoked. It seems like close() should get called on operators that are shut down gracefully, even in a failure condition. Is that how Flink is supposed to work? Am I missing something?

--
Gregory Fee
Engineer
425.830.4734
Lyft
Re: Calling close() on Failure

Nico Kruber
Hi Gregory,
I tried to reproduce the behaviour you described but in my case (Flink
1.5-SNAPSHOT, using the SocketWindowWordCount adapted to let the first
flatmap be a RichFlatMapFunction with a close() method), the close()
method was actually called on the task manager I did not kill. Since the
close() actually comes from the RichFunction, the handling compared to a
ProcessFunction should not be different.
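
For readers who want to try a similar check, a test function along these lines might look like the minimal sketch below. It is not the code used in the test described above: the class name ClosableTokenizer, the isClosed() helper and the printed message are illustrative assumptions. It relies only on RichFlatMapFunction and Collector from Flink, and it writes to standard output so that a call to close() shows up in the task manager's output.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.util.Collector;

/**
 * Hypothetical test function (not from the original thread): splits each
 * input line into words and records when close() is invoked.
 */
class ClosableTokenizer extends RichFlatMapFunction<String, String> {

    private static final long serialVersionUID = 1L;

    // Per-instance runtime flag; transient so it is not shipped with the function.
    private transient boolean closed;

    @Override
    public void flatMap(String line, Collector<String> out) {
        // Split on runs of whitespace and skip the empty token that a
        // leading space would produce.
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word);
            }
        }
    }

    @Override
    public void close() throws Exception {
        // Leave a visible trace so it is easy to tell whether this
        // instance was closed.
        closed = true;
        System.out.println("close() called on " + this);
        super.close();
    }

    /** Illustrative helper for local checks; not part of any Flink API. */
    boolean isClosed() {
        return closed;
    }
}
```

Swapping a function like this into the first flatMap of the job and then searching the output of the surviving task managers for the message is one way to confirm whether close() ran on them.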

Can you give more details on your program and why you think it was not
called?


Regards
Nico

On 15/03/18 21:16, Gregory Fee wrote:

> Hello! I had a program lose a task manager the other day. The fail over
> back to a checkpoint and recovery worked like a charm. However, on one
> of my ProcessFunctions I defined a close() method and I noticed that it
> did not get called. To be clear, the task manager that failed was
> running that ProcessFunction. It makes sense to me that close() might
> not be callable in that case. But I had parallelism at 24 and I know
> that other instances of that ProcessFunction were running on machines
> that were gracefully shutdown yet zero close() functions were invoked.
> It seems like close() should get called on operators that are shutdown
> gracefully even in a failure condition. Is that how Flink is supposed to
> work? Am I missing something?
>
> --
> *Gregory Fee*
>
> Engineer
> 425.830.4734 <tel:+14258304734>
> Lyft <http://www.lyft.com>

