Is it possible to restart only the function that fails instead of entire job?

Chia-Hung Lin
After reading the documentation and testing the failure-recovery
configuration, it seems to me that Flink restarts the entire job as soon
as any failure (e.g. an exception being thrown) occurs.

https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html

My question:

Is it possible to configure Flink so that only the function that fails
is recovered, instead of restarting the entire job (like Erlang's
one-for-one supervision)? For instance, if a job's parallelism is set
to 100, then 100 map instances run at runtime. If one of those map
instances fails, we want to recover only that instance, because the
other map instances are still working normally. Is it possible to
achieve this?

Thanks

Re: Is it possible to restart only the function that fails instead of entire job?

Ufuk Celebi
Unfortunately, this is not possible at the moment. This optimization
definitely makes sense in certain situations. How large is your state
and how long does it take to recover?
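
While per-function recovery is not available, one possible workaround is to contain the failure inside the function itself, so the exception never reaches the job. The following is only a sketch in plain Java, not a Flink API: `SelfRecoveringFunction` and `FlakyDoubler` are hypothetical names made up for illustration. It replaces just the failed instance with a fresh one and retries the record, which resembles one-for-one supervision. Any state held by the failed instance is lost, and nothing here is coordinated with checkpointing.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Flink): wraps a user function and, when it
 * throws, replaces only that instance with a fresh one and retries the record.
 */
final class SelfRecoveringFunction<I, O> implements Function<I, O> {
    private final Supplier<Function<I, O>> factory; // builds a fresh instance
    private final int maxRestarts;                  // restarts allowed per record
    private Function<I, O> delegate;                // the currently active instance
    private int restarts;                           // total restarts so far

    SelfRecoveringFunction(Supplier<Function<I, O>> factory, int maxRestarts) {
        if (maxRestarts < 0) {
            throw new IllegalArgumentException("maxRestarts must be >= 0");
        }
        this.factory = factory;
        this.maxRestarts = maxRestarts;
        this.delegate = factory.get();
    }

    @Override
    public O apply(I input) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            try {
                return delegate.apply(input);
            } catch (RuntimeException e) {
                // Discard the failed instance; its in-memory state is lost.
                last = e;
                delegate = factory.get();
                restarts++;
            }
        }
        // Every attempt failed: rethrow so the caller (or the job) sees it.
        throw last;
    }

    int restarts() {
        return restarts;
    }
}

/** Hypothetical function that fails on the third call of each instance. */
final class FlakyDoubler implements Function<Integer, Integer> {
    private int calls;

    @Override
    public Integer apply(Integer x) {
        if (++calls == 3) {
            throw new IllegalStateException("simulated failure");
        }
        return x * 2;
    }
}
```

For example, `new SelfRecoveringFunction<>(FlakyDoubler::new, 1)` could be called from inside a map function's `map` method. An exception that survives all retries is rethrown, so the job's normal restart behaviour still applies in that case.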

On Fri, Jul 1, 2016 at 9:18 AM, Chia-Hung Lin <[hidden email]> wrote:

> After reading the document and configuring to test failure strategy,
> it seems to me Flink restarts the job once any failures (e.g.
> exception thrown, etc.) occur.
>
> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
>
> My question:
>
> Is it possible to configure in allowing the function that fails to
> recover instead of restarting entire job (like Erlang's One For One
> Supervision)? For instance within a job the parallelism is configured
> to 100, so at runtime 100 maps instances are executed. Now one of map
> functions fails, we want to recover the failed map function because
> other map functions are functioning normally. Is it possible to
> achieve such effect?
>
> Thanks