Unfortunately, this is not possible at the moment: when a task fails,
Flink restarts the whole job. Recovering only the failed task would
definitely make sense in certain situations. How large is your state,
and how long does recovery take?
On Fri, Jul 1, 2016 at 9:18 AM, Chia-Hung Lin <[hidden email]> wrote:
> After reading the documentation and configuring a job to test the
> failure strategy, it seems to me that Flink restarts the job once any
> failure (e.g. an exception being thrown) occurs.
>
> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
>
> My question:
>
> Is it possible to configure Flink so that only the function that fails
> is recovered, instead of restarting the entire job (like Erlang's
> one-for-one supervision)? For instance, suppose the parallelism of a
> job is configured to 100, so at runtime 100 map instances are executed.
> If one of the map functions fails, we want to recover only that one,
> because the other map functions are running normally. Is it possible
> to achieve such an effect?
>
> Thanks
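
To make the difference between the two recovery models in this thread
concrete, here is a minimal toy sketch in Python. It is not Flink's API
and does not reflect how Flink schedules tasks; the function names and
the failed index 42 are hypothetical, chosen only for illustration. It
simply counts how many of the 100 parallel map instances each policy
would have to restart after a single failure.

```python
# Toy model only: this is NOT Flink's API. It just computes which parallel
# task instances each recovery policy would restart after a failure.

def restart_all(parallelism, failed):
    """Whole-job restart: any failure restarts every instance."""
    return set(range(parallelism)) if failed else set()


def restart_one_for_one(parallelism, failed):
    """One-for-one supervision: only the failed instances restart."""
    return {i for i in failed if 0 <= i < parallelism}


parallelism = 100   # parallelism from the question
failed = {42}       # hypothetical index of the failing map instance

print(len(restart_all(parallelism, failed)))          # 100
print(len(restart_one_for_one(parallelism, failed)))  # 1
```

The sketch deliberately ignores state: after a restart, Flink restores
operator state from the latest checkpoint, which is why state size and
recovery time matter when judging how costly a whole-job restart is.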