If a TaskManager fails, the data stored on it will be lost and needs to be recomputed. So even with the batch mode configured, more tasks might need a restart. To mitigate that, the Flink developers need to implement support for external shuffle services.
On Wed, Dec 16, 2020 at 9:10 AM Robert Metzger <[hidden email]> wrote:
With region failover strategy, all connected subtasks will fail.
If you are using the DataSet API with env.getConfig().setExecutionMode(ExecutionMode.BATCH);, you should get the desired behavior.
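To illustrate why the execution mode changes how many subtasks restart, here is a toy model in plain Java. It is not Flink code and not how Flink implements failover. The class `FailoverRegionSketch` and the method `tasksToRestart` are hypothetical names invented for this sketch. The only assumption carried over from Flink is that a failover region is the set of subtasks connected through pipelined data exchanges, and that blocking exchanges (as used in batch execution mode) separate regions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not part of Flink.
public class FailoverRegionSketch {

    // Returns the subtasks that restart when `failed` fails.
    // edges: {producer, consumer} pairs; pipelined: whether all exchanges are pipelined.
    public static Set<Integer> tasksToRestart(int[][] edges, boolean pipelined, int failed) {
        Map<Integer, List<Integer>> neighbours = new HashMap<>();
        if (pipelined) {
            // Pipelined exchanges tie producer and consumer into the same region.
            for (int[] e : edges) {
                neighbours.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
                neighbours.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
            }
        }
        // Blocking exchanges add no links, so each subtask is its own region.
        Set<Integer> region = new TreeSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.add(failed);
        while (!pending.isEmpty()) {
            int task = pending.poll();
            if (region.add(task)) {
                pending.addAll(neighbours.getOrDefault(task, Collections.emptyList()));
            }
        }
        return region;
    }

    public static void main(String[] args) {
        // Two producers (0, 1) feeding two consumers (2, 3) all-to-all.
        int[][] edges = {{0, 2}, {0, 3}, {1, 2}, {1, 3}};
        System.out.println(tasksToRestart(edges, true, 2));  // pipelined exchanges
        System.out.println(tasksToRestart(edges, false, 2)); // blocking exchanges
    }
}
```

In this model, a failure of subtask 2 restarts all four subtasks when the exchanges are pipelined, and only subtask 2 when they are blocking. The model deliberately ignores lost intermediate results: if the machine holding a producer's output fails, that producer has to run again as well.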
On Mon, Dec 14, 2020 at 5:24 PM Stanislav Borissov <[hidden email]> wrote:
Hi,
I'm running a simple, "embarrassingly parallel" ETL-type job. I noticed that a failure in one subtask causes the entire job to restart. Even with the region failover strategy, all subtasks of this task and connected ones would fail. Is there any way to limit restarting to only the single subtask that failed, so all other subtasks can stay alive and keep working?
Thanks