(DEPRECATED) Apache Flink User Mailing List archive.

Flink batch processing fault tolerance


Renjie Liu

Flink batch processing fault tolerance

Hi all,
I'm reading Flink's documentation and am curious about the fault tolerance of batch processing jobs. It seems that when a single task execution fails, the whole job is restarted. Is that true? If so, isn't it impractical to deploy large Flink batch jobs?

Liu, Renjie

Software Engineer, MVAD

Aljoscha Krettek

Re: Flink batch processing fault tolerance

Hi,

Yes, this is indeed true. We had some plans for how to resolve this, but they never materialised because of the focus on stream processing. We might unite the two in the future, and then you will get fault-tolerant batch and stream processing in the same API.

Best,

Aljoscha

On Wed, 15 Feb 2017 at 09:28 Renjie Liu <[hidden email]> wrote:

Hi all,
I'm reading Flink's documentation and am curious about the fault tolerance of batch processing jobs. It seems that when a single task execution fails, the whole job is restarted. Is that true? If so, isn't it impractical to deploy large Flink batch jobs?
--
Liu, Renjie
Software Engineer, MVAD

Anton Solovev

RE: Flink batch processing fault tolerance

Hi Aljoscha,

Could you share your plans for resolving it?

Best,

Anton

From: Aljoscha Krettek [mailto:[hidden email]]
Sent: Thursday, February 16, 2017 2:48 PM
To: [hidden email]
Subject: Re: Flink batch processing fault tolerance

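Hi,

Yes, this is indeed true. We had some plans for how to resolve this, but they never materialised because of the focus on stream processing. We might unite the two in the future, and then you will get fault-tolerant batch and stream processing in the same API.

Best,

Aljoscha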


Renjie Liu

Re: Flink batch processing fault tolerance

https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

This FLIP may help.

On Thu, Feb 16, 2017 at 7:34 PM Anton Solovev <[hidden email]> wrote:

Hi Aljoscha,

Could you share your plans for resolving it?

Best,

Anton


Si-li Liu

Re: Flink batch processing fault tolerance

Hi,

This is the reason I gave up on using Flink for my current project and went back to the traditional Hadoop framework.

2017-02-17 10:56 GMT+08:00 Renjie Liu <[hidden email]>:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
This FLIP may help.


Best regards

Sili Liu

Zhijiang(wangzhijiang999)

Re: Flink batch processing fault tolerance

In reply to this post by Renjie Liu

Yes, this is a really critical problem for large batch jobs, because unexpected failures are common.

We are already working on realizing the ideas described in FLIP-1 and hope to contribute them to Flink in the coming months.

Best,

Zhijiang

------------------------------------------------------------------
From: Si-li Liu <[hidden email]>
Sent: Friday, February 17, 2017 11:22
To: user <[hidden email]>
Subject: Re: Flink batch processing fault tolerance

Hi,

This is the reason I gave up on using Flink for my current project and went back to the traditional Hadoop framework.


Aljoscha Krettek

Re: Flink batch processing fault tolerance

@Anton, the ideas in the FLIP are the ones I was referring to, and I'm afraid I have nothing more to add.

On Fri, 17 Feb 2017 at 06:26 wangzhijiang999 <[hidden email]> wrote:

Yes, this is a really critical problem for large batch jobs, because unexpected failures are common.
We are already working on realizing the ideas described in FLIP-1 and hope to contribute them to Flink in the coming months.

Best,

Zhijiang