How to setup Regions for Fault Tolerance in Flink when using Side Outputs

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

How to setup Regions for Fault Tolerance in Flink when using Side Outputs

Eifler, Patrick

Hi all,

 

We are trying to setup regions to enable Flink to only stop failing tasks based on region instead of failing the entire stream.

We are using one main stream that is reading from a kafka topic and a bunch of side outputs for processing each event from that topic differently.

For the processing in the side outputs we use the process function provided by flink.

 

So far when one side output stream failed, the whole stream job failed.

 

Is there anything that needs to be done or set on the Side Outputs so that Flink recognizes them as regions?

Is it even possible to have Flink handle side outputs as regions and restart only one specific side output stream on failure?

 

Many thanks in advance!

 

Cheers,

 

Patrick

-- 

Patrick Eifler

 

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure 
Sony Interactive Entertainment LLC 

Wilhelmstraße 118, 10963 Berlin


Germany

E: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: How to setup Regions for Fault Tolerance in Flink when using Side Outputs

Till Rohrmann
Hi Patrick,

Flink supports regional failover [1] which only restarts all tasks connected via pipelined data exchanges. Hence, either when having an embarrassingly parallel topology or running a batch job, Flink should not restart the whole job in case of a task failure.

However, in the case of side outputs, I think they are connected via pipelined data exchanges with the main stream and, hence, are part of the same failover region as the main stream.


Cheers,
Till

On Tue, Nov 24, 2020 at 5:15 PM Eifler, Patrick <[hidden email]> wrote:

Hi all,

 

We are trying to setup regions to enable Flink to only stop failing tasks based on region instead of failing the entire stream.

We are using one main stream that is reading from a kafka topic and a bunch of side outputs for processing each event from that topic differently.

For the processing in the side outputs we use the process function provided by flink.

 

So far when one side output stream failed, the whole stream job failed.

 

Is there anything that needs to be done or set on the Side Outputs so that Flink recognizes them as regions?

Is it even possible to have Flink handle side outputs as regions and restart only one specific side output stream on failure?

 

Many thanks in advance!

 

Cheers,

 

Patrick

-- 

Patrick Eifler

 

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure 
Sony Interactive Entertainment LLC 

Wilhelmstraße 118, 10963 Berlin


Germany

E: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: How to setup Regions for Fault Tolerance in Flink when using Side Outputs

Eifler, Patrick

Hi Till,

 

Thanks for your reply.

 

Is there any option to disconnect the side outputs from the pipelined data exchanges of the main stream.

 

The benefit of side outputs is very high regarding performance and useability plus it fits the use case here very nicely. Though this pipelined connection to the main stream is a real concern as all of the streams of the job are cancelled if one fails. With many side outputs that is not really an option we can maintain at scale.

 

So we are looking for options on how to preserve the side outputs but still get the individual region based failover to work.

 

The other thing I have noticed is that when using side outputs is that when looking at the Flink Web UI those streams are just named Unregistered_DataStream_. Setting up .name and .uid on the sideoutput stream does not change this. Is there any way on naming the side output streams so the customized name appears in the Flink Web UI?

 

Any help or tips are highly appreciated.

 

Many thanks in advance.

 

Cheers,

 

Patrick

-- 

Patrick Eifler

 

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure 
Sony Interactive Entertainment LLC 

Wilhelmstraße 118, 10963 Berlin


Germany

E: [hidden email]

 

From: Till Rohrmann <[hidden email]>
Date: Tuesday, 24. November 2020 at 17:20
To: "Eifler, Patrick" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: How to setup Regions for Fault Tolerance in Flink when using Side Outputs

 

Hi Patrick,

 

Flink supports regional failover [1] which only restarts all tasks connected via pipelined data exchanges. Hence, either when having an embarrassingly parallel topology or running a batch job, Flink should not restart the whole job in case of a task failure.

 

However, in the case of side outputs, I think they are connected via pipelined data exchanges with the main stream and, hence, are part of the same failover region as the main stream.

 

 

Cheers,

Till

 

On Tue, Nov 24, 2020 at 5:15 PM Eifler, Patrick <[hidden email]> wrote:

Hi all,

 

We are trying to setup regions to enable Flink to only stop failing tasks based on region instead of failing the entire stream.

We are using one main stream that is reading from a kafka topic and a bunch of side outputs for processing each event from that topic differently.

For the processing in the side outputs we use the process function provided by flink.

 

So far when one side output stream failed, the whole stream job failed.

 

Is there anything that needs to be done or set on the Side Outputs so that Flink recognizes them as regions?

Is it even possible to have Flink handle side outputs as regions and restart only one specific side output stream on failure?

 

Many thanks in advance!

 

Cheers,

 

Patrick

-- 

Patrick Eifler

 

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure 
Sony Interactive Entertainment LLC 

Wilhelmstraße 118, 10963 Berlin


Germany

E: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: How to setup Regions for Fault Tolerance in Flink when using Side Outputs

Till Rohrmann
Hi Patrick,

at the moment it is not possible to disconnect side outputs from other streaming operators. I guess what you would like to have is an operator which consumes on a best effort basis but which can also lose some data while it is being restarted. This is currently not supported by Flink.

Concerning the web UI problems, I would suggest creating a JIRA ticket [1] because it sounds like a missing feature.


Cheers,
Till

On Wed, Nov 25, 2020 at 11:15 AM Eifler, Patrick <[hidden email]> wrote:

Hi Till,

 

Thanks for your reply.

 

Is there any option to disconnect the side outputs from the pipelined data exchanges of the main stream.

 

The benefit of side outputs is very high regarding performance and useability plus it fits the use case here very nicely. Though this pipelined connection to the main stream is a real concern as all of the streams of the job are cancelled if one fails. With many side outputs that is not really an option we can maintain at scale.

 

So we are looking for options on how to preserve the side outputs but still get the individual region based failover to work.

 

The other thing I have noticed is that when using side outputs is that when looking at the Flink Web UI those streams are just named Unregistered_DataStream_. Setting up .name and .uid on the sideoutput stream does not change this. Is there any way on naming the side output streams so the customized name appears in the Flink Web UI?

 

Any help or tips are highly appreciated.

 

Many thanks in advance.

 

Cheers,

 

Patrick

-- 

Patrick Eifler

 

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure 
Sony Interactive Entertainment LLC 

Wilhelmstraße 118, 10963 Berlin


Germany

E: [hidden email]

 

From: Till Rohrmann <[hidden email]>
Date: Tuesday, 24. November 2020 at 17:20
To: "Eifler, Patrick" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: How to setup Regions for Fault Tolerance in Flink when using Side Outputs

 

Hi Patrick,

 

Flink supports regional failover [1] which only restarts all tasks connected via pipelined data exchanges. Hence, either when having an embarrassingly parallel topology or running a batch job, Flink should not restart the whole job in case of a task failure.

 

However, in the case of side outputs, I think they are connected via pipelined data exchanges with the main stream and, hence, are part of the same failover region as the main stream.

 

 

Cheers,

Till

 

On Tue, Nov 24, 2020 at 5:15 PM Eifler, Patrick <[hidden email]> wrote:

Hi all,

 

We are trying to setup regions to enable Flink to only stop failing tasks based on region instead of failing the entire stream.

We are using one main stream that is reading from a kafka topic and a bunch of side outputs for processing each event from that topic differently.

For the processing in the side outputs we use the process function provided by flink.

 

So far when one side output stream failed, the whole stream job failed.

 

Is there anything that needs to be done or set on the Side Outputs so that Flink recognizes them as regions?

Is it even possible to have Flink handle side outputs as regions and restart only one specific side output stream on failure?

 

Many thanks in advance!

 

Cheers,

 

Patrick

-- 

Patrick Eifler

 

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure 
Sony Interactive Entertainment LLC 

Wilhelmstraße 118, 10963 Berlin


Germany

E: [hidden email]