How to test flink job recover from checkpoint

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

How to test flink job recover from checkpoint

Eleanore Jin
Hi, 

I have a flink application and checkpoint is enabled, I am running locally using miniCluster. 

I just wonder if there is a way to simulate the failure, and verify that flink job restarts from checkpoint?

Thanks a lot!
Eleanore
Reply | Threaded
Open this post in threaded view
|

Re: How to test flink job recover from checkpoint

Zhu Zhu
Hi Eleanore,

You can change your application tasks to throw exceptions in a certain frequency. 
Alternatively, if the application has external dependencies (e.g. source), you can trigger failures manually by manipulating the status of the external service (e.g. shutdown the source service, or break the network connection between the Flink app and the source service).

Thanks,
Zhu Zhu

Eleanore Jin <[hidden email]> 于2020年3月5日周四 上午8:40写道:
Hi, 

I have a flink application and checkpoint is enabled, I am running locally using miniCluster. 

I just wonder if there is a way to simulate the failure, and verify that flink job restarts from checkpoint?

Thanks a lot!
Eleanore
Reply | Threaded
Open this post in threaded view
|

Re: How to test flink job recover from checkpoint

Bajaj, Abhinav
In reply to this post by Eleanore Jin

I implemented a custom function that throws up a runtime exception.

 

You can extend from simpler MapFunction or more complicated RichParallelSourceFunction depending on your use case.

You can add logic to throw a runtime exception on a certain condition in the map or run method.               .

You can use a count or timer to trigger the exception.

 

Sharing a quick handwritten example.

 

DataStream<String> stream = .....

DataStream<String> mappedStream = stream.map(new MapFunction<String, String>>() {

          @Override

          public String map(String value) throws Exception {

            if (SOME_CONDITION) {

              throw new RuntimeException("Lets test checkpointing");

            }

                return value;  

          }

});

 

~ Abhinav Bajaj

               

 

From: Eleanore Jin <[hidden email]>
Date: Wednesday, March 4, 2020 at 4:40 PM
To: user <[hidden email]>, user-zh <[hidden email]>
Subject: How to test flink job recover from checkpoint

 

Hi, 

 

I have a flink application and checkpoint is enabled, I am running locally using miniCluster. 

 

I just wonder if there is a way to simulate the failure, and verify that flink job restarts from checkpoint?

 

Thanks a lot!

Eleanore

Reply | Threaded
Open this post in threaded view
|

Re: How to test flink job recover from checkpoint

Eleanore Jin
Hi Zhu Zhu and Abhinav, 

I am able to verify the recovery from checkpoint based on your suggestions, thanks a lot for the help!
Eleanore

On Wed, Mar 4, 2020 at 5:40 PM Bajaj, Abhinav <[hidden email]> wrote:

I implemented a custom function that throws up a runtime exception.

 

You can extend from simpler MapFunction or more complicated RichParallelSourceFunction depending on your use case.

You can add logic to throw a runtime exception on a certain condition in the map or run method.               .

You can use a count or timer to trigger the exception.

 

Sharing a quick handwritten example.

 

DataStream<String> stream = .....

DataStream<String> mappedStream = stream.map(new MapFunction<String, String>>() {

          @Override

          public String map(String value) throws Exception {

            if (SOME_CONDITION) {

              throw new RuntimeException("Lets test checkpointing");

            }

                return value;  

          }

});

 

~ Abhinav Bajaj

               

 

From: Eleanore Jin <[hidden email]>
Date: Wednesday, March 4, 2020 at 4:40 PM
To: user <[hidden email]>, user-zh <[hidden email]>
Subject: How to test flink job recover from checkpoint

 

Hi, 

 

I have a flink application and checkpoint is enabled, I am running locally using miniCluster. 

 

I just wonder if there is a way to simulate the failure, and verify that flink job restarts from checkpoint?

 

Thanks a lot!

Eleanore