Check pointing for simple pipeline

Prasanna kumar
Hi,

I have a pipeline: Source -> Map (JSON transform) -> Sink.

Both the source and the sink are Kafka.

What is the best checkpointing mechanism?

Is enabling incremental checkpoints a good option? What should I be careful of?

I am running it on AWS EMR.

Will checkpointing slow the job down?

Thanks,
Prasanna.

Re: Check pointing for simple pipeline

Yun Tang
Hi Prasanna,

Using incremental checkpoints is always better than not, as they are faster and consume less memory.
However, incremental checkpoints are only supported by the RocksDB state backend.
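A minimal sketch of enabling incremental checkpoints, assuming the Flink 1.11 Java DataStream API with the flink-streaming-java, flink-statebackend-rocksdb and flink-clients dependencies on the classpath. The checkpoint interval, the file:///tmp/flink-checkpoints path (on EMR this would more likely be an s3:// or hdfs:// URI), the class name and the bounded fromElements source standing in for the Kafka source -> JSON map -> Kafka sink pipeline are all hypothetical placeholders, not taken from this thread.

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; a sketch, not a production job.
public class IncrementalCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds (illustrative value only).
        env.enableCheckpointing(60_000L);

        // RocksDB state backend; the second argument 'true' turns on incremental checkpoints.
        // Placeholder URI: replace with a durable location such as s3://... or hdfs://...
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints", true));

        // Stand-in for the real Kafka source -> JSON map -> Kafka sink pipeline.
        env.fromElements("{\"a\": 1}", "{\"b\": 2}")
           .map(value -> value.trim())
           .print();

        env.execute("incremental-checkpoint-sketch");
    }
}
```

Passing false as the second constructor argument produces full (non-incremental) RocksDB checkpoints instead.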


Best
Yun Tang

From: Prasanna kumar <[hidden email]>
Sent: Tuesday, July 7, 2020 20:43
To: [hidden email] <[hidden email]>; user <[hidden email]>
Subject: Check pointing for simple pipeline

Re: Check pointing for simple pipeline

Dawid Wysakowicz

Hi Prasanna,

I'd like to add my two cents here. I would not say that incremental checkpoints are always the best choice. They can have downsides when restoring, because the restore has to apply all the deltas. Therefore, restoring from a non-incremental checkpoint might be faster.


As Yun Tang mentioned, incremental checkpoints are supported only by the RocksDB state backend. However, you don't necessarily need RocksDB in all cases. If you are sure that the state will fit into memory (which is probably the case for such a simple job, especially if the map function is stateless), you should be good with the filesystem state backend (FsStateBackend) [1]. This state backend should be faster, as it does not need to spill anything to disk and keeps all state in deserialized form at runtime.
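For comparison, a similar sketch using FsStateBackend, under the same assumptions: Flink 1.11 Java API, flink-streaming-java and flink-clients on the classpath, and a hypothetical interval, checkpoint path, class name and demo source. FsStateBackend keeps working state in TaskManager memory and writes full snapshots to the given URI, so there is no incremental flag to set.

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; a sketch, not a production job.
public class FsStateBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds (illustrative value only).
        env.enableCheckpointing(60_000L);

        // Working state lives in memory; each checkpoint writes a full snapshot to this URI.
        // Placeholder URI: replace with a durable location such as s3://... or hdfs://...
        env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));

        // Stand-in for the real Kafka source -> JSON map -> Kafka sink pipeline.
        env.fromElements("{\"a\": 1}", "{\"b\": 2}")
           .map(value -> value.trim())
           .print();

        env.execute("fs-state-backend-sketch");
    }
}
```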


You might also find this short post [2] helpful.


Best,

Dawid


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/state_backends.html#the-fsstatebackend

[2] https://www.ververica.com/blog/stateful-stream-processing-apache-flink-state-backends


On 08/07/2020 05:25, Yun Tang wrote: