Hi,
I have a pipeline: Source -> Map (JSON transform) -> Sink. Both source and sink are Kafka.
What is the best checkpointing mechanism? Is enabling incremental checkpoints a good option? What should I be careful of?
I am running it on AWS EMR. Will checkpointing slow the job down?
Thanks,
Prasanna
Hi Prasanna
Using incremental checkpoints is always better than not, since they are faster and consume less memory.
However, incremental checkpoints are only supported by the RocksDB state backend.
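
For reference, a minimal sketch of how this could look in flink-conf.yaml. The three keys are standard Flink options; the S3 bucket and path are hypothetical placeholders. Checkpointing itself still has to be switched on in the job, for example with env.enableCheckpointing(...) and an interval of your choice.

```yaml
# Use RocksDB as the state backend; it is the only one that supports incremental checkpoints.
state.backend: rocksdb
# Upload only what changed since the last completed checkpoint.
state.backend.incremental: true
# Hypothetical bucket and path; point this at your own durable storage.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```
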
Best
Yun Tang
From: Prasanna kumar <[hidden email]>
Sent: Tuesday, July 7, 2020 20:43
To: [hidden email] <[hidden email]>; user <[hidden email]>
Subject: Check pointing for simple pipeline
Hi Prasanna,
I'd like to add my two cents here. I would not say that using incremental checkpoints is always the best choice. They can have a downside when restoring, because the restore has to apply all the deltas. Restoring from a non-incremental checkpoint might therefore be faster.
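
To make the trade-off concrete, here is a toy model in plain Java. It is only a sketch of the idea of "deltas": the class and method names are made up for illustration, it ignores deleted keys, and it does not mirror how Flink or RocksDB actually store checkpoints.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Flink's checkpoint format.
class CheckpointToy {

    // A full checkpoint copies the entire state every time.
    public static Map<String, Long> fullSnapshot(Map<String, Long> state) {
        return new HashMap<>(state);
    }

    // An incremental checkpoint keeps only entries that are new or changed
    // compared to the previous state (deletions are ignored in this sketch).
    public static Map<String, Long> delta(Map<String, Long> previous,
                                          Map<String, Long> current) {
        Map<String, Long> changed = new HashMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            if (!e.getValue().equals(previous.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    // Restoring from incremental checkpoints: start from the base snapshot
    // and replay every delta in order. The longer the chain, the more work.
    public static Map<String, Long> restore(Map<String, Long> base,
                                            List<Map<String, Long>> deltas) {
        Map<String, Long> state = new HashMap<>(base);
        for (Map<String, Long> d : deltas) {
            state.putAll(d);
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Long> s0 = new HashMap<>();
        s0.put("a", 1L);
        s0.put("b", 2L);
        s0.put("c", 3L);

        Map<String, Long> s1 = new HashMap<>(s0);
        s1.put("b", 5L);

        Map<String, Long> s2 = new HashMap<>(s1);
        s2.put("c", 9L);
        s2.put("d", 4L);

        List<Map<String, Long>> deltas = new ArrayList<>();
        deltas.add(delta(s0, s1));
        deltas.add(delta(s1, s2));

        // Each delta is smaller than a full snapshot, but restore needs all of them.
        System.out.println("full snapshot entries: " + fullSnapshot(s2).size());
        System.out.println("delta entries: " + deltas.get(0).size() + ", " + deltas.get(1).size());
        System.out.println("restored equals latest state: " + restore(s0, deltas).equals(s2));
    }
}
```

Writing each delta is cheap, while a restore has to touch the base plus every delta; a full snapshot costs more to write but restores in one step.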
As Yun Tang mentioned, incremental checkpoints are supported by RocksDB only. You don't necessarily need the RocksDB state backend in all cases. If you are sure that the state will fit into memory (which is probably the case for such a simple job, especially if the map function is stateless), you should be fine with the Filesystem state backend[1]. This state backend should be faster, as it does not need to spill anything to disk and keeps everything in deserialized form at runtime.
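
As a rough sketch, selecting the Filesystem state backend in flink-conf.yaml could look like the fragment below. The keys are standard Flink options; the S3 bucket and path are hypothetical placeholders. Even with a stateless map function, the Kafka source keeps its offsets in checkpointed state, so a durable checkpoint directory is still needed.

```yaml
# Keep working state on the JVM heap; snapshots are written to the directory below.
state.backend: filesystem
# Hypothetical bucket and path; point this at your own durable storage.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```
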
You might also find this short post[2] helpful.
Best, Dawid
[2] https://www.ververica.com/blog/stateful-stream-processing-apache-flink-state-backends
On 08/07/2020 05:25, Yun Tang wrote: