Incremental aggregation using Fold and failure recovery
Posted by hassahma
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Incremental-aggregation-using-Fold-and-failure-recovery-tp14026.html
Hi All,
I am collecting millions of events per hour for 'N' products, where 'N' can be around 50k. I use the following fold-based aggregation with a sliding window:
final DataStream<WindowStats> eventStream = inputStream
    .keyBy(TENANT, CATEGORY)
    .window(SlidingProcessingTimeWindows.of(Time.hours(1), Time.minutes(5)))
    .fold(new WindowStats(), new ProductAggregationMapper(), new ProductAggregationWindowFunction());
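
For context, the fold function does roughly the following per event (a simplified sketch; ProductEvent, getProductId() and ProductMetric.update() are placeholders for my actual classes):

import org.apache.flink.api.common.functions.FoldFunction;

// Simplified sketch: fold each incoming event into the per-window accumulator.
public class ProductAggregationMapper implements FoldFunction<ProductEvent, WindowStats> {
    @Override
    public WindowStats fold(WindowStats accumulator, ProductEvent event) {
        // look up (or create) the metric bucket for this product and update it
        ProductMetric metric = accumulator.getOrCreate(event.getProductId());
        metric.update(event); // e.g. increment counters, add to sums
        return accumulator;
    }
}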
In the WindowStats class I keep a HashMap<String, ProductMetric>, keyed by productID. So for 50k products there will be 50k entries in that map inside each WindowStats instance.
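
WindowStats is essentially a wrapper around that map (simplified sketch; getOrCreate() is a convenience helper and the real class has a few more fields):

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the accumulator kept per window.
public class WindowStats implements Serializable {
    // productID -> metrics accumulated for that product within the window
    private final Map<String, ProductMetric> metricsByProduct = new HashMap<>();

    public ProductMetric getOrCreate(String productId) {
        return metricsByProduct.computeIfAbsent(productId, id -> new ProductMetric());
    }

    public Map<String, ProductMetric> getMetricsByProduct() {
        return metricsByProduct;
    }
}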
My question is: if I enable checkpointing with env.enableCheckpointing(1000), will the WindowStats instance of each in-flight window be checkpointed automatically and restored on recovery? If not, how can I better implement the above use case to store per-product metrics with a fold operation?
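
For reference, checkpointing is currently enabled roughly like this (the state backend and path are just an example of my setup, not necessarily relevant to the question):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// take a checkpoint every 1000 ms (exactly-once mode by default)
env.enableCheckpointing(1000);
// where checkpointed state is written; the path here is just an example
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));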
Thanks for all the help.
Best Regards,