Hi, Can we define custom sources in link? Control the barriers and (thus) checkpoints at good watermark points? -Abhishek-
|
Hi Abhishek, you can implement custom sources by implementing the At the moment, it is not possible to control manually or from a source function when to trigger a checkpoint. This is the responsibility of the Cheers, On Tue, May 17, 2016 at 8:28 PM, Abhishek R. Singh <[hidden email]> wrote:
|
Thanks - appreciate the response.
The reason I want to control these things is this - my state grows and shrinks over time (user level windowing as application state). I would like to trigger checkpoints just after the state has been crunched/compressed (at the window boundary). Say I crunch every 10 seconds and slide the window by 8 seconds (2 second overlap). My window buffers would only need to checkpoint 2s worth of in-flight data (apart from compressed state for the other 8 seconds). With flink this seems hard given the windowing is at a partition level and not a global window. Even if I use event time, every partition will be at different points (of when that partition becomes ready to crunch). OTOH, if I were to introduce barriers at source, I could ensure that I get a good point globally to crunch and checkpoint my state. Does the checkpoint co-ordinator provide triggers to application to “crunch now" and reduce state? BTW, this may not be optimal, because applications would have natural triggers to crunch windows (and can’t just react to these triggers at random points). Is there any benefit of allowing flink to do the windowing (in terms of getting smaller checkpoints)? I read that flink does not checkpoint in-flight data, but this would be impossible with event time and out of order processing by operators (I can see how it would work with processing time, and in order crunching). When the barrier hits the operator, flink will have to checkpoint all active event time windows. Given these event time windows have different trigger points, it might help to checkpoint right after trigger evaluation so the state is compressed (and there is less things to checkpoint). There seems to be some relationship between watermarks, triggers and checkpoint that is someone not being leveraged. -Abhishek-
|
On Thu, May 19, 2016 at 7:48 PM, Abhishek R. Singh
<[hidden email]> wrote: > There seems to be some relationship between watermarks, triggers and > checkpoint that is someone not being leveraged. Checkpointing is independent of this, yes. Did the state size become a problem for your use case? There are various users running Flink with very large state sizes without any issues. The recommended state backend for these use cases is the RocksDB backend. The barriers are triggered at the sources and flow with the data (https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/stream_checkpointing.html). Everything in-flight after the barrier is not relevant for the checkpoint. We are only interested in a consistent state snapshot. |
Thanks. I am still in theory/evaluation mode. Will try to code this up to see if checkpoint will become an issue. I do have a high rate of ingest and lots of in flight data. Hopefully flink back pressure keeps this nicely bounded.
I doubt it will be a problem for me - because even spark is writing all in-flight data to disk - because all partitioning goes thru disk and is inline - ie sync. Flink disk usage is write only and for failure case only. Looks pretty compelling so far. On Friday, May 20, 2016, Ufuk Celebi <[hidden email]> wrote: On Thu, May 19, 2016 at 7:48 PM, Abhishek R. Singh |
On Fri, May 20, 2016 at 3:12 PM, Abhishek Singh
<[hidden email]> wrote: > Thanks. I am still in theory/evaluation mode. Will try to code this up to > see if checkpoint will become an issue. I do have a high rate of ingest and > lots of in flight data. Hopefully flink back pressure keeps this nicely > bounded. Sure! Feel free to post any questions that you have during you evaluation. If you are interested in how back pressure is propagated you can have at this blog post here: http://data-artisans.com/how-flink-handles-backpressure/ |
Free forum by Nabble | Edit this page |