Hi all
I'm having a datastream transformation, that updates a mutable hashmap that exists outside of the stream. So it's something like object FlinkJob { val uriLookup = mutable.HashMap.empty[String, Int] def main(args: Array[String]) { val stream: DataStream = ... stream.keybBy(1).timeWindow(..).fold(..) .window(..) .map(..).fold(..) .addSink(..) } } where the uriLookup hashmap gets updated inside the stream transformation, and is serialized in the step before the addSink It works fine, however Does the snapshotting mechanism in case of a node failure actually serialize this map? And out of curiousity, can I actually see what data exists inside the snapshot data? Thanks. Bart |
Hi Bart, to make sure that the state is checkpointed you have to:
When your data is checkpointed you can access the state if your operator implements the `RichFunction` interface (via an abstract class that wraps the operator you need to implement, like `RichMapFunction`). For your need in particular, I don't know a way to checkpoint state shared between different operators; perhaps you can you refactor your code so that the state is encapsulated in an operator implementation and then moved through your pipeline as a parameter of the following operators. Would that work? I apologize for just providing pointers to the docs in my reply but checkpointing deserves a good explanation and I feel the docs get the job done pretty well. I will gladly help you if you have any doubt. Hope I've been of some help. On Thu, Apr 7, 2016 at 3:41 PM, Bart van Deenen <[hidden email]> wrote: Hi all BR, Stefano Baghino |
Hi, good explanation and pointers! I just want to add that the uriLookup table in your example is not really shared between your operator instances in a distributed setting. When serializing your transformations the current state of the HashMap is serialized with them because it is in the closure of the transformations. Then, on the cluster, the HashMap is serialized and all parallel instances basically work on their now local copy of the empty HashMap. Cheers, Aljoscha On Thu, 7 Apr 2016 at 18:30 Stefano Baghino <[hidden email]> wrote:
|
Thanks all!
I was under the mistaken impression that Flink automagically did the snapshotting for me.
The info is really clear, I'll have no trouble implementing it.
Bart
On Thu, Apr 7, 2016, at 18:40, Aljoscha Krettek wrote:
|
Free forum by Nabble | Edit this page |