Hi Mailing List, according to my questions (and your answers!) at this topic I have eliminated my ArrayList<T> in my collect methods. Additional I want to emit partial results. My mapper has the following layout: ArrayList<ArrayList<Tuple> structure = … For (Tuple tuple : input) { addTupleToStructure() } While(WorkNotDone) { doSomeWorkOnStructure() emitPartialResult(); } Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO type") I do now iterate through this ArrayList<T> and emit each Tuple as
Tuple2.of(Integer.valueOf(this.partitionIndex),
tuple))); While iterating through this ArrayList and emitting tuples, my mapper seems to be blocked and can’t continue to doSomeWorkOnStructure().
So I have three questions:
-
If I change back to emitting the ArrayList<T> would my Mapper also be blocked until Flink has serialized this ArrayList<T>? Or is Serialization done independent from my Mapper?
-
If emitting the ArrayList<T> won’t block my Mapper, which variant would be more performant?
-
If I emit ArrayList<T>, but additionally implement a combiner, which
o
Merges all local ArrayLists<T> with the same partitionIndex
o
Iterates through the local-merged ArrayLists<T> and emits the containing tuples would that be the best variant? Because the combining is done locally, I would assume that no Serialization is required between Mapper and Combiner. Also, the Mapper is probably not blocked
with emitting tuples and can continue doSomeWorkOnStructure() Thank you in advance! Robert |
Hi again, I implemented the scenario with the combiner and answered my 3rd question by myself: The combiners start after the mapper finished, so the reducer will not start processing partial results until the mappers are completely done. Regards Robert Von: Paschek, Robert [mailto:[hidden email]]
Hi Mailing List, according to my questions (and your answers!) at this topic I have eliminated my ArrayList<T> in my collect methods. Additional I want to emit partial results. My mapper has the following layout: ArrayList<ArrayList<Tuple> structure = … For (Tuple tuple : input) { addTupleToStructure() } While(WorkNotDone) { doSomeWorkOnStructure() emitPartialResult(); } Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO type") I do now iterate through this ArrayList<T> and emit each Tuple as
Tuple2.of(Integer.valueOf(this.partitionIndex),
tuple))); While iterating through this ArrayList and emitting tuples, my mapper seems to be blocked and can’t continue to doSomeWorkOnStructure().
So I have three questions:
-
If I change back to emitting the ArrayList<T> would my Mapper also be blocked until Flink has serialized this ArrayList<T>? Or is Serialization done independent from my Mapper?
-
If emitting the ArrayList<T> won’t block my Mapper, which variant would be more performant?
-
If I emit ArrayList<T>, but additionally implement a combiner, which
o
Merges all local ArrayLists<T> with the same partitionIndex
o
Iterates through the local-merged ArrayLists<T> and emits the containing tuples would that be the best variant? Because the combining is done locally, I would assume that no Serialization is required between Mapper and Combiner. Also, the Mapper is probably not blocked
with emitting tuples and can continue doSomeWorkOnStructure() Thank you in advance! Robert |
Free forum by Nabble | Edit this page |