Serialization of "not a valid POJO type"

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Serialization of "not a valid POJO type"

Paschek, Robert

Hi Mailing List,

 

according to my questions (and your answers!) at this topic

http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Performance-issues-with-GroupBy-td8130.html

 

I have eliminated my ArrayList<T> in my collect methods. Additional I want to emit partial results. My mapper has the following layout:

 

ArrayList<ArrayList<Tuple>  structure = …

 

For (Tuple tuple : input) {

                addTupleToStructure()

}

While(WorkNotDone) {

                doSomeWorkOnStructure()

                emitPartialResult();

}

 

Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO type") I do now iterate through this ArrayList<T> and emit each Tuple as Tuple2.of(Integer.valueOf(this.partitionIndex), tuple)));

While iterating through this ArrayList and emitting tuples, my mapper seems to be blocked and can’t continue to doSomeWorkOnStructure().

 

So I have three questions:

-          If I change back to emitting the ArrayList<T> would my Mapper also be blocked until Flink has serialized this ArrayList<T>? Or is Serialization done independent from my Mapper?

 

-          If emitting the ArrayList<T> won’t block my Mapper, which variant would be more performant?

 

-          If I emit ArrayList<T>, but additionally implement a combiner, which

o   Merges all local ArrayLists<T> with the same partitionIndex

o   Iterates through the local-merged ArrayLists<T> and emits the containing tuples

would that be the best variant? Because the combining is done locally, I would assume that no Serialization is required between Mapper and Combiner. Also, the Mapper is probably not blocked with emitting tuples and can continue doSomeWorkOnStructure()

 

Thank you in advance!

Robert

 

 

Reply | Threaded
Open this post in threaded view
|

AW: Serialization of "not a valid POJO type"

Paschek, Robert

Hi again,

 

I implemented the scenario with the combiner and answered my 3rd question by myself:

The combiners start after the mapper finished, so the reducer will not start processing partial results until the mappers are completely done.

 

Regards

Robert

 

 

Von: Paschek, Robert [mailto:[hidden email]]
Gesendet: Samstag, 30. Juli 2016 12:04
An: [hidden email]
Betreff: Serialization of "not a valid POJO type"

 

Hi Mailing List,

 

according to my questions (and your answers!) at this topic

http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Performance-issues-with-GroupBy-td8130.html

 

I have eliminated my ArrayList<T> in my collect methods. Additional I want to emit partial results. My mapper has the following layout:

 

ArrayList<ArrayList<Tuple>  structure = …

 

For (Tuple tuple : input) {

                addTupleToStructure()

}

While(WorkNotDone) {

                doSomeWorkOnStructure()

                emitPartialResult();

}

 

Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO type") I do now iterate through this ArrayList<T> and emit each Tuple as Tuple2.of(Integer.valueOf(this.partitionIndex), tuple)));

While iterating through this ArrayList and emitting tuples, my mapper seems to be blocked and can’t continue to doSomeWorkOnStructure().

 

So I have three questions:

-          If I change back to emitting the ArrayList<T> would my Mapper also be blocked until Flink has serialized this ArrayList<T>? Or is Serialization done independent from my Mapper?

 

-          If emitting the ArrayList<T> won’t block my Mapper, which variant would be more performant?

 

-          If I emit ArrayList<T>, but additionally implement a combiner, which

o   Merges all local ArrayLists<T> with the same partitionIndex

o   Iterates through the local-merged ArrayLists<T> and emits the containing tuples

would that be the best variant? Because the combining is done locally, I would assume that no Serialization is required between Mapper and Combiner. Also, the Mapper is probably not blocked with emitting tuples and can continue doSomeWorkOnStructure()

 

Thank you in advance!

Robert