Hi,
First of all, great #FF17, I really enjoyed it. After attending some of the dataArtisans folks' talks, I realized that serialization should be optimized when there is no way to use the supported types. In my case, users can configure their sources in our application online, which gives them the freedom to dynamically change the number and type of attributes. Moreover, the number of attributes of an object can change from one operator to the next. Because of this, we have a legacy structure that I showed before. Should I implement my own TypeInformation, TypeComparator, TypeSerializer and TypeInfoFactory? Am I forgetting something?
Thanks,
Nuno
-- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi Nuno,
> Because of this, we have a legacy structure that I showed before.
Could you include more information in this mail thread about the legacy structure you mentioned? I couldn't find any other reference to it, and it would help us understand your use case better.
- Gordon
On 15 September 2017 at 12:59:15 PM, nragon ([hidden email]) wrote:
Sorry, I was discussing this with Stephan before posting it here.
Basically, the main wrapper holds an array of custom objects. Because its size can change throughout the stream, and users can customize their sources dynamically, it is difficult to create a generic POJO or use a tuple for this purpose. Concretely, I have a wrapper which extends a HashMap containing K,V -> (attribute name, object). Each object has a set of attributes that we use to identify its type. Right now I'm using Kryo because HashMap is not supported. The thing is, I'll always have a dynamic array of objects which I'll have to serialize. Hope this helps.
Thanks,
Nuno
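To illustrate what replacing Kryo could look like for such a wrapper, here is a minimal plain-Java sketch (no Flink dependency) that writes a `Map<String, Object>` as an entry count followed by, per entry, the attribute name, a one-byte type tag and the value. The class name `AttributeMapCodec`, the tag values and the three supported value types (Long, Double, String) are assumptions for illustration only. Because Flink's `DataOutputView` and `DataInputView` extend `java.io.DataOutput` and `java.io.DataInput`, the `write`/`read` bodies are written against those JDK interfaces and could in principle be reused inside a custom `TypeSerializer`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec for the wrapper's (attribute name -> value) map.
// Tag values and supported types are illustrative assumptions, not a Flink format.
final class AttributeMapCodec {
    private static final byte LONG = 0;
    private static final byte DOUBLE = 1;
    private static final byte STRING = 2;

    private AttributeMapCodec() {}

    static void write(Map<String, Object> attributes, DataOutput out) throws IOException {
        out.writeInt(attributes.size());            // the number of attributes varies per record
        for (Map.Entry<String, Object> entry : attributes.entrySet()) {
            out.writeUTF(entry.getKey());           // attribute name
            Object value = entry.getValue();
            if (value instanceof Long) {            // type tag first, then the value itself
                out.writeByte(LONG);
                out.writeLong((Long) value);
            } else if (value instanceof Double) {
                out.writeByte(DOUBLE);
                out.writeDouble((Double) value);
            } else if (value instanceof String) {
                out.writeByte(STRING);
                out.writeUTF((String) value);
            } else {
                throw new IllegalArgumentException("Unsupported attribute value: " + value);
            }
        }
    }

    static Map<String, Object> read(DataInput in) throws IOException {
        int size = in.readInt();
        Map<String, Object> attributes = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String name = in.readUTF();
            byte tag = in.readByte();
            switch (tag) {                          // the tag tells us how to read the value
                case LONG:   attributes.put(name, in.readLong());   break;
                case DOUBLE: attributes.put(name, in.readDouble()); break;
                case STRING: attributes.put(name, in.readUTF());    break;
                default: throw new IOException("Unknown type tag: " + tag);
            }
        }
        return attributes;
    }

    // Convenience round trip through a byte array, without checked exceptions.
    static byte[] toBytes(Map<String, Object> attributes) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            write(attributes, out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, Object> fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```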
Eventually I'll have a class named Element which holds an array of Parameter.
Do I need a TypeInformation, TypeComparator, TypeInfoFactory and TypeSerializer for both of them? Thanks
Sorry for bringing this up again; any tips on this?
If Parameters are always encapsulated in an Event, and the Event serializer knows how to deal with them, then you only need to implement a serializer etc. for the Event class.
On 18.09.2017 13:20, nragon wrote:
> Sorry for bringing this up, any tips on this?
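The reply above can be illustrated with a plain-Java sketch (no Flink dependency) in which the outer type's serializer writes the nested Parameters itself, as a length prefix followed by each Parameter's fields, so Parameter needs no serializer of its own. The Parameter fields (a name and a double value) and the class name `EventCodec` are assumptions. In Flink, the Event type would still need its own TypeInformation, and these `write`/`read` bodies would live in the Event's `TypeSerializer`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model: a Parameter only ever appears inside an Event.
final class Parameter {
    final String name;
    final double value;

    Parameter(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

final class Event {
    final Parameter[] parameters;

    Event(Parameter[] parameters) {
        this.parameters = parameters;
    }
}

// The only serializer in this sketch: it writes the nested Parameters inline,
// so Parameter gets no serializer (or type information) of its own.
final class EventCodec {
    private EventCodec() {}

    static void write(Event event, DataOutput out) throws IOException {
        out.writeInt(event.parameters.length);      // length prefix: the number of parameters varies
        for (Parameter p : event.parameters) {
            out.writeUTF(p.name);
            out.writeDouble(p.value);
        }
    }

    static Event read(DataInput in) throws IOException {
        Parameter[] parameters = new Parameter[in.readInt()];
        for (int i = 0; i < parameters.length; i++) {
            // Java evaluates arguments left to right, matching the write order.
            parameters[i] = new Parameter(in.readUTF(), in.readDouble());
        }
        return new Event(parameters);
    }

    static byte[] toBytes(Event event) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            write(event, out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Event fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```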
So, no need for typeinfo, comparator or factory?
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#defining-type-information-using-a-factory
You do need them, but only for the Event class.
On 18.09.2017 13:38, nragon wrote:
> So, no need for typeinfo, comparator or factory?
Ok, got it.
Thanks
One other thing :). Can I set a tuple's generic types dynamically?
Meaning, build a tuple of arity N and build a TupleSerializer based on those types, because I'll only know these types from user input.
Have a look at the TupleTypeInfo class. It has a constructor that accepts an array of TypeInformation, and supports automatically generating a serializer from them.
On 18.09.2017 18:28, nragon wrote:
> One other thing :). Can i set tuple generic type dynamically?
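In Flink, the suggested route would be to build a `TypeInformation<?>[]` from the user's configuration, pass it to the `TupleTypeInfo` constructor, and obtain the tuple serializer from `createSerializer(ExecutionConfig)`. To show the underlying idea without a Flink dependency, the following is a plain-Java analog, not Flink code: a row serializer assembled at runtime from per-field serializers picked by type name. The type names ("long", "double", "string") and the classes `FieldCodec`, `FieldCodecs` and `RowCodec` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Serializer for a single field, playing the role of one field type's serializer.
interface FieldCodec {
    void write(Object value, DataOutput out) throws IOException;
    Object read(DataInput in) throws IOException;
}

final class FieldCodecs {
    private FieldCodecs() {}

    static final FieldCodec LONG = new FieldCodec() {
        public void write(Object value, DataOutput out) throws IOException { out.writeLong((Long) value); }
        public Object read(DataInput in) throws IOException { return in.readLong(); }
    };
    static final FieldCodec DOUBLE = new FieldCodec() {
        public void write(Object value, DataOutput out) throws IOException { out.writeDouble((Double) value); }
        public Object read(DataInput in) throws IOException { return in.readDouble(); }
    };
    static final FieldCodec STRING = new FieldCodec() {
        public void write(Object value, DataOutput out) throws IOException { out.writeUTF((String) value); }
        public Object read(DataInput in) throws IOException { return in.readUTF(); }
    };

    // Hypothetical mapping from a user-supplied type name to a field codec.
    static FieldCodec forTypeName(String typeName) {
        switch (typeName) {
            case "long":   return LONG;
            case "double": return DOUBLE;
            case "string": return STRING;
            default: throw new IllegalArgumentException("Unknown field type: " + typeName);
        }
    }
}

// A row of fixed arity whose field types are only known at runtime.
final class RowCodec {
    private final FieldCodec[] fields;

    RowCodec(String... typeNames) {
        fields = new FieldCodec[typeNames.length];
        for (int i = 0; i < typeNames.length; i++) {
            fields[i] = FieldCodecs.forTypeName(typeNames[i]);
        }
    }

    int arity() {
        return fields.length;
    }

    byte[] toBytes(Object[] row) {
        if (row.length != fields.length) {
            throw new IllegalArgumentException("Expected arity " + fields.length + " but got " + row.length);
        }
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (int i = 0; i < fields.length; i++) {
                fields[i].write(row[i], out);       // no arity prefix: like a tuple, arity is part of the type
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Object[] fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            Object[] row = new Object[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = fields[i].read(in);
            }
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design choice mirrors a tuple: the field types are fixed once the codec is built, so nothing per record describes the layout.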
createInstance(Object[] fields) and createOrReuseInstance(Object[] fields, T reuse) in TupleSerializerBase seem not to be part of the TypeSerializer API. Will I be losing any functionality? In what cases do you use these instead of createInstance()?
// We use this in the Aggregate and Distinct Operators to create instances
// of immutable Tuples (i.e. Scala Tuples)
Thanks
UPDATE: Can I extend Tuple and reuse the current serializers, with some changes?
Should I use TypeSerializerSingleton if the serializer is independent of the object it's serializing?
On 19.09.2017 11:39, nragon wrote:
> createInstance(Object[] fields) at TupleSerializerBase seems not to be part of TypeSerializer API. Will I be losing any functionality? In what cases do you use this instead of createInstance()?
> // We use this in the Aggregate and Distinct Operators to create instances
> // of immutable Tuples (i.e. Scala Tuples)
> Thanks

Taken from TupleSerializerBase:
// We use this in the Aggregate and Distinct Operators to create instances
// of immutable Tuples (i.e. Scala Tuples)
public abstract T createInstance(Object[] fields);

On 27.09.2017 17:43, nragon wrote:
> Should I use TypeSerializerSingleton if it is independent of the object which it's serializing?

Generally, use TypeSerializerSingleton. There is virtually no reason to not use it. Do keep this section of the TypeSerializer javadoc in mind:
 * The methods in this class are assumed to be stateless, such that it is effectively thread safe. Stateful
 * implementations of the methods may lead to unpredictable side effects and will compromise both stability and
 * correctness of the program.
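The advice above (a stateless serializer shared as one instance) can be sketched without Flink as follows. In Flink, `TypeSerializerSingleton` plays this role and its `duplicate()` returns the serializer itself, which is only safe because the serializer holds no state. The `ParameterInfo` type (a name plus a one-byte type code), the class `ParameterInfoCodec` and the `concurrentRoundTrip` helper are hypothetical; the helper merely exercises the shared instance from several threads and cannot prove thread safety on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical value type: an attribute name plus a one-byte type code.
final class ParameterInfo {
    final String name;
    final byte typeCode;

    ParameterInfo(String name, byte typeCode) {
        this.name = name;
        this.typeCode = typeCode;
    }
}

// A stateless serializer shared as a single instance, in the spirit of TypeSerializerSingleton.
final class ParameterInfoCodec {
    static final ParameterInfoCodec INSTANCE = new ParameterInfoCodec();

    // No fields at all: there is no state that concurrent callers could corrupt.
    private ParameterInfoCodec() {}

    // A stateless serializer may hand out itself instead of a deep copy.
    ParameterInfoCodec duplicate() {
        return this;
    }

    void write(ParameterInfo info, DataOutput out) throws IOException {
        out.writeUTF(info.name);
        out.writeByte(info.typeCode);
    }

    ParameterInfo read(DataInput in) throws IOException {
        return new ParameterInfo(in.readUTF(), in.readByte());
    }

    byte[] toBytes(ParameterInfo info) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            write(info, out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    ParameterInfo fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs many round trips on a thread pool, all through the same shared INSTANCE.
    static boolean concurrentRoundTrip(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                String name = "attr-" + i;  // effectively final, captured by the lambda
                results.add(pool.submit(() -> {
                    ParameterInfo back = INSTANCE.fromBytes(INSTANCE.toBytes(new ParameterInfo(name, (byte) 1)));
                    return back.name.equals(name) && back.typeCode == 1;
                }));
            }
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```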
Got it :)
I've redesigned the object I use across jobs and ended up with 4 serializers. My object Element holds 2 fields: an array of Parameter and a Metadata. Metadata holds an array of ParameterInfo, and each Parameter holds its own ParameterInfo (kind of a duplicate of Metadata, but needed for legacy reasons). So Element has the TypeInfo and TypeInfoFactory, and there are also serializers for Parameter and Metadata. The others are just adaptations of GenericArraySerializer<Parameter>, StringSerializer, ... From my tests, I've managed to get around a 40% improvement over serializing Element as is with Kryo.
P.S.: DataOutputSerializer and DataInputDeserializer also improved the Kafka integration.
Thanks
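The layered design described above can be sketched in plain Java as one small stateless serializer per type, each delegating to the serializer of the type it contains. Only the names Element, Parameter, Metadata and ParameterInfo come from the post; the field types (for example a String value per Parameter) and the serializer class names are assumptions, and this is not necessarily how the author's four serializers are split. In Flink, these bodies would sit in `TypeSerializer.serialize`/`deserialize` (with `DataOutputView`/`DataInputView`), alongside the other required methods such as `copy`, `createInstance` and `getLength`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model following the description above (field types are assumptions).
final class ParameterInfo {
    final String name, type;
    ParameterInfo(String name, String type) { this.name = name; this.type = type; }
}

final class Parameter {
    final ParameterInfo info; // duplicated from Metadata for legacy reasons
    final String value;       // assumption: values carried as strings
    Parameter(ParameterInfo info, String value) { this.info = info; this.value = value; }
}

final class Metadata {
    final ParameterInfo[] infos;
    Metadata(ParameterInfo[] infos) { this.infos = infos; }
}

final class Element {
    final Parameter[] parameters;
    final Metadata metadata;
    Element(Parameter[] parameters, Metadata metadata) { this.parameters = parameters; this.metadata = metadata; }
}

// One stateless, shared serializer per type; nested types are delegated to their own serializer.
final class ParameterInfoSerializer {
    static final ParameterInfoSerializer INSTANCE = new ParameterInfoSerializer();
    private ParameterInfoSerializer() {}
    void serialize(ParameterInfo info, DataOutput out) throws IOException {
        out.writeUTF(info.name);
        out.writeUTF(info.type);
    }
    ParameterInfo deserialize(DataInput in) throws IOException {
        return new ParameterInfo(in.readUTF(), in.readUTF());
    }
}

final class ParameterSerializer {
    static final ParameterSerializer INSTANCE = new ParameterSerializer();
    private ParameterSerializer() {}
    void serialize(Parameter p, DataOutput out) throws IOException {
        ParameterInfoSerializer.INSTANCE.serialize(p.info, out);
        out.writeUTF(p.value);
    }
    Parameter deserialize(DataInput in) throws IOException {
        ParameterInfo info = ParameterInfoSerializer.INSTANCE.deserialize(in);
        return new Parameter(info, in.readUTF());
    }
}

final class MetadataSerializer {
    static final MetadataSerializer INSTANCE = new MetadataSerializer();
    private MetadataSerializer() {}
    void serialize(Metadata m, DataOutput out) throws IOException {
        out.writeInt(m.infos.length);
        for (ParameterInfo info : m.infos) ParameterInfoSerializer.INSTANCE.serialize(info, out);
    }
    Metadata deserialize(DataInput in) throws IOException {
        ParameterInfo[] infos = new ParameterInfo[in.readInt()];
        for (int i = 0; i < infos.length; i++) infos[i] = ParameterInfoSerializer.INSTANCE.deserialize(in);
        return new Metadata(infos);
    }
}

final class ElementSerializer {
    static final ElementSerializer INSTANCE = new ElementSerializer();
    private ElementSerializer() {}
    void serialize(Element e, DataOutput out) throws IOException {
        out.writeInt(e.parameters.length); // the number of parameters varies per element
        for (Parameter p : e.parameters) ParameterSerializer.INSTANCE.serialize(p, out);
        MetadataSerializer.INSTANCE.serialize(e.metadata, out);
    }
    Element deserialize(DataInput in) throws IOException {
        Parameter[] parameters = new Parameter[in.readInt()];
        for (int i = 0; i < parameters.length; i++) parameters[i] = ParameterSerializer.INSTANCE.deserialize(in);
        return new Element(parameters, MetadataSerializer.INSTANCE.deserialize(in));
    }
    byte[] toBytes(Element e) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            serialize(e, out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException ex) { throw new UncheckedIOException(ex); }
    }
    Element fromBytes(byte[] bytes) {
        try { return deserialize(new DataInputStream(new ByteArrayInputStream(bytes))); }
        catch (IOException ex) { throw new UncheckedIOException(ex); }
    }
}
```

Note that this layout writes every ParameterInfo twice, once inside its Parameter and once in Metadata, mirroring the legacy duplication mentioned above; after a round trip the two copies are equal but separate objects.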