Custom Serializers

Custom Serializers

nragon
Hi,

First of all, great #FF17, I really enjoyed it.
After attending some of the data Artisans folks' talks, I realized that
serialization should be optimized when there is no way to use the supported
types.
In my case, users can configure their sources online in our application, which
gives them the freedom to change the number and type of attributes dynamically.
Moreover, the number of attributes in an object can change between operators.
Because of this, we have a legacy structure that I showed before.
Should I implement my own TypeInformation, TypeComparator, TypeSerializer
and TypeInfoFactory?
Am I forgetting something?

Thanks,
Nuno




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Custom Serializers

Tzu-Li (Gordon) Tai
Hi Nuno,

On 15 September 2017 at 12:59:15 PM, nragon wrote:
> Because of this, we have a legacy structure that I showed before.

Could you include more information in this thread about the legacy structure you mentioned? I couldn't find any other reference to it, and it would help us understand your use case.

- Gordon

Re: Custom Serializers

nragon
Sorry, I was discussing this with Stephan before posting it here.
Basically, the main wrapper holds an array of custom objects, and because its
size can change throughout the stream and users can customize their sources
dynamically, it is difficult to create a generic POJO or use a tuple for
this purpose.
Concretely, I have a wrapper which extends HashMap, mapping K,V ->
(attribute name, object). Each object has a set of attributes that we use to
identify its type.
Right now I'm using Kryo because HashMap is not supported.
The thing is, I'll always have a dynamic array of objects which I'll have to
serialize.

Hope this helps

Thanks,
Nuno
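A minimal sketch of what replacing Kryo for a map-based wrapper like this can look like, assuming the attribute values are restricted to a small, known set of types. Kryo generally has to record class information for values declared as Object; a hand-written format can use a one-byte type tag instead. The class name TaggedMapCodec, the tag values and the supported types (String, Long, Double, Boolean, null) are hypothetical, not nragon's actual model. Plain java.io DataOutput/DataInput stand in for Flink's DataOutputView/DataInputView, which extend those interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec for a Map<String, Object> whose values come from a closed
// set of types. Each value is preceded by a one-byte type tag.
final class TaggedMapCodec {
    private static final byte T_NULL = 0;
    private static final byte T_STRING = 1;
    private static final byte T_LONG = 2;
    private static final byte T_DOUBLE = 3;
    private static final byte T_BOOLEAN = 4;

    static void write(Map<String, Object> map, DataOutput out) throws IOException {
        out.writeInt(map.size());                      // number of attributes
        for (Map.Entry<String, Object> e : map.entrySet()) {
            out.writeUTF(e.getKey());                  // attribute name
            Object v = e.getValue();
            if (v == null) {
                out.writeByte(T_NULL);
            } else if (v instanceof String) {
                out.writeByte(T_STRING);
                out.writeUTF((String) v);
            } else if (v instanceof Long) {
                out.writeByte(T_LONG);
                out.writeLong((Long) v);
            } else if (v instanceof Double) {
                out.writeByte(T_DOUBLE);
                out.writeDouble((Double) v);
            } else if (v instanceof Boolean) {
                out.writeByte(T_BOOLEAN);
                out.writeBoolean((Boolean) v);
            } else {
                throw new IOException("Unsupported attribute type: " + v.getClass().getName());
            }
        }
    }

    static Map<String, Object> read(DataInput in) throws IOException {
        int size = in.readInt();
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            byte tag = in.readByte();
            switch (tag) {
                case T_NULL:    map.put(key, null); break;
                case T_STRING:  map.put(key, in.readUTF()); break;
                case T_LONG:    map.put(key, in.readLong()); break;
                case T_DOUBLE:  map.put(key, in.readDouble()); break;
                case T_BOOLEAN: map.put(key, in.readBoolean()); break;
                default: throw new IOException("Unknown type tag: " + tag);
            }
        }
        return map;
    }

    // Convenience helpers: round-trip through a byte array.
    static byte[] toBytes(Map<String, Object> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            write(map, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<String, Object> fromBytes(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The attribute names are still written per record here; if the set of names is known per source configuration, it could be written once elsewhere and only the values sent.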





Re: Custom Serializers

nragon
Eventually I'll have a class named Element, which holds an array of Parameter.
Do I need typeinfo, comparator, factory and serializer for both of them?

Thanks




Re: Custom Serializers

nragon
Sorry to bring this up again; any tips on this?




Re: Custom Serializers

Chesnay Schepler
If Parameters are always encapsulated in an Element, and the Element
serializer knows how to deal with them, then you
only need to implement a serializer etc. for the Element class.
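A minimal plain-Java sketch of that idea: only the top-level type gets a serializer, and the nested Parameter values are written inline, so Parameter needs no TypeInformation or serializer of its own. The fields of Element and Parameter (a name and a double value) are hypothetical, and ElementSerializer only mirrors the shape of the serialize/deserialize/copy methods of Flink's TypeSerializer; it does not extend that class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical nested type: only ever appears inside an Element.
final class Parameter {
    final String name;
    final double value;

    Parameter(String name, double value) {
        this.name = name;
        this.value = value;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Parameter)) return false;
        Parameter p = (Parameter) o;
        return name.equals(p.name) && Double.compare(value, p.value) == 0;
    }

    @Override public int hashCode() { return Objects.hash(name, value); }
}

// Hypothetical top-level stream type holding a dynamically sized array.
final class Element {
    final Parameter[] parameters;

    Element(Parameter[] parameters) { this.parameters = parameters; }

    @Override public boolean equals(Object o) {
        return o instanceof Element && Arrays.equals(parameters, ((Element) o).parameters);
    }

    @Override public int hashCode() { return Arrays.hashCode(parameters); }
}

// Serializer for the top-level type only; Parameters are handled inline.
final class ElementSerializer {
    void serialize(Element e, DataOutput out) throws IOException {
        out.writeInt(e.parameters.length);             // array length first
        for (Parameter p : e.parameters) {
            out.writeUTF(p.name);
            out.writeDouble(p.value);
        }
    }

    Element deserialize(DataInput in) throws IOException {
        Parameter[] params = new Parameter[in.readInt()];
        for (int i = 0; i < params.length; i++) {
            params[i] = new Parameter(in.readUTF(), in.readDouble());
        }
        return new Element(params);
    }

    // Parameter is immutable here, so copying the array is enough for a deep copy.
    Element copy(Element e) { return new Element(e.parameters.clone()); }

    byte[] toBytes(Element e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            serialize(e, new DataOutputStream(bytes));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    Element fromBytes(byte[] data) {
        try {
            return deserialize(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```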

On 18.09.2017 13:20, nragon wrote:
> Sorry for bringing this up, any tips on this?


Re: Custom Serializers

nragon

Re: Custom Serializers

Chesnay Schepler
You do need them, but only for the Element class.

On 18.09.2017 13:38, nragon wrote:
> So, no need for typeinfo, comparator or factory?
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#defining-type-information-using-a-factory


Re: Custom Serializers

nragon

Re: Custom Serializers

nragon
One other thing :). Can I set the tuple's generic types dynamically?
That is, build a tuple of arity N and build a TupleSerializer based on those
types, because I'll only know the types from user input.




Re: Custom Serializers

Chesnay Schepler
Have a look at the TupleTypeInfo class. It has a constructor that
accepts an array of TypeInformation,
and supports automatically generating a serializer from them.
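In Flink terms, that means building a TypeInformation<?>[] from the user's configuration, passing it to the TupleTypeInfo constructor, and letting the type info create the serializer. The plain-Java sketch below shows the same technique, a composite serializer assembled at runtime from an array of per-field serializers, so it runs without Flink. FieldSerializer, RowSerializer and the type names "string", "long" and "double" are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical per-field serializer (not Flink's TypeSerializer).
interface FieldSerializer {
    void write(Object value, DataOutput out) throws IOException;
    Object read(DataInput in) throws IOException;
}

final class FieldSerializers {
    static final FieldSerializer STRING = new FieldSerializer() {
        public void write(Object v, DataOutput out) throws IOException { out.writeUTF((String) v); }
        public Object read(DataInput in) throws IOException { return in.readUTF(); }
    };
    static final FieldSerializer LONG = new FieldSerializer() {
        public void write(Object v, DataOutput out) throws IOException { out.writeLong((Long) v); }
        public Object read(DataInput in) throws IOException { return in.readLong(); }
    };
    static final FieldSerializer DOUBLE = new FieldSerializer() {
        public void write(Object v, DataOutput out) throws IOException { out.writeDouble((Double) v); }
        public Object read(DataInput in) throws IOException { return in.readDouble(); }
    };

    // Maps a type name from the user's source configuration to a serializer.
    static FieldSerializer forTypeName(String typeName) {
        switch (typeName) {
            case "string": return STRING;
            case "long":   return LONG;
            case "double": return DOUBLE;
            default: throw new IllegalArgumentException("Unknown type: " + typeName);
        }
    }
}

// Composite serializer whose arity and field types are fixed when it is built.
final class RowSerializer {
    private final FieldSerializer[] fields;

    RowSerializer(FieldSerializer[] fields) { this.fields = fields.clone(); }

    static RowSerializer fromTypeNames(String... typeNames) {
        FieldSerializer[] fs = new FieldSerializer[typeNames.length];
        for (int i = 0; i < fs.length; i++) {
            fs[i] = FieldSerializers.forTypeName(typeNames[i]);
        }
        return new RowSerializer(fs);
    }

    int arity() { return fields.length; }

    // The arity belongs to the serializer, so it is not written per record.
    void serialize(Object[] row, DataOutput out) throws IOException {
        if (row.length != fields.length) {
            throw new IllegalArgumentException("Expected " + fields.length + " fields, got " + row.length);
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i].write(row[i], out);
        }
    }

    Object[] deserialize(DataInput in) throws IOException {
        Object[] row = new Object[fields.length];
        for (int i = 0; i < fields.length; i++) {
            row[i] = fields[i].read(in);
        }
        return row;
    }

    byte[] toBytes(Object[] row) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            serialize(row, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    Object[] fromBytes(byte[] data) {
        try {
            return deserialize(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This sketch does not handle null field values; a real version would need a per-field null marker for nullable attributes.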

On 18.09.2017 18:28, nragon wrote:

> One other thing :). Can i set tuple generic type dynamically?
> Meaning, build a tuple of N arity and build TupleSerializer based on those
> types.
> This because I'll only know these types based on user inputs.


Re: Custom Serializers

nragon
createInstance(Object[] fields) and createOrReuseInstance(Object[] fields, T reuse) in TupleSerializerBase do not seem to be part
of the TypeSerializer API.
Will I be losing any functionality? In which cases do you use these instead
of createInstance()?

// We use this in the Aggregate and Distinct Operators to create instances
// of immutable Tuples (i.e. Scala Tuples)

Thanks

UPDATE:
Can I extend Tuple and reuse the current serializers, with some changes?



Re: Custom Serializers

nragon
Should I use TypeSerializerSingleton if the serializer is independent of the object which
it's serializing?




Re: Custom Serializers

Chesnay Schepler
In reply to this post by nragon
On 19.09.2017 11:39, nragon wrote:
> createInstance(Object[] fields) in TupleSerializerBase does not seem to be part
> of the TypeSerializer API.
> Will I be losing any functionality? In which cases do you use this instead
> of createInstance()?
>
> // We use this in the Aggregate and Distinct Operators to create instances
> // of immutable Tuples (i.e. Scala Tuples)
>
> Thanks
Taken from TupleSerializerBase:
// We use this in the Aggregate and Distinct Operators to create instances
// of immutable Tuples (i.e. Scala Tuples)
public abstract T createInstance(Object[] fields);
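The comment hints at why the array-based variant exists: a generic operator that builds a new record from computed field values can fill a mutable tuple field by field, but an immutable tuple (the comment names Scala tuples) can only be created with all of its fields at once. A minimal plain-Java sketch of that distinction; TupleFactory, ImmutablePair, ImmutablePairFactory and withField are hypothetical names, not Flink classes.

```java
// Hypothetical view of a tuple type: field access plus construction
// from a complete array of field values.
abstract class TupleFactory<T> {
    abstract int arity();
    abstract Object getField(T tuple, int pos);
    abstract T createInstance(Object[] fields);
}

// Stand-in for an immutable tuple: fields can only be set in the constructor.
final class ImmutablePair {
    final Object f0;
    final Object f1;

    ImmutablePair(Object f0, Object f1) {
        this.f0 = f0;
        this.f1 = f1;
    }
}

final class ImmutablePairFactory extends TupleFactory<ImmutablePair> {
    @Override int arity() { return 2; }

    @Override Object getField(ImmutablePair t, int pos) {
        switch (pos) {
            case 0: return t.f0;
            case 1: return t.f1;
            default: throw new IndexOutOfBoundsException("No field " + pos);
        }
    }

    @Override ImmutablePair createInstance(Object[] fields) {
        return new ImmutablePair(fields[0], fields[1]);
    }
}

final class GenericOps {
    // Aggregate-style step: produce a copy of `tuple` with one field replaced.
    // The input cannot be mutated, so all fields are gathered into an array
    // and a new instance is created from that array.
    static <T> T withField(TupleFactory<T> factory, T tuple, int pos, Object value) {
        Object[] fields = new Object[factory.arity()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = factory.getField(tuple, i);
        }
        fields[pos] = value;
        return factory.createInstance(fields);
    }
}
```

For a mutable type, the same operator could instead call a no-argument createInstance() and set fields one by one, which is what the TypeSerializer API offers.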

On 27.09.2017 17:43, nragon wrote:
> Should I use TypeSerializerSingleton if it is independent of the object which
> it's serializing?

Generally, use TypeSerializerSingleton. There is virtually no reason to not use it. Do keep this section of the TypeSerializer javadoc in mind:

* The methods in this class are assumed to be stateless, such that it is effectively thread safe. Stateful
* implementations of the methods may lead to unpredictable side effects and will compromise both stability and
* correctness of the program.
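A minimal plain-Java sketch of what "stateless" means here and why it makes the singleton pattern safe: a serializer with no fields can be shared by every thread, so duplicate() can hand out the same instance (Flink's TypeSerializerSingleton returns `this` from duplicate()). A serializer holding a reusable scratch buffer has mutable state and must return a fresh instance instead. ParameterInfo's fields and both serializer classes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical metadata record.
final class ParameterInfo {
    final String name;
    final byte typeCode;

    ParameterInfo(String name, byte typeCode) {
        this.name = name;
        this.typeCode = typeCode;
    }
}

// Stateless serializer: no instance fields, so one shared instance is
// thread safe and duplicate() may return `this`.
final class ParameterInfoSerializer {
    static final ParameterInfoSerializer INSTANCE = new ParameterInfoSerializer();

    private ParameterInfoSerializer() {}

    ParameterInfoSerializer duplicate() { return this; }

    void serialize(ParameterInfo info, DataOutput out) throws IOException {
        out.writeUTF(info.name);
        out.writeByte(info.typeCode);
    }

    ParameterInfo deserialize(DataInput in) throws IOException {
        return new ParameterInfo(in.readUTF(), in.readByte());
    }

    // Streams are created per call, so this helper keeps the class stateless.
    ParameterInfo fromBytes(byte[] data) {
        try {
            return deserialize(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Stateful counterexample: the reusable scratch buffer is mutable state, so
// sharing one instance across threads would be unsafe; duplicate() must
// return an independent instance.
final class BufferingSerializer {
    private final ByteArrayOutputStream scratch = new ByteArrayOutputStream();

    BufferingSerializer duplicate() { return new BufferingSerializer(); }

    byte[] toBytes(ParameterInfo info) {
        scratch.reset();
        try {
            ParameterInfoSerializer.INSTANCE.serialize(info, new DataOutputStream(scratch));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scratch.toByteArray();
    }
}
```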


Re: Custom Serializers

nragon
Got it :)
I've redesigned the object that I use across jobs.
I ended up with 4 serializers.
My Element object holds 2 fields: an array of Parameter and a Metadata.
Metadata holds an array of ParameterInfo, and each Parameter holds its own
ParameterInfo (kind of a duplicate of Metadata, but needed for legacy reasons). So
Element has the TypeInfo and TypeInfoFactory, plus serializers for
Parameter and Metadata. The others are just adaptations of
GenericArraySerializer<Parameter>, StringSerializer, ...
In my tests I got around a 40% improvement compared with serializing
Element as-is with Kryo.

P.S.: DataOutputSerializer and DataInputDeserializer also improved the Kafka integration.

Thanks
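A minimal plain-Java sketch of the buffer-reuse idea behind that P.S.: each codec instance keeps one growable output buffer and resets it per record instead of allocating a new stream, then copies the bytes out because the buffer is reused. ByteArrayOutputStream/DataOutputStream stand in for Flink's DataOutputSerializer, and ByteArrayInputStream/DataInputStream for DataInputDeserializer; BytesCodec, RecordWriter and RecordReader are hypothetical names, and wiring the codec into a Kafka serialization schema is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical callbacks holding the record-specific binary format.
interface RecordWriter<T> {
    void write(T value, DataOutput out) throws IOException;
}

interface RecordReader<T> {
    T read(DataInput in) throws IOException;
}

// Converts records to and from byte[] (e.g. Kafka record values), reusing one
// growable buffer per instance. Not thread safe: use one instance per thread.
final class BytesCodec<T> {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(256);
    private final DataOutputStream out = new DataOutputStream(buffer);
    private final RecordWriter<T> writer;
    private final RecordReader<T> reader;

    BytesCodec(RecordWriter<T> writer, RecordReader<T> reader) {
        this.writer = writer;
        this.reader = reader;
    }

    byte[] serialize(T value) {
        buffer.reset();                 // keep the backing array, drop old contents
        try {
            writer.write(value, out);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();    // copy out, since the buffer is reused
    }

    T deserialize(byte[] bytes) {
        try {
            return reader.read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `new BytesCodec<String>((v, o) -> o.writeUTF(v), DataInput::readUTF)` turns "hello" into 7 bytes (a 2-byte length prefix plus 5 bytes of modified UTF-8).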


