(DEPRECATED) Apache Flink User Mailing List archive.

Custom Serializers

nragon

Custom Serializers

Hi,

First of all, great #FF17, I really enjoyed it.
After attending some of the talks by the dataArtisans folks, I realized that
serialization should be optimized when there is no way to use supported
objects.
In my case, users can configure their source in our application online, which
gives them the freedom to dynamically change the number and type of attributes.
Moreover, the number of attributes in an object can change between operators.
Because of this, we have a legacy structure that I showed before.
Should I implement my own TypeInformation, TypeComparator, TypeSerializer
and TypeInfoFactory?
Am I forgetting something?

Thanks,
Nuno

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Tzu-Li (Gordon) Tai

Re: Custom Serializers

Hi Nuno,

> Because of this, we have a legacy structure that I showed before.

Could you include more information in this thread about the legacy structure you mentioned? I couldn’t find any other reference to it. That would help in understanding your use case.

- Gordon

On 15 September 2017 at 12:59:15 PM, nragon ([hidden email]) wrote:

> Hi,
>
> First of all, great #FF17, I really enjoyed it.
> After attending some of the talks by the dataArtisans folks, I realized that
> serialization should be optimized when there is no way to use supported
> objects.
> In my case, users can configure their source in our application online, which
> gives them the freedom to dynamically change the number and type of attributes.
> Moreover, the number of attributes in an object can change between operators.
> Because of this, we have a legacy structure that I showed before.
> Should I implement my own TypeInformation, TypeComparator, TypeSerializer
> and TypeInfoFactory?
> Am I forgetting something?
>
> Thanks,
> Nuno

nragon

Re: Custom Serializers

Sorry, I was discussing this with Stephan before posting it here.
Basically, the main wrapper holds an array of custom objects, and because its
size can change throughout the stream and users can customize their sources
dynamically, it is difficult to create a generic POJO or use a tuple for
this purpose.
The wrapper extends HashMap and maps K,V ->
(attribute name, object). Each object has a set of attributes that we use to
identify its type.
Right now I'm using Kryo because HashMap is not supported.
The thing is, I'll always have a dynamic array of objects which I'll have to
serialize.

Hope this helps

Thanks,
Nuno

nragon

Re: Custom Serializers

Eventually I'll have a class named Element which holds an array of Parameter.
Do I need a typeinfo, comparator, factory and serializer for both of them?

Thanks

nragon

Re: Custom Serializers

Sorry for bringing this up, any tips on this?

Chesnay Schepler

Re: Custom Serializers

If Parameters are always encapsulated in an Element, and the Element
serializer knows how to deal with them, then you
only need to implement a serializer etc. for the Element class.
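
As a rough illustration of this point, the sketch below writes an enclosing object and its nested parameters in a single pass, so the nested type needs no serializer of its own. It uses only java.io.DataOutput and java.io.DataInput (the interfaces that Flink's DataOutputView and DataInputView extend), not the actual TypeSerializer API. The Element and Parameter classes here are hypothetical stand-ins with assumed String fields, not the poster's real classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ElementCodec {

    /** Hypothetical stand-in for Parameter: a named string value. */
    public static final class Parameter {
        public final String name;
        public final String value;

        public Parameter(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for Element: a variable-length array of Parameter. */
    public static final class Element {
        public final Parameter[] parameters;

        public Element(Parameter... parameters) {
            this.parameters = parameters;
        }
    }

    /** Writes the parameter count first, then each parameter inline. */
    public static void serialize(Element element, DataOutput out) throws IOException {
        out.writeInt(element.parameters.length);
        for (Parameter p : element.parameters) {
            out.writeUTF(p.name);
            out.writeUTF(p.value);
        }
    }

    /** Reads back exactly what serialize wrote, in the same order. */
    public static Element deserialize(DataInput in) throws IOException {
        int count = in.readInt();
        Parameter[] parameters = new Parameter[count];
        for (int i = 0; i < count; i++) {
            String name = in.readUTF();
            String value = in.readUTF();
            parameters[i] = new Parameter(name, value);
        }
        return new Element(parameters);
    }

    /** Convenience helper: serializes to a byte array and reads it back. */
    public static Element roundTrip(Element element) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serialize(element, new DataOutputStream(bytes));
            return deserialize(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real Flink serializer the same two method bodies would sit in serialize(T, DataOutputView) and deserialize(DataInputView), since those views accept the same write and read calls.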

On 18.09.2017 13:20, nragon wrote:
> Sorry for bringing this up, any tips on this?

nragon

Re: Custom Serializers

So, no need for typeinfo, comparator or factory?

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#defining-type-information-using-a-factory

Chesnay Schepler

Re: Custom Serializers

You do need them, but only for the Element class.

On 18.09.2017 13:38, nragon wrote:
> So, no need for typeinfo, comparator or factory?
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#defining-type-information-using-a-factory

nragon

Re: Custom Serializers

Ok, got it.

Thanks

nragon

Re: Custom Serializers

One other thing :). Can I set a tuple's generic types dynamically?
Meaning, build a tuple of arity N and build a TupleSerializer based on those
types.
This is because I'll only know these types based on user inputs.

Chesnay Schepler

Re: Custom Serializers

Have a look at the TupleTypeInfo class. It has a constructor that
accepts an array of TypeInformation,
and supports automatically generating a serializer from them.
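
A minimal sketch of that suggestion, assuming a Flink 1.x classpath where TypeInformation#createSerializer(ExecutionConfig) is available. The typeFor helper and the type-name strings are hypothetical and stand in for whatever the user-supplied configuration looks like.

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class DynamicTupleTypes {

    /** Hypothetical mapping from a user-supplied type name to Flink type information. */
    static TypeInformation<?> typeFor(String name) {
        switch (name) {
            case "string":
                return BasicTypeInfo.STRING_TYPE_INFO;
            case "int":
                return BasicTypeInfo.INT_TYPE_INFO;
            case "double":
                return BasicTypeInfo.DOUBLE_TYPE_INFO;
            default:
                throw new IllegalArgumentException("Unsupported type: " + name);
        }
    }

    /** Builds tuple type information whose arity and field types are only known at runtime. */
    public static TupleTypeInfo<Tuple> tupleTypeFor(String... typeNames) {
        TypeInformation<?>[] fieldTypes = new TypeInformation<?>[typeNames.length];
        for (int i = 0; i < typeNames.length; i++) {
            fieldTypes[i] = typeFor(typeNames[i]);
        }
        // The varargs constructor picks the Tuple class matching the number of fields.
        return new TupleTypeInfo<>(fieldTypes);
    }

    /** Derives the serializer from the type information instead of writing one by hand. */
    public static TypeSerializer<Tuple> serializerFor(String... typeNames) {
        return tupleTypeFor(typeNames).createSerializer(new ExecutionConfig());
    }
}
```

Note that Flink's built-in tuples stop at Tuple25, so this approach only fits sources with at most 25 attributes.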

On 18.09.2017 18:28, nragon wrote:

> One other thing :). Can i set tuple generic type dynamically?
> Meaning, build a tuple of N arity and build TupleSerializer based on those
> types.
> This because I'll only know these types based on user inputs.

nragon

Re: Custom Serializers

createInstance(Object[] fields) and createOrReuseInstance(Object[] fields, T reuse) in TupleSerializerBase do not seem to be part
of the TypeSerializer API.
Will I be losing any functionality? In what cases do you use these instead
of createInstance()?

// We use this in the Aggregate and Distinct Operators to create instances
// of immutable Tuples (i.e. Scala Tuples)

Thanks

UPDATE:
Can I extend Tuple and reuse current serializers, with some changes?

nragon

Re: Custom Serializers

Should I use TypeSerializerSingleton if the serializer is independent of the object which
it's serializing?

Chesnay Schepler

Re: Custom Serializers

On 19.09.2017 11:39, nragon wrote:

> createInstance(Object[] fields) at TupleSerializerBase seems not to be part
> of TypeSerializer API.
> Will I be losing any functionality? In what cases do you use this instead
> of createInstance()?
>
> // We use this in the Aggregate and Distinct Operators to create instances
> // of immutable Tuples (i.e. Scala Tuples)
>
> Thanks

Taken from TupleSerializerBase:

// We use this in the Aggregate and Distinct Operators to create instances
// of immutable Tuples (i.e. Scala Tuples)
public abstract T createInstance(Object[] fields);

On 27.09.2017 17:43, nragon wrote:

> Should I use TypeSerializerSingleton if it is independent of the object which
> it's serializing?

Generally, use TypeSerializerSingleton. There is virtually no reason to not use it. Do keep this section of the TypeSerializer javadoc in mind:

* The methods in this class are assumed to be stateless, such that it is effectively thread safe. Stateful
* implementations of the methods may lead to unpredictable side effects and will compromise both stability and
* correctness of the program.
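
To make the statelessness requirement concrete, here is a small plain-Java sketch rather than an actual TypeSerializerSingleton subclass. StringArrayCodec is a hypothetical class: it has no mutable fields, so one shared instance can be used from any thread, and its duplicate() can return this, which is also what TypeSerializerSingleton does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public final class StringArrayCodec {

    /** The single shared instance; safe to share because the class holds no state. */
    public static final StringArrayCodec INSTANCE = new StringArrayCodec();

    private StringArrayCodec() {}

    /** Uses only its arguments and local variables, never instance fields. */
    public void serialize(String[] values, DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (String value : values) {
            out.writeUTF(value);
        }
    }

    /** Allocates a fresh array per call instead of caching a buffer on the instance. */
    public String[] deserialize(DataInput in) throws IOException {
        String[] values = new String[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readUTF();
        }
        return values;
    }

    /** A stateless codec has nothing to copy, so a duplicate is the instance itself. */
    public StringArrayCodec duplicate() {
        return this;
    }

    /** Convenience helper: serializes to a byte array and reads it back. */
    public static String[] roundTrip(String[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            INSTANCE.serialize(values, new DataOutputStream(bytes));
            return INSTANCE.deserialize(
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A serializer that kept, say, a reusable scratch buffer in a field would break this property and would need a real duplicate() that creates a separate copy.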

nragon

Re: Custom Serializers

Got it :)
I've redesigned the object which I use across jobs.
I ended up with 4 serializers.
My object Element holds 2 fields, an array of Parameter and a Metadata.
Metadata holds an array of ParameterInfo and each Parameter holds its
ParameterInfo (kind of a duplicate of Metadata, but needed for legacy reasons). So
Element has the TypeInfo and TypeInfoFactory and also serializers for
Parameter and Metadata. The others are just adaptations of
GenericArraySerializer<Parameter>, StringSerializer, ...
In my tests I've managed to get around a 40% improvement over serializing
Element, as is, with Kryo.

P.S.: DataOutputSerializer and DataInputDeserializer also improved the Kafka integration.

Thanks
