Byte arrays in Avro

3 messages

Byte arrays in Avro

Lasse Nedergaard-2
Hi.

We have some Avro objects, and some of them contain the primitive data type bytes, which is translated into java.nio.ByteBuffer in the generated Avro objects. When using our Avro objects we get these warnings:

org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a getter for field hb
org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not contain a setter for field hb
org.apache.flink.api.java.typeutils.TypeExtractor [] - Class class java.nio.ByteBuffer cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.

and it is correct that ByteBuffer contains neither a getter nor a setter for the field hb.
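The missing accessors can be confirmed with plain reflection, independent of Flink: the backing array of a heap buffer lives in the package-private field hb declared on java.nio.ByteBuffer, and there is no getHb()/setHb() pair for a POJO analysis to find. A minimal check (assumes the standard OpenJDK class layout):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

public class ByteBufferPojoCheck {
    public static void main(String[] args) throws Exception {
        // ByteBuffer declares a package-private "final byte[] hb" (non-null for heap buffers)
        Field hb = ByteBuffer.class.getDeclaredField("hb");
        System.out.println("field present: " + hb.getName());

        // A POJO-style analysis would look for getHb()/setHb(...) accessors, which do not exist
        boolean hasAccessor = false;
        for (Method m : ByteBuffer.class.getMethods()) {
            if (m.getName().equals("getHb") || m.getName().equals("setHb")) {
                hasAccessor = true;
            }
        }
        System.out.println("accessors for hb: " + hasAccessor);
    }
}
```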

The Flink documentation says "Note that Flink is automatically serializing POJOs generated by Avro with the Avro serializer.", but when I debug, it looks like Flink falls back to a generic type for the ByteBuffer, so the warnings make sense.

I want to ensure we are running as effectively as possible.

So my questions are:
1. What is the most efficient way to transport byte arrays in Avro in Flink?
2. Does Flink use the Avro serializer for our Avro objects when they contain a ByteBuffer?
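For context on question 1: it is a plain Avro "bytes" field that maps to java.nio.ByteBuffer in the generated specific records; "fixed" is the alternative when the payload length is constant. A minimal schema sketch (the record and field names here are made up for illustration):

```json
{
  "type": "record",
  "name": "Payload",
  "namespace": "com.example",
  "fields": [
    {"name": "data", "type": "bytes"}
  ]
}
```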


Thanks

Lasse Nedergaard

Re: Byte arrays in Avro

Timo Walther
Hi Lasse,

are you using Avro specific records? A look into the code shows that the
warnings in the log are generated after the Avro check:

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java#L1741

Could it be that your Avro object is somehow not recognized correctly?

Regards,
Timo

On 16.07.20 13:28, Lasse Nedergaard wrote:



Re: Byte arrays in Avro

Timo Walther
I investigated this issue further. We additionally analyze the class as a POJO in another step here, which produces the warning:

https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroTypeInfo.java#L71

However, the serializer is definitely the `AvroSerializer` if the type
information is `AvroTypeInfo`. You can check that via `dataStream.getType()`.
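A sketch of that check (MyAvroRecord is a placeholder for an Avro-generated SpecificRecord class; assumes flink-streaming-java and flink-avro on the classpath):

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializerCheck {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // MyAvroRecord stands in for your Avro-generated specific record class
        DataStream<MyAvroRecord> stream = env.fromElements(new MyAvroRecord());

        // If the extracted type information is AvroTypeInfo, Flink uses the AvroSerializer
        TypeInformation<MyAvroRecord> type = stream.getType();
        if (type instanceof AvroTypeInfo) {
            System.out.println("AvroSerializer will be used");
        } else {
            System.out.println("Fell back to " + type.getClass().getSimpleName());
        }
    }
}
```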

I hope this helps.

Regards,
Timo

On 16.07.20 14:28, Timo Walther wrote:
