Kafka and Flink integration

Re: Kafka and Flink integration

Stephan Ewen
Hi Urs!

Inside Flink (between Flink operators)
  - Kryo is not a problem, but types must be registered up front for good performance
  - Tuples and POJOs are often faster than the types that fall back to Kryo

Persistent storage (HDFS, Kafka, ...)
  - Kryo is not recommended, because its binary data format is not stable. It changes between major Kryo versions and between Kryo setups with different type registrations.
  - A stable format with schema evolution support (Avro, Thrift, ...) is recommended here.
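The up-front registration Stephan mentions happens on the ExecutionConfig. A minimal configuration sketch in the style of the snippets in this thread (MyEvent and MyEventSerializer are hypothetical application classes, not part of Flink):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Register the type up front so Kryo writes a small integer tag
// instead of the full class name into every serialized record.
env.getConfig().registerKryoType(MyEvent.class);

// Optionally pin an explicit Kryo serializer class for the type.
env.getConfig().registerTypeWithKryoSerializer(MyEvent.class, MyEventSerializer.class);
```

Without registration the job still runs; registration only affects the compactness and speed of the Kryo fallback, not correctness.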



On Thu, Jun 22, 2017 at 9:28 AM, Urs Schoenenberger <[hidden email]> wrote:
Hi Greg,

do you have a link where I could read up on the rationale behind
avoiding Kryo? I'm currently facing a similar decision and would like to
get some more background on this.

Thank you very much,
Urs

On 21.06.2017 12:10, Greg Hogan wrote:
> The recommendation has been to avoid Kryo where possible.
>
> General data exchange: avro or thrift.
>
> Flink internal data exchange: POJO (or Tuple, which are slightly faster though less readable, and there is an outstanding PR to narrow or close the performance gap).
>
> Kryo is useful for types which cannot be modified to be a POJO. There are also cases where Kryo must be used because Flink has insufficient TypeInformation, such as when a function returns an interface or abstract type even though the actual concrete type could be known.
>
>
>
>> On Jun 21, 2017, at 3:19 AM, nragon <[hidden email]> wrote:
>>
>> So, serialization between producer application -> kafka -> flink kafka
>> consumer will use avro, thrift or kryo right? From there, the remaining
>> pipeline can just use standard pojo serialization, which would be better?
>>
>>
>>
>> --
>> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-and-Flink-integration-tp13792p13885.html
>> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
>

--
Urs Schönenberger - [hidden email]

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082

Re: Kafka and Flink integration

Jürgen Thomann

Hi Stephan,

do you know an easy way to find out whether Kryo or the POJO serializer is used? I have a class that would otherwise be a POJO, but one of its fields is of a type without a public no-argument constructor. As I understand the documentation, this should cause Kryo to be used.
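For reference, the POJO rules Jürgen is referring to (public class, public no-argument constructor, fields public or accessible via getter/setter) can be approximated with a small reflection helper. This is a hypothetical sketch, not Flink's actual type analyzer, and it checks only the common rules, not generics or field-type recursion:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper approximating Flink's POJO rules.
public class PojoCheck {

    public static boolean looksLikePojo(Class<?> clazz) {
        if (!Modifier.isPublic(clazz.getModifiers())) {
            return false; // class itself must be public
        }
        try {
            Constructor<?> ctor = clazz.getDeclaredConstructor();
            if (!Modifier.isPublic(ctor.getModifiers())) {
                return false; // no-argument constructor exists but is not public
            }
        } catch (NoSuchMethodException e) {
            return false; // no no-argument constructor at all
        }
        for (Field f : clazz.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m) || Modifier.isTransient(m)) {
                continue; // static and transient fields are not serialized
            }
            if (!Modifier.isPublic(m) && !hasGetterAndSetter(clazz, f)) {
                return false; // field neither public nor bean-accessible
            }
        }
        return true;
    }

    private static boolean hasGetterAndSetter(Class<?> clazz, Field f) {
        String suffix =
            Character.toUpperCase(f.getName().charAt(0)) + f.getName().substring(1);
        try {
            clazz.getMethod("get" + suffix);
            clazz.getMethod("set" + suffix, f.getType());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Example types: Good follows the rules, Bad hides its constructor.
    public static class Good {
        public int id;
        public Good() {}
    }

    public static class Bad {
        public int id;
        private Bad() {}
    }

    public static void main(String[] args) {
        System.out.println(looksLikePojo(Good.class)); // true
        System.out.println(looksLikePojo(Bad.class));  // false
    }
}
```

A field like the one Jürgen describes (its type lacking a public no-argument constructor) would not fail this surface check for the outer class; Flink analyzes field types recursively, which is why the single bad field pushes the whole object to the Kryo fallback.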

Thanks,
Jürgen

On 03.07.2017 17:18, Stephan Ewen wrote:


-- 
Jürgen Thomann
System Administrator


InnoGames GmbH
Friesenstraße 13 - 20097 Hamburg - Germany
Tel +49 40 7889335-0

Managing Directors: Hendrik Klindworth, Michael Zillmer
VAT-ID: DE264068907 Amtsgericht Hamburg, HRB 108973

http://www.innogames.com – [hidden email]
Re: Kafka and Flink integration

snntr
Hi Jürgen,

one easy way is to disable the Kryo fallback with

env.getConfig().disableGenericTypes();

If Kryo is being used, you will see an exception that also names the class for
which Flink needs to fall back to Kryo. Execution fails on the first such
class though, so depending on the other types in your job you may need a
minimal job that uses only the class in question to test it this way.
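Such a minimal probe job might look like the following (a sketch only; MyPojo is a placeholder for the class under test, and this needs the Flink dependencies on the classpath to run):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KryoFallbackProbe {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Throw instead of silently falling back to Kryo anywhere in the job.
        env.getConfig().disableGenericTypes();

        // Building the serializer for MyPojo is what triggers the check;
        // if MyPojo (or any of its fields) is a generic type, this fails
        // with an exception naming the offending class.
        env.fromElements(new MyPojo()).print();

        env.execute("kryo-fallback-probe");
    }
}
```

Because the job contains only the one type, the exception (or its absence) answers the question for that class alone, independent of the rest of the production pipeline.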

Best,

Konstantin


On 05.07.2017 10:35, Jürgen Thomann wrote:

--
Konstantin Knauf * [hidden email] * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082

