Tuple model project


Flavio Pompermaier
Hi all,
I was thinking of writing my own Flink-compatible library, and what I basically need is a Tuple5.

Is there any performance loss in using a POJO with 5 String fields instead of a Tuple5?
If so, wouldn't it be a good idea to extract the Flink tuples into a separate, simple project (e.g. flink-java-tuples) with no other dependencies? That would let other libraries implement Flink-compatible logic without having to exclude all the transitive dependencies of flink-java.

Best,
Flavio

Re: Tuple model project

Fabian Hueske
Hi,

At the moment, Tuples are more efficient than POJOs because POJO fields are accessed via Java reflection, whereas Tuple fields are accessed directly.
This performance penalty could be overcome with code-generated serializers and comparators, but I am not aware of any work in that direction.

Best, Fabian
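
A minimal, self-contained sketch of the two access paths Fabian contrasts, using plain JVM reflection rather than Flink's actual serializer code. The Record5 class and the FieldAccessSketch object are hypothetical names. Direct access compiles to ordinary accessor calls that the JIT can inline, while the reflective path goes through java.lang.reflect.Field.get on every read, which is roughly what a reflection-based POJO serializer has to do. The sketch only shows the two paths; it does not measure them.

```scala
import java.lang.reflect.Field

object FieldAccessSketch {
  // Hypothetical record with five String fields (stands in for a POJO vs. a Tuple5).
  final class Record5(var f0: String, var f1: String, var f2: String, var f3: String, var f4: String)

  // Direct access: ordinary accessor calls that the JIT compiler can inline.
  def readDirect(r: Record5): Seq[String] = Seq(r.f0, r.f1, r.f2, r.f3, r.f4)

  // Reflective access: look up the Field objects once, then call Field.get on every read,
  // roughly what a reflection-based serializer has to do for each record.
  private val fields: Array[Field] =
    Array("f0", "f1", "f2", "f3", "f4").map { name =>
      val field = classOf[Record5].getDeclaredField(name)
      field.setAccessible(true) // `var` constructor parameters compile to private fields
      field
    }

  def readReflective(r: Record5): Seq[String] =
    fields.toSeq.map(field => field.get(r).asInstanceOf[String])

  def main(args: Array[String]): Unit = {
    val r = new Record5("s", "p", "o", "g", "x")
    println(readDirect(r) == readReflective(r)) // prints "true"
  }
}
```

Both paths return the same values; the difference is only in how each field read is dispatched.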

In reply to Flavio Pompermaier, 2015-07-06 11:01 GMT+02:00

Re: Tuple model project

Flavio Pompermaier
Do you think it would be a good idea to extract the Flink tuples into a separate project, to allow simpler dependency management in Flink-compatible projects?

In reply to Fabian Hueske, Mon, Jul 6, 2015 at 11:06 AM

Re: Tuple model project

Flavio Pompermaier
Any thoughts on this (moving the tuple classes into a separate, self-contained project with no transitive dependencies, so that they can easily be used in other external projects)?

In reply to Flavio Pompermaier, Mon, Jul 6, 2015 at 11:09 AM

Re: Tuple model project

Stephan Ewen
Should we move this to the dev list?

In reply to Flavio Pompermaier, Thu, Jul 30, 2015 at 10:43 AM

Re: Tuple model project

Stephan Ewen
Quick response: I am not opposed to that, but there are already tuple libraries around.

Do you specifically need the Flink tuples, for interoperability between Flink and other projects?

In reply to Stephan Ewen, Thu, Jul 30, 2015 at 11:07 AM

Re: Tuple model project

Flavio Pompermaier
I have a project that produces RDF quads, which I have to store and then read with Flink afterwards.
I could use Thrift/Protobuf/Avro, but that would add a lot of transitive dependencies to my project.
Maybe I could use Kryo to store those objects. Is there an example of how to create a dataset of objects serialized with Kryo?
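
If the main goal is to avoid extra dependencies, one alternative to Thrift/Protobuf/Avro is a small hand-written binary format built only on the JDK's java.io.DataOutputStream and DataInputStream. This is a minimal sketch, not something proposed in the thread: the Quad case class and QuadCodec object are hypothetical names, and every field is assumed to be non-null and at most 65535 bytes once encoded, because writeUTF uses a 2-byte length prefix.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical RDF quad: subject, predicate, object, graph (fields assumed non-null).
final case class Quad(subject: String, predicate: String, obj: String, graph: String)

object QuadCodec {
  // Writes each field as modified UTF-8 with a 2-byte length prefix (writeUTF).
  def write(q: Quad, out: DataOutputStream): Unit = {
    out.writeUTF(q.subject)
    out.writeUTF(q.predicate)
    out.writeUTF(q.obj)
    out.writeUTF(q.graph)
  }

  // Reads the fields back in exactly the order write() emitted them.
  def read(in: DataInputStream): Quad =
    Quad(in.readUTF(), in.readUTF(), in.readUTF(), in.readUTF())

  // Encodes a batch: a 4-byte record count followed by the records.
  def toBytes(quads: Seq[Quad]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(quads.size)
    quads.foreach(q => write(q, out))
    out.flush()
    bytes.toByteArray
  }

  // Decodes a batch produced by toBytes.
  def fromBytes(data: Array[Byte]): Seq[Quad] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val count = in.readInt()
    Seq.fill(count)(read(in))
  }
}
```

The write/read pair has the same field-by-field shape as the write and read methods of a Kryo Serializer, so the same logic could later be ported into a custom com.esotericsoftware.kryo.Serializer if Kryo is adopted after all.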

In reply to Stephan Ewen, Thu, Jul 30, 2015 at 11:10 AM

Re: Tuple model project

Till Rohrmann

Hi Flavio,

To use a custom Kryo serializer for a given type, call the registerTypeWithKryoSerializer method of the ExecutionEnvironment. You pass it the type that should be serialized with Kryo and an implementation of the com.esotericsoftware.kryo.Serializer class. Types that Flink's own serialization framework does not support are handled by Kryo, which then uses your custom serializer for the registered type. Register the types at the beginning of your Flink program:

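import org.apache.flink.api.scala._

def main(args: Array[String]): Unit = {
  val env = ExecutionEnvironment.getExecutionEnvironment

  // MyType is your data type; MyTypeSerializer is your subclass of com.esotericsoftware.kryo.Serializer[MyType]
  env.registerTypeWithKryoSerializer(classOf[MyType], classOf[MyTypeSerializer])

  // ... define data sources, transformations and sinks here ...

  env.execute()
}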

Cheers,
Till

In reply to Flavio Pompermaier, Thu, Jul 30, 2015 at 12:45 PM

Re: Tuple model project

Flavio Pompermaier
How can I create a Flink DataSet from a directory that contains a set of Java objects serialized with Kryo (one file per object)?

In reply to Till Rohrmann, Thu, Jul 30, 2015 at 1:41 PM

Re: Tuple model project

Till Rohrmann

You could try to use the TypeSerializerInputFormat.
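
The one-file-per-object layout from the question can also be read without Flink. A minimal local sketch is to list the directory, sort the files by name, and decode each one. DirectoryReader is a hypothetical name, and the decode function is a placeholder for whatever Kryo deserialization the files were written with. For small inputs, the resulting list could be handed to ExecutionEnvironment.fromCollection; larger inputs are probably better served by an input format that reads the files in parallel.

```scala
import java.io.{File, IOException}
import java.nio.file.{Files, Path}

object DirectoryReader {
  // Reads every regular file directly inside `dir` (non-recursive), in file-name order,
  // and turns each file's bytes into an object with the caller-supplied `decode` function.
  def readAll[T](dir: Path)(decode: Array[Byte] => T): List[T] = {
    require(Files.isDirectory(dir), s"not a directory: $dir")
    val files: Array[File] = dir.toFile.listFiles() // null only on an I/O error
    if (files == null) throw new IOException(s"cannot list $dir")
    files.toList
      .filter(_.isFile)
      .sortBy(_.getName)
      .map(f => decode(Files.readAllBytes(f.toPath)))
  }
}
```

Sorting by file name only makes the result order deterministic; drop it if the order of objects does not matter.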

In reply to Flavio Pompermaier, Thu, Jul 30, 2015 at 2:08 PM