(DEPRECATED) Apache Flink User Mailing List archive.

Tuple model project

Flavio Pompermaier

Tuple model project

Hi to all,

I was thinking of writing my own Flink-compatible library, and basically I need a Tuple5.

Is there any performance loss in using a POJO with 5 String fields instead of a Tuple5?

If yes, wouldn't it be a good idea to extract the Flink tuples into a separate, simple project (e.g. flink-java-tuples) with no other dependencies? That would let other libraries write Flink-compatible logic without having to exclude all the transitive dependencies of flink-java.
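
To make the proposal concrete: a tuple type with no third-party dependencies can be very small. The Scala sketch below is only an illustration. SimpleTuple5 is a hypothetical name and this is not Flink's Tuple5. As far as I know, Flink applies its tuple serializer only to subclasses of its own org.apache.flink.api.java.tuple.Tuple, so a class like this would presumably be treated as a POJO or generic type rather than as a tuple.

```scala
// Hypothetical, dependency-free 5-field tuple, loosely modeled on the shape of
// Flink's Tuple5 (mutable fields f0..f4 plus positional access).
// This is NOT Flink's class and would not get Flink's tuple serializer.
final class SimpleTuple5[T0, T1, T2, T3, T4](
    var f0: T0, var f1: T1, var f2: T2, var f3: T3, var f4: T4) {

  // Number of fields, analogous in spirit to Flink's getArity().
  def arity: Int = 5

  // Positional read access; throws for an out-of-range position.
  def getField(pos: Int): Any = pos match {
    case 0 => f0
    case 1 => f1
    case 2 => f2
    case 3 => f3
    case 4 => f4
    case _ => throw new IndexOutOfBoundsException(pos.toString)
  }

  override def toString: String = s"($f0,$f1,$f2,$f3,$f4)"
}

object SimpleTuple5Demo {
  def main(args: Array[String]): Unit = {
    val t = new SimpleTuple5("s", "p", "o", "g", "src")
    t.f2 = "o2"            // plain field update through the generated setter
    println(t.getField(2))
    println(t)
  }
}
```

The point of such a project would be that it has no transitive dependencies at all; whether Flink itself could depend on it is a separate question for the Flink developers.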

Best,

Flavio

Fabian Hueske

Re: Tuple model project

Hi,

At the moment, Tuples are more efficient than POJOs, because POJO fields are accessed via Java reflection, whereas Tuple fields are accessed directly.

This performance penalty could be overcome by code-generated serializers and comparators, but I am not aware of any work in that direction.
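
The difference between reflective and direct field access can be illustrated without Flink. The Scala sketch below is not Flink's serializer code; TupleLike, PojoLike and FieldAccessDemo are hypothetical names. It contrasts reading fields through ordinary compiled accessors (what a serializer that knows the concrete tuple type can do) with reading them through java.lang.reflect.Field (what a generic POJO serializer has to do when it only learns the field names at runtime).

```scala
import java.lang.reflect.Field

// Hypothetical tuple-style record: fields are read via compiled accessors.
final class TupleLike(var f0: String, var f1: String)

// Hypothetical POJO-style record: a generic serializer discovers its fields at runtime.
class PojoLike {
  var subject: String = null
  var predicate: String = null
}

object FieldAccessDemo {
  // Direct access: the compiler knows the type, so these are plain method calls.
  def readDirect(t: TupleLike): Seq[String] = Seq(t.f0, t.f1)

  // Reflective access, step 1: look the fields up once, as a serializer might at setup.
  // Scala vars compile to private JVM fields, so setAccessible(true) is required.
  val pojoFields: Seq[Field] =
    Seq("subject", "predicate").map { name =>
      val f = classOf[PojoLike].getDeclaredField(name)
      f.setAccessible(true)
      f
    }

  // Reflective access, step 2: every record is read through Field.get, which
  // goes through the reflection machinery and returns Object that must be cast.
  def readReflective(p: PojoLike): Seq[String] =
    pojoFields.map(_.get(p).asInstanceOf[String])

  def main(args: Array[String]): Unit = {
    val t = new TupleLike("s", "p")
    val p = new PojoLike
    p.subject = "s"
    p.predicate = "p"
    println(readDirect(t))
    println(readReflective(p))
  }
}
```

No timings are given here on purpose: the actual cost gap depends on the JVM and JIT, so it would have to be measured with a proper benchmark harness such as JMH.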

Best, Fabian

(In reply to Flavio Pompermaier, 2015-07-06 11:01 GMT+02:00)

Flavio Pompermaier

Re: Tuple model project

Do you think it would be a good idea to extract the Flink tuples into a separate project, to allow simpler dependency management in Flink-compatible projects?

(In reply to Fabian Hueske, Mon, Jul 6, 2015 at 11:06 AM)

Flavio Pompermaier

Re: Tuple model project

Any thoughts about this (moving the tuple classes into a separate, self-contained project with no transitive dependencies, so that they can easily be used in other external projects)?

(In reply to Flavio Pompermaier, Mon, Jul 6, 2015 at 11:09 AM)

Stephan Ewen

Re: Tuple model project

Should we move this to the dev list?

(In reply to Flavio Pompermaier, Thu, Jul 30, 2015 at 10:43 AM)

Stephan Ewen

Re: Tuple model project

Quick response: I am not opposed to that, but there are tuple libraries around already.

Do you specifically need the Flink tuples, for interoperability between Flink and other projects?

(In reply to Stephan Ewen, Thu, Jul 30, 2015 at 11:07 AM)

Flavio Pompermaier

Re: Tuple model project

I have a project that produces RDF quads, and I have to store them so that I can read them with Flink afterwards.

I could use Thrift/Protobuf/Avro, but this would add a lot of transitive dependencies to my project.

Maybe I could use Kryo to store those objects. Is there any example of creating a dataset of objects serialized with Kryo?

(In reply to Stephan Ewen, Thu, Jul 30, 2015 at 11:10 AM)

Till Rohrmann

Re: Tuple model project

Hi Flavio,

In order to use a custom Kryo serializer for a given type, you can call the registerTypeWithKryoSerializer method of the ExecutionEnvironment object. You pass it the type you want to be serialized with Kryo and an implementation of the com.esotericsoftware.kryo.Serializer class. If the given type is not supported by Flink's own serialization framework, Flink falls back to Kryo and then uses this custom serializer. You register the types at the beginning of your Flink program:

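import org.apache.flink.api.scala._

// MyFlinkJob is a placeholder name for your job's entry point.
object MyFlinkJob {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // MyType is your own class; MyTypeSerializer is your implementation of
    // com.esotericsoftware.kryo.Serializer[MyType].
    env.registerTypeWithKryoSerializer(classOf[MyType], classOf[MyTypeSerializer])

    // ... define sources, transformations and sinks here ...

    env.execute()
  }
}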

Cheers,
Till

(In reply to Flavio Pompermaier, Thu, Jul 30, 2015 at 12:45 PM)

Flavio Pompermaier

Re: Tuple model project

How can I create a Flink dataset from a directory path that contains a set of Java objects serialized with Kryo (one file per object)?
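
For the one-file-per-object layout in this question, here is a minimal Scala sketch of writing and reading such a directory. It deliberately swaps Kryo for plain java.io.DataOutputStream/DataInputStream with a hand-written, field-by-field format so that it runs without extra dependencies; a custom Kryo Serializer's write and read methods would follow the same "write the fields, read them back in the same order" pattern. RdfQuad and OneFilePerObject are hypothetical names.

```scala
import java.io.{DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}
import java.nio.file.Files

// Hypothetical RDF quad type used only for this illustration.
final case class RdfQuad(subject: String, predicate: String, obj: String, graph: String)

object OneFilePerObject {
  // Field-by-field encoding (stand-in for Kryo). Note: writeUTF limits each
  // string to 65535 bytes in its modified-UTF-8 encoding.
  def write(q: RdfQuad, out: DataOutputStream): Unit = {
    out.writeUTF(q.subject); out.writeUTF(q.predicate)
    out.writeUTF(q.obj); out.writeUTF(q.graph)
  }

  // Arguments are evaluated left to right, so fields are read in write order.
  def read(in: DataInputStream): RdfQuad =
    RdfQuad(in.readUTF(), in.readUTF(), in.readUTF(), in.readUTF())

  // One object per file, named quad-<index>.bin.
  def writeAll(dir: File, quads: Seq[RdfQuad]): Unit =
    quads.zipWithIndex.foreach { case (q, i) =>
      val out = new DataOutputStream(new FileOutputStream(new File(dir, s"quad-$i.bin")))
      try write(q, out) finally out.close()
    }

  // Reads every regular file in `dir` into memory on the client.
  def readAll(dir: File): Seq[RdfQuad] = {
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.isFile).sortBy(_.getName)
    files.toSeq.map { f =>
      val in = new DataInputStream(new FileInputStream(f))
      try read(in) finally in.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("quads").toFile
    writeAll(dir, Seq(RdfQuad("s1", "p", "o1", "g"), RdfQuad("s2", "p", "o2", "g")))
    readAll(dir).foreach(println)
  }
}
```

Because readAll loads everything on the client, handing the result to Flink (for example via env.fromCollection) only makes sense for small inputs. TypeSerializerInputFormat, suggested later in the thread, is to my understanding meant for files written by Flink's own TypeSerializerOutputFormat, so files in a custom per-object format would likely need their own input format.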

(In reply to Till Rohrmann, Thu, Jul 30, 2015 at 1:41 PM)

Till Rohrmann

Re: Tuple model project

You could try to use the TypeSerializerInputFormat.

(In reply to Flavio Pompermaier, Thu, Jul 30, 2015 at 2:08 PM)
