AvroSchemaConverter and Tuple<T> classes


AvroSchemaConverter and Tuple<T> classes

françois lacombe
Hi all,

I'm looking for best practices regarding the creation of Tuple<T> instances.

I have a TypeInformation object produced by AvroSchemaConverter.convertToTypeInfo("{...}");
Is it possible to define a corresponding Tuple<T> instance with it (i.e. derive the T from the TypeInformation)?

Example :
{
  "type": "record",
  "fields": [
    { "name": "field1", "type": "int" },
    { "name": "field2", "type": "string"}
]}
 = Tuple2<Int,String>

The same question arises with DataSet or any other record-handling class with parameterized types.

My goal is to parse several CSV files whose different structures are described by Avro schemas.
It would be great not to hard-code the structures in my Java code, and instead obtain the type information at runtime from the Avro schemas.

Is this possible?

Thanks in advance

François Lacombe
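
For illustration, the correspondence the example implies can be sketched without Flink classes. This is a minimal, hypothetical helper (AvroFieldTypes is not part of Flink or Avro); it only shows the field-by-field mapping from Avro primitive type names to the Java classes a matching TupleN would carry:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Flink API): maps Avro primitive type names to
// Java classes, so the record above yields [Integer.class, String.class],
// i.e. the field types of a Tuple2<Integer, String>.
public class AvroFieldTypes {
    private static final Map<String, Class<?>> PRIMITIVES = Map.of(
            "boolean", Boolean.class,
            "int", Integer.class,
            "long", Long.class,
            "float", Float.class,
            "double", Double.class,
            "string", String.class);

    public static List<Class<?>> tupleFieldTypes(List<String> avroFieldTypes) {
        List<Class<?>> result = new ArrayList<>();
        for (String t : avroFieldTypes) {
            Class<?> c = PRIMITIVES.get(t);
            if (c == null) {
                throw new IllegalArgumentException("Unsupported Avro type: " + t);
            }
            result.add(c);
        }
        return result;
    }
}
```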

Re: AvroSchemaConverter and Tuple<T> classes

Timo Walther
Hi,

tuples are just a subcategory of rows, because tuple arity is
limited to 25 fields. I think the easiest solution would be to write
your own converter that maps rows to tuples, if you know that you will
not need more than 25 fields. Otherwise, it might be easier to just use a
TextInputFormat and do the parsing yourself with a library.

Regards,
Timo
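
The converter Timo suggests might look roughly like this. A minimal sketch, assuming a plain Object[] stands in for Flink's Row, and using the tuple class name as a stand-in for the real org.apache.flink.api.java.tuple.TupleN classes (RowToTuple is a hypothetical name, not Flink API):

```java
// Hypothetical sketch of a row-to-tuple converter: a row (modelled here as
// a plain Object[]) can map to a TupleN only while its arity stays <= 25,
// since Flink's largest tuple class is Tuple25.
public class RowToTuple {
    public static final int MAX_TUPLE_ARITY = 25;

    // Returns the name of the tuple class a row of this arity would map to,
    // e.g. "Tuple2" for a two-field row.
    public static String tupleClassFor(Object[] row) {
        if (row.length > MAX_TUPLE_ARITY) {
            throw new IllegalArgumentException(
                    "Row arity " + row.length + " exceeds the tuple limit of "
                            + MAX_TUPLE_ARITY);
        }
        return "Tuple" + row.length;
    }
}
```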



Re: AvroSchemaConverter and Tuple<T> classes

françois lacombe
Hi Timo,

Thanks for your answer
I was looking for a Tuple to feed a BatchTableSink<T> subclass, but could this be achieved with a Row instead?

All the best

François


Re: AvroSchemaConverter and Tuple<T> classes

Rong Rong
Yes, you should be able to use Row instead of Tuple in your BatchTableSink<T>.
There are sections in the Flink documentation regarding the mapping of data types to table schemas [1], and a table can be converted into various typed DataStreams [2] as well. Hope these are helpful.

Thanks,
Rong
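
The schema mapping Rong mentions can be sketched in plain Java: a table schema pairs positional fields with types, and a row is valid for the schema when each value matches its declared class. This is an illustrative stand-in only (SchemaCheck is hypothetical; Flink's Row and TableSchema classes do this internally), with Object[] again standing in for Row:

```java
import java.util.List;

// Hypothetical sketch (not Flink API): checks that a positional row
// (modelled as Object[]) conforms to a schema given as an ordered list of
// field classes, mirroring how a table schema maps to typed records.
public class SchemaCheck {
    public static boolean matches(Object[] row, List<Class<?>> schema) {
        if (row.length != schema.size()) {
            return false;
        }
        for (int i = 0; i < row.length; i++) {
            if (row[i] != null && !schema.get(i).isInstance(row[i])) {
                return false;
            }
        }
        return true;
    }
}
```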




Re: AvroSchemaConverter and Tuple<T> classes

françois lacombe
Thank you all for your answers.

It works fine with a BatchTableSource<Row>.


All the best

François
