AvroSchemaConverter and Tuple<T> classes


AvroSchemaConverter and Tuple<T> classes

françois lacombe
Hi all,

I'm looking for best practices regarding the creation of Tuple<T> instances.

I have a TypeInformation object produced by AvroSchemaConverter.convertToTypeInfo("{...}");
Is it possible to define a corresponding Tuple<T> instance with it (i.e. derive the T from the TypeInformation)?

Example :
{
  "type": "record",
  "fields": [
    { "name": "field1", "type": "int" },
    { "name": "field2", "type": "string"}
]}
 = Tuple2<Int,String>

The same question arises with DataSet or any other record-handling class with parameterized types.

My goal is to parse several CSV files whose different structures are described by Avro schemas.
It would be great not to hard-code the structures in my Java code, and instead obtain the type information at runtime from the Avro schemas.

Is this possible?

Thanks in advance

François Lacombe
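
For illustration, the correspondence the example implies can be sketched without Flink classes. This is a minimal, hypothetical helper (AvroFieldTypes is not part of Flink or Avro); it only shows the field-by-field mapping from Avro primitive type names to the Java classes a matching TupleN would carry:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Flink API): maps Avro primitive type names to
// Java classes, so the record above yields [Integer.class, String.class],
// i.e. the field types of a Tuple2<Integer, String>.
public class AvroFieldTypes {
    private static final Map<String, Class<?>> PRIMITIVES = Map.of(
            "boolean", Boolean.class,
            "int", Integer.class,
            "long", Long.class,
            "float", Float.class,
            "double", Double.class,
            "string", String.class);

    public static List<Class<?>> tupleFieldTypes(List<String> avroFieldTypes) {
        List<Class<?>> result = new ArrayList<>();
        for (String t : avroFieldTypes) {
            Class<?> c = PRIMITIVES.get(t);
            if (c == null) {
                throw new IllegalArgumentException("Unsupported Avro type: " + t);
            }
            result.add(c);
        }
        return result;
    }
}
```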

Re: AvroSchemaConverter and Tuple<T> classes

Timo Walther
Hi,

tuples are just a subcategory of rows, because tuple arity is
limited to 25 fields. I think the easiest solution would be to write
your own converter that maps rows to tuples, if you know that you will
not need more than 25 fields. Otherwise, it might be easier to just use a
TextInputFormat and do the parsing yourself with a library.

Regards,
Timo
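
The converter Timo suggests might look roughly like this. A minimal sketch, assuming a plain Object[] stands in for Flink's Row, and using the tuple class name as a stand-in for the real org.apache.flink.api.java.tuple.TupleN classes (RowToTuple is a hypothetical name, not Flink API):

```java
// Hypothetical sketch of a row-to-tuple converter: a row (modelled here as
// a plain Object[]) can map to a TupleN only while its arity stays <= 25,
// since Flink's largest tuple class is Tuple25.
public class RowToTuple {
    public static final int MAX_TUPLE_ARITY = 25;

    // Returns the name of the tuple class a row of this arity would map to,
    // e.g. "Tuple2" for a two-field row.
    public static String tupleClassFor(Object[] row) {
        if (row.length > MAX_TUPLE_ARITY) {
            throw new IllegalArgumentException(
                    "Row arity " + row.length + " exceeds the tuple limit of "
                            + MAX_TUPLE_ARITY);
        }
        return "Tuple" + row.length;
    }
}
```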



Re: AvroSchemaConverter and Tuple<T> classes

françois lacombe
Hi Timo,

Thanks for your answer
I was looking for a Tuple to feed a BatchTableSink<T> subclass, but could this be achieved with a Row instead?

All the best

François


Re: AvroSchemaConverter and Tuple<T> classes

Rong Rong
Yes, you should be able to use Row instead of Tuple in your BatchTableSink<T>.
There are sections in the Flink documentation regarding the mapping of data types to table schemas [1], and a table can be converted into various typed DataStreams [2] as well. Hope these are helpful.

Thanks,
Rong
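
The schema mapping Rong mentions can be sketched in plain Java: a table schema pairs positional fields with types, and a row is valid for the schema when each value matches its declared class. This is an illustrative stand-in only (SchemaCheck is hypothetical; Flink's Row and TableSchema classes do this internally), with Object[] again standing in for Row:

```java
import java.util.List;

// Hypothetical sketch (not Flink API): checks that a positional row
// (modelled as Object[]) conforms to a schema given as an ordered list of
// field classes, mirroring how a table schema maps to typed records.
public class SchemaCheck {
    public static boolean matches(Object[] row, List<Class<?>> schema) {
        if (row.length != schema.size()) {
            return false;
        }
        for (int i = 0; i < row.length; i++) {
            if (row[i] != null && !schema.get(i).isInstance(row[i])) {
                return false;
            }
        }
        return true;
    }
}
```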




Re: AvroSchemaConverter and Tuple<T> classes

françois lacombe
Thank you all for your answers.

It works fine with a BatchTableSource<Row>.


All the best

François
