Avro Parquet/Flink/Beam

Avro Parquet/Flink/Beam

Newport, Billy

Are there any examples showing the use of Beam with Avro/Parquet and the Flink runner? I see an Avro reader for Beam; is it a matter of writing another one for avro-parquet, or does this need to use the Flink HadoopOutputFormat, for example?

 

Thanks

Billy

 


Re: Avro Parquet/Flink/Beam

Jean-Baptiste Onofré
Hi,

Beam provides an AvroCoder/AvroIO that you can use, but not yet a
ParquetIO (I created a Jira about that and started to work on it).

You can use the Avro reader to populate the PCollection and then use a
custom DoFn to write the Parquet files (while waiting for the ParquetIO).

Regards
JB
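
A minimal sketch of the custom-DoFn approach suggested above, under stated assumptions. To stay runnable with only the JDK, it models the DoFn bundle lifecycle (bundle start, per-element processing, bundle finish) with a hypothetical `BundleFn` interface and replaces the Parquet writer with a plain `Consumer` sink. `BundleFn`, `BatchingWriteFn` and `BatchingWriteDemo` are illustrative names, not Beam or Parquet APIs. In a real pipeline the class would presumably extend Beam's `DoFn`, and the sink would open a Parquet writer (for example from parquet-avro) once per bundle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the lifecycle of a Beam DoFn. A real pipeline
// would extend org.apache.beam.sdk.transforms.DoFn instead.
interface BundleFn<T> {
    void startBundle();
    void processElement(T element);
    void finishBundle();
}

// Buffers the records of one bundle and hands them to the sink as one batch.
// The sink is an assumption: it stands in for code that writes one Parquet
// file per bundle.
final class BatchingWriteFn<T> implements BundleFn<T> {
    private final Consumer<List<T>> sink;
    private List<T> buffer;

    BatchingWriteFn(Consumer<List<T>> sink) {
        this.sink = sink;
    }

    @Override
    public void startBundle() {
        buffer = new ArrayList<>();
    }

    @Override
    public void processElement(T element) {
        buffer.add(element);
    }

    @Override
    public void finishBundle() {
        // Skip empty bundles so no empty batch (file) is produced.
        if (!buffer.isEmpty()) {
            sink.accept(buffer);
        }
        buffer = null;
    }
}

final class BatchingWriteDemo {
    // Drives the function one bundle at a time, roughly as a runner would.
    static List<List<String>> run(List<List<String>> bundles) {
        List<List<String>> written = new ArrayList<>();
        BatchingWriteFn<String> fn = new BatchingWriteFn<>(written::add);
        for (List<String> bundle : bundles) {
            fn.startBundle();
            bundle.forEach(fn::processElement);
            fn.finishBundle();
        }
        return written;
    }

    public static void main(String[] args) {
        List<List<String>> batches =
            run(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
        System.out.println(batches.size() + " batches written: " + batches);
    }
}
```

Buffering per bundle and writing in the finish step ties each output batch to one bundle. Whether that granularity suits a given Flink deployment depends on how the runner sizes bundles, which this sketch does not model.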

On 12/12/2016 05:19 PM, Newport, Billy wrote:

> Are there any examples showing the use of beam with avro/parquet and a
> flink runner? I see an avro reader for beam, is it a matter of writing
> another one for avro-parquet or does this need to use the flink
> HadoopOutputFormat for example?
>
> Thanks
>
> Billy

--
Jean-Baptiste Onofré
[hidden email]
http://blog.nanthrax.net
Talend - http://www.talend.com

RE: Avro Parquet/Flink/Beam

Newport, Billy
I don't mind writing one. Is there a fork with the ParquetIO work that's already been done, or is it in trunk?

The ParquetIO is independent of the runner being used, is that right?

Thanks

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:[hidden email]]
Sent: Monday, December 12, 2016 11:25 AM
To: [hidden email]
Subject: Re: Avro Parquet/Flink/Beam

Hi,

Beam provides a AvroCoder/AvroIO that you can use, but not yet a
ParquetIO (I created a Jira about that and started to work on it).

You can use the Avro reader to populate the PCollection and then use a
custom DoFn to create the Parquet (waiting for the ParquetIO).

Regards
JB


Re: Avro Parquet/Flink/Beam

Jean-Baptiste Onofré
Hi Billy,

I will push my branch with ParquetIO to my GitHub.

Yes, the Beam IO is independent of the runner.

Regards
JB

On 12/12/2016 05:29 PM, Newport, Billy wrote:

> I don't mind writing one, is there a fork for the ParquetIO works that's already been done or is it in trunk?
>
> The ParquetIO is independent of the runner being used? Is that right?
>
> Thanks

RE: Avro Parquet/Flink/Beam

Newport, Billy
Is your ParquetIO going to be accepted into 0.4?

Also, do you have a link to your GitHub?


Thanks

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:[hidden email]]
Sent: Monday, December 12, 2016 11:49 AM
To: [hidden email]
Subject: Re: Avro Parquet/Flink/Beam

Hi Billy,

I will push my branch with ParquetIO on my github.

Yes, the Beam IO is independent from the runner.

Regards
JB


Re: Avro Parquet/Flink/Beam

Jean-Baptiste Onofré
Hi Billy,

No, ParquetIO is at an early stage and won't be included in
0.4.0-incubating (which I will prepare pretty soon).

I will push the branch to my GitHub (I haven't had time yet, sorry about
that).

Regards
JB

On 12/13/2016 05:08 PM, Newport, Billy wrote:

> Is your parquetio going to be accepted in to 0.4?
>
> Also, do you have a link to your github?
>
>
> Thanks

RE: Avro Parquet/Flink/Beam

Newport, Billy
Did you manage to push yet?

Thanks

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:[hidden email]]
Sent: Tuesday, December 13, 2016 11:12 AM
To: [hidden email]
Subject: Re: Avro Parquet/Flink/Beam

Hi Billy,

no, ParquetIO is in early stage and won't be included in
0.4.0-incubating (that I will prepare pretty soon).

I will push the branch on my github (didn't have time yet, sorry about
that).

Regards
JB
