AvroWriter for Rolling sink

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

AvroWriter for Rolling sink

igor.berman
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance

Reply | Threaded
Open this post in threaded view
|

Re: AvroWriter for Rolling sink

Aljoscha Krettek
Hi,
as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer.

This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance

Reply | Threaded
Open this post in threaded view
|

Re: AvroWriter for Rolling sink

igor.berman
ok,
I have working prototype already, if somebody is interested(attached)

I might add it as PR latter(with tests etc)

tested locally & with s3







On 21 April 2016 at 12:01, Aljoscha Krettek <[hidden email]> wrote:
Hi,
as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer.

This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance



AvroSinkWriter.java (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: AvroWriter for Rolling sink

Aljoscha Krettek
Hi,
the code looks very good! Do you think it can be adapted to the slightly modified interface introduced here: https://issues.apache.org/jira/browse/FLINK-3637

It basically requires the writer to know the write position, so that we can truncate to a valid position in case of failure.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 18:40 Igor Berman <[hidden email]> wrote:
ok,
I have working prototype already, if somebody is interested(attached)

I might add it as PR latter(with tests etc)

tested locally & with s3







On 21 April 2016 at 12:01, Aljoscha Krettek <[hidden email]> wrote:
Hi,
as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer.

This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance


Reply | Threaded
Open this post in threaded view
|

Re: AvroWriter for Rolling sink

igor.berman
Hi,
it's not a problem, I'll find time to change it(I understand the refactoring is in master and not released yet).
Wanted to ask if it's acceptable to add following dependency to flink?
I mean my code reused code in this jar(pay attention it's not present currently in flink classpath)
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<version>1.7.6</version>
<classifier>hadoop2</classifier>
</dependency>

On 25 April 2016 at 16:20, Aljoscha Krettek <[hidden email]> wrote:
Hi,
the code looks very good! Do you think it can be adapted to the slightly modified interface introduced here: https://issues.apache.org/jira/browse/FLINK-3637

It basically requires the writer to know the write position, so that we can truncate to a valid position in case of failure.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 18:40 Igor Berman <[hidden email]> wrote:
ok,
I have working prototype already, if somebody is interested(attached)

I might add it as PR latter(with tests etc)

tested locally & with s3







On 21 April 2016 at 12:01, Aljoscha Krettek <[hidden email]> wrote:
Hi,
as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer.

This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance



Reply | Threaded
Open this post in threaded view
|

Re: AvroWriter for Rolling sink

Aljoscha Krettek
Hi,
which code did you reuse from there? I asked Robert and I think it is somewhat problematic to add these somewhat bigger dependencies.

Cheers,
Aljoscha

On Mon, 25 Apr 2016 at 21:24 Igor Berman <[hidden email]> wrote:
Hi,
it's not a problem, I'll find time to change it(I understand the refactoring is in master and not released yet).
Wanted to ask if it's acceptable to add following dependency to flink?
I mean my code reused code in this jar(pay attention it's not present currently in flink classpath)
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<version>1.7.6</version>
<classifier>hadoop2</classifier>
</dependency>

On 25 April 2016 at 16:20, Aljoscha Krettek <[hidden email]> wrote:
Hi,
the code looks very good! Do you think it can be adapted to the slightly modified interface introduced here: https://issues.apache.org/jira/browse/FLINK-3637

It basically requires the writer to know the write position, so that we can truncate to a valid position in case of failure.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 18:40 Igor Berman <[hidden email]> wrote:
ok,
I have working prototype already, if somebody is interested(attached)

I might add it as PR latter(with tests etc)

tested locally & with s3







On 21 April 2016 at 12:01, Aljoscha Krettek <[hidden email]> wrote:
Hi,
as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer.

This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance



Reply | Threaded
Open this post in threaded view
|

Re: AvroWriter for Rolling sink

igor.berman
Hi Aljoscha,

avro-mapred jar contains different M/R output formats for avro, and their writers
it's primary used in M/R jobs that produce avro output

I have extracted(kind of copy-pasted+adjustments) some of the classes from there to remove this dependency and it's seems to work with basic scenario
I still want to write it as if it was created with M/R job to be compatible with this library(e.g. key-value pairs are wrapped into AvroKeyValue object) so that it's not important if Flink or regular M/R created this files, still their consumer can read them in a same way
WDYT?





On 27 April 2016 at 11:27, Aljoscha Krettek <[hidden email]> wrote:
Hi,
which code did you reuse from there? I asked Robert and I think it is somewhat problematic to add these somewhat bigger dependencies.

Cheers,
Aljoscha

On Mon, 25 Apr 2016 at 21:24 Igor Berman <[hidden email]> wrote:
Hi,
it's not a problem, I'll find time to change it(I understand the refactoring is in master and not released yet).
Wanted to ask if it's acceptable to add following dependency to flink?
I mean my code reused code in this jar(pay attention it's not present currently in flink classpath)
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<version>1.7.6</version>
<classifier>hadoop2</classifier>
</dependency>

On 25 April 2016 at 16:20, Aljoscha Krettek <[hidden email]> wrote:
Hi,
the code looks very good! Do you think it can be adapted to the slightly modified interface introduced here: https://issues.apache.org/jira/browse/FLINK-3637

It basically requires the writer to know the write position, so that we can truncate to a valid position in case of failure.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 18:40 Igor Berman <[hidden email]> wrote:
ok,
I have working prototype already, if somebody is interested(attached)

I might add it as PR latter(with tests etc)

tested locally & with s3







On 21 April 2016 at 12:01, Aljoscha Krettek <[hidden email]> wrote:
Hi,
as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer.

This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance




Reply | Threaded
Open this post in threaded view
|

Re: AvroWriter for Rolling sink

igor.berman

[FLINK-3854] Support Avro key-value rolling sink writer #1953


On 27 April 2016 at 19:29, Igor Berman <[hidden email]> wrote:
Hi Aljoscha,

avro-mapred jar contains different M/R output formats for avro, and their writers
it's primary used in M/R jobs that produce avro output

I have extracted(kind of copy-pasted+adjustments) some of the classes from there to remove this dependency and it's seems to work with basic scenario
I still want to write it as if it was created with M/R job to be compatible with this library(e.g. key-value pairs are wrapped into AvroKeyValue object) so that it's not important if Flink or regular M/R created this files, still their consumer can read them in a same way
WDYT?





On 27 April 2016 at 11:27, Aljoscha Krettek <[hidden email]> wrote:
Hi,
which code did you reuse from there? I asked Robert and I think it is somewhat problematic to add these somewhat bigger dependencies.

Cheers,
Aljoscha

On Mon, 25 Apr 2016 at 21:24 Igor Berman <[hidden email]> wrote:
Hi,
it's not a problem, I'll find time to change it(I understand the refactoring is in master and not released yet).
Wanted to ask if it's acceptable to add following dependency to flink?
I mean my code reused code in this jar(pay attention it's not present currently in flink classpath)
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<version>1.7.6</version>
<classifier>hadoop2</classifier>
</dependency>

On 25 April 2016 at 16:20, Aljoscha Krettek <[hidden email]> wrote:
Hi,
the code looks very good! Do you think it can be adapted to the slightly modified interface introduced here: https://issues.apache.org/jira/browse/FLINK-3637

It basically requires the writer to know the write position, so that we can truncate to a valid position in case of failure.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 18:40 Igor Berman <[hidden email]> wrote:
ok,
I have working prototype already, if somebody is interested(attached)

I might add it as PR latter(with tests etc)

tested locally & with s3







On 21 April 2016 at 12:01, Aljoscha Krettek <[hidden email]> wrote:
Hi,
as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer.

This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example)

anyway any ideas/pointers will be highly appreciated

thanks in advance