Hi All,
Is there such implementation somewhere?(before I start to implement it myself, it seems not too difficult based on SequenceFileWriter example) anyway any ideas/pointers will be highly appreciated thanks in advance |
Hi, as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer. This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at SequenceFileWriter, which should be similar. Cheers, Aljoscha On Thu, 21 Apr 2016 at 09:45 Igor Berman <[hidden email]> wrote:
|
ok, I have working prototype already, if somebody is interested(attached) I might add it as PR latter(with tests etc) tested locally & with s3 On 21 April 2016 at 12:01, Aljoscha Krettek <[hidden email]> wrote:
AvroSinkWriter.java (11K) Download Attachment |
Hi, the code looks very good! Do you think it can be adapted to the slightly modified interface introduced here: https://issues.apache.org/jira/browse/FLINK-3637 It basically requires the writer to know the write position, so that we can truncate to a valid position in case of failure. Cheers, Aljoscha On Thu, 21 Apr 2016 at 18:40 Igor Berman <[hidden email]> wrote:
|
Hi, it's not a problem, I'll find time to change it(I understand the refactoring is in master and not released yet). Wanted to ask if it's acceptable to add following dependency to flink? I mean my code reused code in this jar(pay attention it's not present currently in flink classpath) <dependency> <groupId>org.apache.avro</groupId> <artifactId>avro-mapred</artifactId> <version>1.7.6</version> <classifier>hadoop2</classifier> </dependency> On 25 April 2016 at 16:20, Aljoscha Krettek <[hidden email]> wrote:
|
Hi, which code did you reuse from there? I asked Robert and I think it is somewhat problematic to add these somewhat bigger dependencies. Cheers, Aljoscha On Mon, 25 Apr 2016 at 21:24 Igor Berman <[hidden email]> wrote:
|
Hi Aljoscha, it's primary used in M/R jobs that produce avro output see some details here : https://avro.apache.org/docs/1.7.6/mr.html I have extracted(kind of copy-pasted+adjustments) some of the classes from there to remove this dependency and it's seems to work with basic scenario I still want to write it as if it was created with M/R job to be compatible with this library(e.g. key-value pairs are wrapped into AvroKeyValue object) so that it's not important if Flink or regular M/R created this files, still their consumer can read them in a same way WDYT? On 27 April 2016 at 11:27, Aljoscha Krettek <[hidden email]> wrote:
|
[FLINK-3854] Support Avro key-value rolling sink writer #1953On 27 April 2016 at 19:29, Igor Berman <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |