Hello everyone,

I am a user and fan of Flink, and I would like to join the Flink community. I contributed my first PR a few days ago. Could anyone help review my code? If there is anything wrong, I would be grateful for any advice.

The PR grew out of my development work: I use SQL to read data from Kafka and write it to HDFS, and I found that there is no suitable table sink. According to the documentation, the File System Connector is only experimental (https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#file-system-connector), so I wrote a Bucket File System Table Sink that supports writing streaming data to HDFS or the local file system. The supported data formats are JSON, CSV, Parquet, and Avro. Support for other formats, such as Protobuf and Thrift, can be added later.

In addition, I added documentation, Python API support, unit tests, an end-to-end test, SQL Client and DDL support, and the code compiles on Travis.

The issue is https://issues.apache.org/jira/browse/FLINK-12584

Thank you very much.
|
Hi Jun,

Thanks for bringing this up. In general I'm +1 on this feature. As you might know, there is another ongoing effort on this kind of table sink, which is covered by the newly proposed rework of partition support [1]. In that proposal we also want to introduce a new file system connector, which would cover not only partition support but also end-to-end exactly-once semantics in streaming mode.

I would suggest combining these two efforts into one. This would save some review effort and also reduce the number of core connectors, which eases our maintenance burden in the future. What do you think?

BTW, BucketingSink is already deprecated; I think we should refer to StreamingFileSink instead.

Best,
Kurt

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-63-Rework-table-partition-support-td32770.html

On Tue, Sep 17, 2019 at 10:39 AM Jun Zhang <[hidden email]> wrote:
|
Thanks. Let me clarify my thinking a bit more.

Generally, I would prefer that we concentrate connector functionality, especially for the standard and most popular connectors such as Kafka and the file systems with their different formats. We should make these core connectors as powerful as we can, and avoid ending up in a bad situation such as "if you want to use this feature, please use connector A, but if you want to use that other feature, please use connector B".

Best,
Kurt

On Tue, Sep 17, 2019 at 11:11 AM Jun Zhang <[hidden email]> wrote:
|
Hi Kurt,

Thanks. When I ran into this problem, I did find the existing File System Connector, but it is not powerful or feature-rich enough. It is also built into Flink and many unit tests refer to it, so I did not dare to modify it to extend its functionality. That is why I developed a new connector. Later we can keep only one File System Connector and make sure that it is powerful and stable.

I will study FLIP-63 and see whether there is a better way to combine the two efforts. I am very willing to join this development.

------------------ Original Message ------------------
From: "Kurt Young" <[hidden email]>
Sent: Tuesday, September 17, 2019, 11:19 AM
To: "Jun Zhang" <[hidden email]>
Subject: Re: Add Bucket File System Table Sink

On Tue, Sep 17, 2019 at 11:11 AM Jun Zhang <[hidden email]> wrote:
> Hi Kurt,
> Thank you very much.
> I will take a closer look at FLIP-63.
>
> The PR I developed is built on StreamingFileSink, not BucketingSink;
> I just gave it the name "Bucket".
>
> On 09/17/2019 10:57, Kurt Young <[hidden email]> wrote:
> > [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-63-Rework-table-partition-support-td32770.html
|
Great to hear.

Best,
Kurt

On Tue, Sep 17, 2019 at 11:45 AM Jun Zhang <[hidden email]> wrote:
|
Hi Jun,

Thank you very much for your contribution. I think a Bucketing File System Table Sink would be a great addition.

Our code contribution guidelines [1] recommend discussing the design with the community before opening a PR. First of all, this ensures that the design is aligned with Flink's codebase and planned features. Moreover, it helps to find a committer who can shepherd the PR.

It is also always a good idea to split a contribution into multiple smaller PRs where possible. This allows for faster review and progress.

Best,
Fabian

[1] https://flink.apache.org/contributing/contribute-code.html

Am Di., 17. Sept. 2019 um 04:39 Uhr schrieb Jun Zhang <[hidden email]>:
|
Hi Fabian,

Thank you very much for your suggestion. This feature came out of my work: I found it inconvenient to write data to HDFS with Flink SQL, so I wrote this feature and would like to contribute it to the community. This is my first PR, so I may not be familiar with parts of the process; I am very sorry about that.

Kurt suggested combining this feature with FLIP-63 because the two have some things in common, such as writing data to a file system in various formats. I would therefore like to treat this feature as a sub-task of FLIP-63: add a partitionable bucket file system table sink.

After that, I would add the documentation and send a DISCUSS thread explaining my detailed design ideas and implementation. What do you think?

------------------ Original ------------------
From: Fabian Hueske <[hidden email]>
Date: Fri, Sep 20, 2019 9:38 PM
To: Jun Zhang <[hidden email]>
Cc: dev <[hidden email]>, user <[hidden email]>
Subject: Re: Add Bucket File System Table Sink

[1] https://flink.apache.org/contributing/contribute-code.html
|