Hi guys,
In Flink 1.9, HiveTableSink was added to support writing to Hive, but it only supports batch mode. StreamingFileSink can write to HDFS in streaming mode, but it has no Hive-related functionality (e.g. adding Hive partitions). Is there an easy way to write to Hive in streaming mode (with an exactly-once guarantee)?

Thanks,
Qi
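For context, "adding a Hive partition" here means registering a directory such as `dt=2019-09-04/hr=11` with the Hive metastore once its files are complete. A minimal stdlib-only sketch of deriving such a partition spec from an event timestamp (the column names `dt`/`hr` and the table name are illustrative, not from this thread):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class HivePartitionSpec {
    // Builds a Hive-style partition path fragment, e.g. "dt=2019-09-04/hr=11",
    // from an event timestamp. Partition columns dt/hr are illustrative.
    static String partitionPath(LocalDateTime eventTime) {
        String dt = eventTime.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
        String hr = eventTime.format(DateTimeFormatter.ofPattern("HH"));
        return "dt=" + dt + "/hr=" + hr;
    }

    // The DDL a streaming sink would have to issue against the metastore
    // once the partition's files are finalized (table name is hypothetical).
    static String addPartitionDdl(String table, LocalDateTime eventTime) {
        String dt = eventTime.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
        String hr = eventTime.format(DateTimeFormatter.ofPattern("HH"));
        return "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION (dt='"
                + dt + "', hr='" + hr + "')";
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2019, 9, 4, 11, 23);
        System.out.println(partitionPath(t));           // dt=2019-09-04/hr=11
        System.out.println(addPartitionDdl("logs", t));
    }
}
```

StreamingFileSink handles the file writing; the missing piece discussed in the thread is wiring something like `addPartitionDdl` to run after files are committed.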
Hi Qi,

With 1.9 out of the box, I'm afraid not. You can make HiveTableSink implement AppendStreamTableSink (an empty interface for now) so it can be picked up in a streaming job. Also, streaming requires checkpointing, and the Hive sink doesn't do that yet. There might be other tweaks you need to make. It's on our list for 1.10, though not high priority.

Bowen

On Wed, Sep 4, 2019 at 2:23 AM Qi Luo <[hidden email]> wrote:
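The suggested tweak, sketched with local stand-in types rather than the real Flink classes (the actual interface is `org.apache.flink.table.sinks.AppendStreamTableSink`; the stubs below only illustrate the shape of the change):

```java
public class Workaround {
    // Stand-in for Flink's marker interface; the real one lives in
    // org.apache.flink.table.sinks and, as noted in the thread, is
    // effectively empty for now.
    interface AppendStreamTableSink<T> { }

    // Stand-in for the existing batch-only sink. Adding the marker
    // interface is what lets a streaming job pick the sink up; the
    // checkpoint integration needed for exactly-once is still missing.
    static class HiveTableSink implements AppendStreamTableSink<Object> {
    }

    public static void main(String[] args) {
        // The planner's eligibility check boils down to the marker interface:
        System.out.println(new HiveTableSink() instanceof AppendStreamTableSink);
    }
}
```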
Hi Bowen,

Thank you for the information! Streaming write to Hive is a very common use case for our users. Is there an open issue for this to which we can try contributing? +Yufei and Chang, who are also interested in this.

Thanks,
Qi

On Thu, Sep 5, 2019 at 12:16 PM Bowen Li <[hidden email]> wrote:
Hi,

I'm not sure if there's one yet. Feel free to create one if not.

On Wed, Sep 4, 2019 at 11:28 PM Qi Luo <[hidden email]> wrote:
Hi Qi,

With partition support [1], I want to introduce a FileFormatSink to cover streaming exactly-once and partition-related logic for the Flink file connectors and the Hive connector. You can take a look.

[1] https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing

Best,
Jingsong Lee
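One common ingredient of exactly-once file sinks, and part of what a FileFormatSink has to cover, is staging data in in-progress files and atomically renaming them only when a checkpoint completes. A stdlib-only sketch of that commit step, with all names illustrative rather than taken from the design doc:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CommitOnCheckpoint {
    // Writes records to an in-progress file; by the dot-prefix convention
    // readers ignore it until it is committed.
    static Path writeInProgress(Path dir, String name, String data) throws IOException {
        Path tmp = dir.resolve("." + name + ".inprogress");
        Files.write(tmp, data.getBytes());
        return tmp;
    }

    // On checkpoint completion, atomically rename to the final name.
    // If the job fails before this point, the in-progress file is
    // discarded on restore, which is what keeps the visible output
    // exactly-once.
    static Path commit(Path inProgress, String finalName) throws IOException {
        Path target = inProgress.resolveSibling(finalName);
        return Files.move(inProgress, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sink");
        Path tmp = writeInProgress(dir, "part-0-0", "hello\n");
        Path done = commit(tmp, "part-0-0");
        System.out.println(Files.exists(done));  // true
    }
}
```

For a Hive table, the metastore `ADD PARTITION` call would follow the rename, once all files for a partition are committed.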
Hi JingsongLee,

Fantastic! We'll look into it.

Thanks,
Qi

On Fri, Sep 6, 2019 at 10:52 AM JingsongLee <[hidden email]> wrote:
|