Streaming write to Hive

Streaming write to Hive

qi luo
Hi guys,

In Flink 1.9, HiveTableSink was added to support writing to Hive, but it only supports batch mode. StreamingFileSink can write to HDFS in streaming mode, but it has no Hive-related functionality (e.g. adding Hive partitions).

Is there an easy way to do streaming writes to Hive (with an exactly-once guarantee)?

Thanks,
Qi
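
To make the "adding Hive partition" step concrete, here is a minimal sketch of what a streaming job would have to do in addition to writing files: derive a partition from each record's event time, write under a Hive-style directory, and tell the metastore about it. The dt/hr partition columns, the table name, the base directory, and all three function names are assumptions made up for illustration; none of this is Flink or Hive connector API.

```python
from datetime import datetime, timezone


def partition_spec(event_time_ms: int) -> dict:
    """Map an event timestamp (epoch millis, UTC) to hypothetical dt/hr partition values."""
    ts = datetime.fromtimestamp(event_time_ms / 1000, tz=timezone.utc)
    return {"dt": ts.strftime("%Y-%m-%d"), "hr": ts.strftime("%H")}


def partition_path(base_dir: str, spec: dict) -> str:
    """Build the Hive-style directory for a partition: <base>/dt=.../hr=..."""
    parts = "/".join(f"{k}={v}" for k, v in spec.items())
    return f"{base_dir.rstrip('/')}/{parts}"


def add_partition_ddl(table: str, spec: dict, location: str) -> str:
    """Render the HiveQL statement that registers the partition in the metastore."""
    cols = ", ".join(f"{k}='{v}'" for k, v in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({cols}) LOCATION '{location}'"
    )


# Example: a record with event time 2019-09-04 02:23 UTC.
ms = int(datetime(2019, 9, 4, 2, 23, tzinfo=timezone.utc).timestamp() * 1000)
spec = partition_spec(ms)
path = partition_path("hdfs:///warehouse/events/", spec)
ddl = add_partition_ddl("events", spec, path)
print(ddl)
```

A file sink that buckets output by directory can produce the path part on its own; the statement in the last step is the piece that has to be issued separately, and only once the files in that directory are final.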

Re: Streaming write to Hive

phoenixjiangnan
Hi Qi,

With 1.9 out of the box, I'm afraid not. You can make HiveTableSink implement AppendStreamTableSink (an empty interface for now) so that it can be picked up in a streaming job. Also, streaming requires checkpointing, and the Hive sink doesn't do that yet. There might be other tweaks you need to make.

It's on our list for 1.10, though not high priority.

Bowen

On Wed, Sep 4, 2019 at 2:23 AM Qi Luo <[hidden email]> wrote:
Hi guys,

In Flink 1.9, HiveTableSink was added to support writing to Hive, but it only supports batch mode. StreamingFileSink can write to HDFS in streaming mode, but it has no Hive-related functionality (e.g. adding Hive partitions).

Is there an easy way to do streaming writes to Hive (with an exactly-once guarantee)?

Thanks,
Qi
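
The remark that "streaming requires checkpointing" can be illustrated with a toy model of a checkpoint-aligned commit: records are staged while they arrive, frozen when a checkpoint is taken, and only made visible (and their partition registered) once that checkpoint is confirmed. This is a simplified simulation under stated assumptions, not the HiveTableSink or StreamingFileSink implementation; the class and method names are hypothetical, and in-memory dicts stand in for files and the metastore.

```python
class CheckpointedPartitionSink:
    """Toy sink: data becomes visible only after its checkpoint completes."""

    def __init__(self):
        self.in_progress = {}    # partition -> records since the last snapshot
        self.pending = {}        # checkpoint id -> {partition: records} awaiting commit
        self.committed = {}      # partition -> records visible to readers
        self.registered = set()  # partitions announced to the simulated metastore

    def write(self, partition, record):
        self.in_progress.setdefault(partition, []).append(record)

    def snapshot(self, checkpoint_id):
        # Freeze everything written so far; it now belongs to this checkpoint.
        self.pending[checkpoint_id] = self.in_progress
        self.in_progress = {}

    def notify_checkpoint_complete(self, checkpoint_id):
        # Commit every pending batch up to this checkpoint. Popping each batch
        # makes a repeated notification a no-op, so commits are idempotent.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            for partition, records in self.pending.pop(cid).items():
                self.committed.setdefault(partition, []).extend(records)
                self.registered.add(partition)

    def restore(self):
        # After a failure, data written since the last snapshot is discarded;
        # the source is assumed to replay it from the checkpointed offset.
        self.in_progress = {}


sink = CheckpointedPartitionSink()
sink.write("dt=2019-09-04", "a")
sink.write("dt=2019-09-04", "b")
sink.snapshot(1)
sink.notify_checkpoint_complete(1)

sink.write("dt=2019-09-05", "c")  # written, then the job fails before a snapshot
sink.restore()
sink.write("dt=2019-09-05", "c")  # replayed by the source after recovery
sink.snapshot(2)
sink.notify_checkpoint_complete(2)
sink.notify_checkpoint_complete(2)  # duplicate notification changes nothing
```

Because uncommitted data is dropped on recovery and committed data is never rewritten, the replayed record ends up in the output exactly once, which is the guarantee the original question asks about.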

Re: Streaming write to Hive

qi luo
Hi Bowen,

Thank you for the information! Streaming writes to Hive are a very common use case for our users. Is there an open issue for this that we could contribute to?

+Yufei and Chang, who are also interested in this.

Thanks,
Qi

On Thu, Sep 5, 2019 at 12:16 PM Bowen Li <[hidden email]> wrote:
Hi Qi,

With 1.9 out of the box, I'm afraid not. You can make HiveTableSink implement AppendStreamTableSink (an empty interface for now) so that it can be picked up in a streaming job. Also, streaming requires checkpointing, and the Hive sink doesn't do that yet. There might be other tweaks you need to make.

It's on our list for 1.10, though not high priority.

Bowen

Re: Streaming write to Hive

phoenixjiangnan
Hi, 

I'm not sure if there's one yet. Feel free to create one if not.

On Wed, Sep 4, 2019 at 11:28 PM Qi Luo <[hidden email]> wrote:
Hi Bowen,

Thank you for the information! Streaming writes to Hive are a very common use case for our users. Is there an open issue for this that we could contribute to?

+Yufei and Chang, who are also interested in this.

Thanks,
Qi

Re: Streaming write to Hive

JingsongLee
Hi luoqi:

Building on partition support [1], I want to introduce a FileFormatSink that
covers streaming exactly-once semantics and partition-related logic for the
Flink file connectors and the Hive connector. You can take a look.


Best,
Jingsong Lee

------------------------------------------------------------------
From: Bowen Li <[hidden email]>
Send Time: Friday, September 6, 2019, 05:21
To: Qi Luo <[hidden email]>
Cc: user <[hidden email]>; snake.fly318 <[hidden email]>; lichang.bd <[hidden email]>
Subject: Re: Streaming write to Hive

Hi, 

I'm not sure if there's one yet. Feel free to create one if not.

Re: Streaming write to Hive

qi luo
Hi JingsongLee,

Fantastic! We'll look into it.

Thanks,
Qi

On Fri, Sep 6, 2019 at 10:52 AM JingsongLee <[hidden email]> wrote:
Hi luoqi:

Building on partition support [1], I want to introduce a FileFormatSink that
covers streaming exactly-once semantics and partition-related logic for the
Flink file connectors and the Hive connector. You can take a look.


Best,
Jingsong Lee
