Re: Streaming kafka data sink to hive
Posted by Jingsong Li
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Streaming-kafka-data-sink-to-hive-tp33806p33809.html
Hi wanglei,
> 1. Is there any flink-hive-connector that I can use to write to Hive in streaming mode?
"Streaming kafka data sink to hive" is under discussion.[1]
POC work is ongoing [2], and we want to support it in release-1.11.
> 2. Since HDFS is not friendly to frequent appends and Hive's data is stored in HDFS, is it OK if the throughput is high?
The main concern is small files: it is better for each file to be around 128 MB.
If the throughput is high, I think you can try rolling to a new file every 5 or 10 minutes.
You can learn more in [3].
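As a rough way to check whether a 5 or 10 minute interval gets close to 128 MB per file, here is a small back-of-the-envelope sketch. It is only an estimate under stated assumptions: data is spread evenly across the parallel writers, each writer has one open file at a time, and the size on disk equals the ingested bytes (no compression). The function names and the example numbers are hypothetical, not part of any Flink or Hive API.

```python
def estimated_file_size_mb(throughput_mb_per_s, roll_interval_s, parallel_writers=1):
    """Approximate size of each file when every writer rolls on a fixed interval.

    Assumes an even split of the stream across writers (an assumption).
    """
    if parallel_writers < 1:
        raise ValueError("parallel_writers must be at least 1")
    return throughput_mb_per_s * roll_interval_s / parallel_writers


def roll_interval_for_target_s(throughput_mb_per_s, target_file_mb=128.0, parallel_writers=1):
    """Interval in seconds each writer needs to accumulate target_file_mb."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    return target_file_mb * parallel_writers / throughput_mb_per_s


if __name__ == "__main__":
    # Hypothetical workload: 2 MB/s in total, 4 parallel writers.
    print(estimated_file_size_mb(2.0, 5 * 60, 4))      # file size in MB with 5-minute rolling
    print(roll_interval_for_target_s(2.0, 128.0, 4))   # seconds needed to reach 128 MB
```

If the estimate comes out far below 128 MB, a longer interval or fewer parallel writers would reduce the number of small files, at the cost of higher latency before the data becomes visible.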
Best,
Jingsong Lee
We have many app logs on our app server and want to parse them into a structured table format and then sink them to Hive.
Batch mode seems like a good fit: the app logs are compressed hourly, which makes partitioning convenient.
However, we want to use streaming mode: tail the app logs to Kafka, then use Flink to read the Kafka topic and sink to Hive.
I have several questions.
1. Is there any flink-hive-connector that I can use to write to Hive in streaming mode?
2. Since HDFS is not friendly to frequent appends and Hive's data is stored in HDFS, is it OK if the throughput is high?
Thanks,
Lei