Hive integration in table API and SQL

Hive integration in table API and SQL

wangsan
Hi all,

I am currently learning the Table API and SQL in Flink. I noticed that Flink does not support Hive tables as a table source, and that not even a JDBC table source is provided. There are cases where we need to join a stream table with a static Hive or other database table to fetch additional attributes, so how can I implement this functionality? Do I need to implement my own DataSet connector that loads data from external tables via JDBC and register the DataSet as a table, or should I provide an external catalog?

Thanks,
wangsan

Re: Hive integration in table API and SQL

Timo Walther
Hi Wangsan,

yes, the Hive integration is limited so far. However, we provide an external catalog feature [0] that allows you to implement custom logic to retrieve Hive tables. I think it is not possible to do all your operations in Flink's SQL API right now. For now, you need to combine DataStream and SQL. E.g., the Hive lookups should happen in an asynchronous fashion to reduce latency [1]. As far as I know, JDBC does not make it easy to retrieve records in a streaming fashion. That's why there is only a JDBC TableSink but no source. Stream joining is limited so far: we will support window joins in the upcoming release and likely full history joins in 1.5. The Table & SQL API is still a young API, but development happens quickly. If you are interested in contributing, feel free to write to the dev@ mailing list.

Regards,
Timo

[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/common.html#register-an-external-catalog
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
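
(For illustration, here is a minimal sketch of the asynchronous lookup pattern described above: a stream of ids is enriched with an attribute fetched via JDBC through Flink's Async I/O. The JDBC URL, table, and column names are hypothetical, and it assumes a Flink version where the async callback type is ResultFuture; the 1.3/1.4 API uses AsyncCollector with the same idea. A production job would add a connection pool and proper error handling.)

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncJdbcLookupJob {

    /** Looks up a name for each incoming user id (hypothetical schema). */
    public static class JdbcLookup extends RichAsyncFunction<Long, Tuple2<Long, String>> {
        private transient Connection connection;
        private transient ExecutorService executor;

        @Override
        public void open(Configuration parameters) throws Exception {
            // One blocking JDBC connection per task, driven by a small thread pool.
            // (A real job would use a connection pool instead of sharing one connection.)
            connection = DriverManager.getConnection("jdbc:mysql://db-host:3306/dim", "user", "pw");
            executor = Executors.newFixedThreadPool(4);
        }

        @Override
        public void asyncInvoke(Long userId, ResultFuture<Tuple2<Long, String>> resultFuture) {
            CompletableFuture
                .supplyAsync(() -> {
                    try (PreparedStatement ps =
                             connection.prepareStatement("SELECT name FROM users WHERE id = ?")) {
                        ps.setLong(1, userId);
                        try (ResultSet rs = ps.executeQuery()) {
                            return rs.next() ? rs.getString("name") : "unknown";
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }, executor)
                .thenAccept(name ->
                    resultFuture.complete(Collections.singleton(Tuple2.of(userId, name))));
        }

        @Override
        public void close() throws Exception {
            if (connection != null) connection.close();
            if (executor != null) executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> userIds = env.fromElements(1L, 2L, 3L);

        // Up to 100 lookups in flight, each timing out after 5 seconds.
        DataStream<Tuple2<Long, String>> enriched = AsyncDataStream.unorderedWait(
            userIds, new JdbcLookup(), 5, TimeUnit.SECONDS, 100);

        enriched.print();
        env.execute("Async JDBC lookup enrichment");
    }
}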


Re: Hive integration in table API and SQL

wangsan
Hi Timo,

Thanks for your reply. I did notice that the documentation says "Table is always bound to a specific TableEnvironment. It is not possible to combine tables of different TableEnvironments in the same query, e.g., to join or union them." Does that mean there is no way to perform operations, such as a join, on a streaming table and a batch table?

Best,
wangsan


Re: Hive integration in table API and SQL

Timo Walther
Hi,

no, combining batch and streaming environments is not possible at the moment. However, most batch operations can also be done in a streaming fashion. I would recommend using the DataStream API, as it provides the most flexibility for your use case.

Regards,
Timo
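
(For illustration, a minimal sketch of the DataStream-only approach: the static dimension table is loaded into memory once per task in open() and every streaming record is enriched from that map. The JDBC URL and schema are hypothetical; this pattern only works if the static table is small enough to fit in memory and does not change while the job runs.)

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class StaticTableEnrichmentJob {

    /** Enriches each user id with a name from a small dimension table cached in memory. */
    public static class StaticJoin extends RichMapFunction<Long, Tuple2<Long, String>> {
        private transient Map<Long, String> users;

        @Override
        public void open(Configuration parameters) throws Exception {
            users = new HashMap<>();
            // Load the whole (small) dimension table once when the task starts.
            try (Connection conn = DriverManager.getConnection("jdbc:mysql://db-host:3306/dim", "user", "pw");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT id, name FROM users")) {
                while (rs.next()) {
                    users.put(rs.getLong("id"), rs.getString("name"));
                }
            }
        }

        @Override
        public Tuple2<Long, String> map(Long userId) {
            return Tuple2.of(userId, users.getOrDefault(userId, "unknown"));
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> userIds = env.fromElements(1L, 2L, 3L);
        userIds.map(new StaticJoin()).print();
        env.execute("Static table enrichment");
    }
}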

