Hi all,
I am currently learning the Table API and SQL in Flink. I noticed that Flink does not support Hive tables as a table source, and not even a JDBC table source is provided. There are cases where we need to join a stream table with static Hive or other database tables to get more specific attributes, so how can I implement this functionality? Do I need to implement my own DataSet connectors to load data from external tables using JDBC and register the DataSet as a table, or should I provide an external catalog?

Thanks,
wangsan
Hi Wangsan,
yes, the Hive integration is limited so far. However, we provide an external catalog feature [0] that allows you to implement custom logic to retrieve Hive tables.

I think it is not possible to do all your operations in Flink's SQL API right now. For now, I think you need to combine DataStream and SQL. E.g., the Hive lookups should happen in an asynchronous fashion to reduce latency [1]. As far as I know, JDBC does not make it easy to retrieve records in a streaming fashion. That's why there is only a TableSink but no source.

Stream joining is limited so far. We will support window joins in the upcoming release and will likely provide full-history joins in 1.5.

The Table & SQL API is still a young API, but development is moving quickly. If you are interested in contributing, feel free to write to the dev@ mailing list.

Regards,
Timo

[0] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/common.html#register-an-external-catalog
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html

In reply to wangsan's message of 11/20/17, 1:27 PM.
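To illustrate the asynchronous lookup idea mentioned in this reply, here is a minimal sketch in plain Java. It does not use Flink's Async I/O API; it only shows the underlying pattern of issuing all lookups without blocking and collecting the results afterwards. The class name, the `lookupCity` helper, and the in-memory `DIMENSION_TABLE` are hypothetical stand-ins for a real Hive or JDBC query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncEnrichmentSketch {

    // Hypothetical stand-in for a static Hive/JDBC table: userId -> city.
    static final Map<String, String> DIMENSION_TABLE =
            Map.of("u1", "Berlin", "u2", "Hangzhou");

    // Hypothetical lookup; a real one would query the external store.
    static String lookupCity(String userId) {
        return DIMENSION_TABLE.getOrDefault(userId, "unknown");
    }

    // Starts one lookup per event without waiting for earlier ones,
    // then collects the enriched rows in input order.
    static List<String> enrich(List<String> userIds, ExecutorService pool) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String userId : userIds) {
            pending.add(CompletableFuture.supplyAsync(
                    () -> userId + "," + lookupCity(userId), pool));
        }
        List<String> enriched = new ArrayList<>();
        for (CompletableFuture<String> future : pending) {
            enriched.add(future.join()); // blocks only until this lookup is done
        }
        return enriched;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (String row : enrich(List.of("u1", "u2", "u3"), pool)) {
                System.out.println(row);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the lookups overlap instead of running one after another, the total wait is closer to the slowest single lookup than to the sum of all of them, which is the latency benefit referred to in [1].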
Hi Timo,
Thanks for your reply. I did notice that the documentation says: "A Table is always bound to a specific TableEnvironment. It is not possible to combine tables of different TableEnvironments in the same query, e.g., to join or union them." Does that mean there is no way to perform operations such as a join on a streaming table and a batch table?

Best,
wangsan
Hi,
no, combining batch and streaming environments is not possible at the moment. However, most batch operations can be done in a streaming fashion as well. I would recommend using the DataStream API, as it provides the most flexibility for your use case.

Regards,
Timo

In reply to wangsan's message of 11/21/17, 4:41 AM.
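As a rough illustration of doing a batch-style join in a streaming fashion, the sketch below loads a small static table into memory once and then looks up each incoming record against it. This is plain Java, not Flink code; in a DataStream job the load step would typically happen once per task (for example in a rich function's `open()` method) and the lookup would run per record. All names and the sample rows are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StaticTableJoinSketch {

    // Builds the lookup side once; each row is a hypothetical {key, attribute} pair.
    static Map<String, String> loadStaticTable(List<String[]> rows) {
        Map<String, String> table = new HashMap<>();
        for (String[] row : rows) {
            table.put(row[0], row[1]);
        }
        return table;
    }

    // Inner-join semantics: events without a matching key are dropped.
    static List<String> join(List<String> eventKeys, Map<String, String> table) {
        return eventKeys.stream()
                .filter(table::containsKey)
                .map(key -> key + "," + table.get(key))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> table = loadStaticTable(List.of(
                new String[] {"u1", "Berlin"},
                new String[] {"u2", "Hangzhou"}));
        // "u3" has no match in the static table and is filtered out.
        join(List.of("u1", "u3", "u2"), table).forEach(System.out::println);
    }
}
```

This only works when the static table fits in memory and does not change while the job runs; otherwise a per-record lookup against the external store is the safer choice.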