Hello,
I'm trying to configure the SQL Client to query partitioned ORC data on a local filesystem. The directory structure looks like this:

/tmp/table1/startdate=2020-11-28
/tmp/table1/startdate=2020-11-27
etc.

If I start an SQL Client session and create the table by hand:

create table tst (column1 string, startdate string)
  partitioned by (startdate)
  with ('connector'='filesystem', 'format'='orc', 'path'='/tmp/table1');

everything runs fine: explain select * from tst where startdate='2020-11-27' shows only one partition in 'readPartitions'.

However, I struggle to configure the table in the .yaml config. I tried it like this (after some struggle, as the "partition.keys" setting doesn't seem to be documented...):

tables:
  - name: tst2
    type: source-table
    connector: filesystem
    path: "/tmp/table1"
    format: orc
    partition.keys:
      - name: startdate
    schema:
      - name: column1
        data-type: string
      - name: startdate
        data-type: string

and it more or less works - queries are executed properly. However, partitions are not pruned: explain select * from tst2 where startdate='2020-11-27' shows all partitions in 'readPartitions'.

Any idea what could be wrong? I'm using Flink 1.11.2.

thanks,
maciek
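For reference, a minimal sketch of the two checks described above, written out as SQL Client statements (the table names and partition value are the ones from this message; when pruning works, only startdate=2020-11-27 should appear under readPartitions in the plan):

-- DDL-defined table: pruning works, a single partition in readPartitions
explain select * from tst where startdate='2020-11-27';

-- YAML-defined table: no pruning, all partitions listed in readPartitions
explain select * from tst2 where startdate='2020-11-27';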
Hi Maciek,

I am pulling in Timo who might help you with this problem.

Cheers,
Till

On Tue, Dec 1, 2020 at 6:51 PM Maciek Próchniak <[hidden email]> wrote:
Only legacy connectors (`connector.type=kafka` instead of `connector=kafka`) are supported in the YAML configuration at the moment. You can use regular DDL instead. There is a similar discussion in https://issues.apache.org/jira/browse/FLINK-20260 these days.

Best,
Jark

On Thu, 3 Dec 2020 at 00:52, Till Rohrmann <[hidden email]> wrote:
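A minimal sketch of the DDL route suggested above, run directly in the SQL Client session instead of the YAML file (it simply restates the CREATE TABLE from the first message; the path, schema, and format are the ones used there):

CREATE TABLE tst (
  column1 STRING,
  startdate STRING
) PARTITIONED BY (startdate) WITH (
  'connector' = 'filesystem',
  'format' = 'orc',
  'path' = '/tmp/table1'
);

Partition pruning can then be verified with the same EXPLAIN statement as before.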
Hi Jark,

Thanks for the answer. I'm a bit puzzled, because in my YAML I'm using "connector: filesystem" (not connector.type). I don't think I end up using the connector described at https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connect.html#file-system-connector, since partitioning and the ORC format are handled correctly.

It's also not clear to me what the non-legacy connector for reading files directly from the filesystem (without Hive) would be. I don't see any implementation of DynamicTableSourceFactory that would do this. I assumed that the DDL I wrote in my first message also gives me FileSystemTableFactory; am I wrong?

thanks,
maciek
On 03.12.2020 16:26, Jark Wu wrote: