Support COUNT(DISTINCT 'field') Query Yet?


xiatao123
SELECT
  TUMBLE_START(event_timestamp, INTERVAL '1' HOUR),
  COUNT(DISTINCT session),
  COUNT(DISTINCT user_id),
  SUM(duration),
  SUM(num_interactions)
FROM unified_events
GROUP BY TUMBLE(event_timestamp, INTERVAL '1' HOUR)

I am running the query above on Flink 1.3.2, but it fails with the following error:

Caused by: org.apache.flink.table.api.TableException: Cannot generate a valid execution plan for the given query

Is this feature supported yet? If yes, in which version of Flink? If not, is there a timeline for supporting it?

Thanks,
Tao



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Support COUNT(DISTINCT 'field') Query Yet?

Fabian Hueske-2
Hi Tao,

DISTINCT aggregates in group windows are not supported yet.
There's currently a discussion on the dev mailing list about this feature [1].

Since we are only a few days away from the feature freeze for Flink 1.5.0, it will probably not make it into that release. It might be included in 1.6.0, about 4-5 months from now.

As a workaround, you can implement a custom user-defined aggregation function [2] that performs distinct counts.
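
To make the workaround a bit more concrete, here is a minimal sketch of what such a distinct-count aggregate could look like. It is plain Java so that it runs on its own: the class name CountDistinctSketch and the Acc accumulator are made up for illustration, and the class only mirrors the createAccumulator / accumulate / getValue contract of Flink's AggregateFunction rather than extending it. It also assumes the counted column is a String.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the actual Flink class. In a real job this would
// extend org.apache.flink.table.functions.AggregateFunction<Long, Acc>;
// here it only mirrors that contract so the example is self-contained.
class CountDistinctSketch {

    // Accumulator: the distinct values seen so far for one group/window.
    static class Acc {
        Set<String> seen = new HashSet<>();
    }

    // Called once per group/window to create empty state.
    Acc createAccumulator() {
        return new Acc();
    }

    // Called for every input row; NULLs are skipped, as COUNT(DISTINCT) does.
    void accumulate(Acc acc, String value) {
        if (value != null) {
            acc.seen.add(value);
        }
    }

    // Folds other partial accumulators into acc (needed when partial
    // results are combined, e.g. for merging windows).
    void merge(Acc acc, Iterable<Acc> others) {
        for (Acc other : others) {
            acc.seen.addAll(other.seen);
        }
    }

    // Final result: the number of distinct non-null values.
    Long getValue(Acc acc) {
        return (long) acc.seen.size();
    }

    public static void main(String[] args) {
        CountDistinctSketch f = new CountDistinctSketch();
        Acc acc = f.createAccumulator();
        for (String s : new String[] {"s1", "s2", "s1", null}) {
            f.accumulate(acc, s);
        }
        System.out.println(f.getValue(acc)); // prints 2
    }
}
```

In an actual job the function would be registered with the table environment (for example under a name such as countDistinct, which is just a placeholder) and then called in the query in place of COUNT(DISTINCT session). Note that the accumulator keeps every distinct value of the window in state, so this is only reasonable when the number of distinct values per window is moderate.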

Cheers,
Fabian

In reply to xiatao123 <[hidden email]>, 2018-02-15 0:15 GMT+01:00.