(DEPRECATED) Apache Flink User Mailing List archive.

Flink Lookup Filter Pushdown

Classic

List

Threaded

2 messages Options

forideal

Flink Lookup Filter Pushdown

Hello, my friend.

I have a dimension table.

create table dim_u_score( u_id bigint, varchar, score_a double, score_b double) with {xxx}In a scene

The condition of lookup is fliter score_a > 0.9

In another scenario

The condition of lookup is fliter score_b > 1

In Flink, at present, lookup join can use on to pass key values, such as

select score_a ... left join ... source_table.u_id=dim_u_score.u_id

If so, what should I do?

If not, can I say that I can create multiple tables with conditions to use when it comes?

such as

create table dim_u_score_filter_a( u_id bigint, varchar, score_a double, score_b double) 
with{
"filter_condition"="score_a > 0.9 "
}
create table dim_u_score_filter_b( u_id bigint, varchar, score_a double, score_b double)
with {
"filter_condition"="fliter score_b > 1 "
}

Then, in the process of lookup, push down to the specific execution engine to complete the lookup filter.

Jark Wu-3

Re: Flink Lookup Filter Pushdown

Hi forideal,

Currently, dimension table join requires at least one join key. That is the `u_id` in your example. The join key will be used as lookup key.

If you have some additional filters on dimension table, that's fine, Flink will help to filter the fetched data.

That means Flink supports following dimension join query:

select score_a ... left join ... source_table.u_id=dim_u_score.u_id where dim_u_score.score_b > 1;

At present, dimension table join doesn't pushdown filters into source, so if the data associated to the given lookup key is very large, it will have a high IO cost.

However, filter pushdown into lookup table is in the roadmap.

Best,

Jark

On Mon, 27 Apr 2020 at 20:08, forideal <[hidden email]> wrote:

Hello, my friend.
I have a dimension table.
create table dim_u_score( u_id bigint, varchar, score_a double, score_b double) with {xxx}In a scene
The condition of lookup is fliter score_a > 0.9
In another scenario
The condition of lookup is fliter score_b > 1
In Flink, at present, lookup join can use on to pass key values, such as
select score_a ... left join ... source_table.u_id=dim_u_score.u_id
If so, what should I do?
If not, can I say that I can create multiple tables with conditions to use when it comes?
such as
create table dim_u_score_filter_a( u_id bigint, varchar, score_a double, score_b double) 
with{
"filter_condition"="score_a > 0.9 "
}
create table dim_u_score_filter_b( u_id bigint, varchar, score_a double, score_b double)
with {
"filter_condition"="fliter score_b > 1 "
}
Then, in the process of lookup, push down to the specific execution engine to complete the lookup filter.