Flink Lookup Filter Pushdown

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink Lookup Filter Pushdown

forideal

Hello, my friend.

I have a dimension table.

create table dim_u_score( u_id bigint, varchar, score_a double, score_b double) with {xxx}In a scene

The condition of lookup is fliter score_a > 0.9

In another scenario

The condition of lookup is fliter score_b > 1

In Flink, at present, lookup join can use on to pass key values, such as

select score_a ... left join ... source_table.u_id=dim_u_score.u_id

If so, what should I do?

If not, can I say that I can create multiple tables with conditions to use when it comes?

such as

create table dim_u_score_filter_a( u_id bigint, varchar, score_a double, score_b double) 
with{
"filter_condition"="score_a > 0.9 "
}
create table dim_u_score_filter_b( u_id bigint, varchar, score_a double, score_b double)
with {
"filter_condition"="fliter score_b > 1 "
}

Then, in the process of lookup, push down to the specific execution engine to complete the lookup filter.



 

Reply | Threaded
Open this post in threaded view
|

Re: Flink Lookup Filter Pushdown

Jark Wu-3
Hi forideal,

Currently, dimension table join requires at least one join key. That is the `u_id` in your example. The join key will be used as lookup key.
If you have some additional filters on dimension table, that's fine, Flink will help to filter the fetched data. 
That means Flink supports following dimension join query:

select score_a ... left join ... source_table.u_id=dim_u_score.u_id where dim_u_score.score_b > 1;


At present, dimension table join doesn't pushdown filters into source, so if the data associated to the given lookup key is very large, it will have a high IO cost. 
However, filter pushdown into lookup table is in the roadmap. 

Best,
Jark







On Mon, 27 Apr 2020 at 20:08, forideal <[hidden email]> wrote:

Hello, my friend.

I have a dimension table.

create table dim_u_score( u_id bigint, varchar, score_a double, score_b double) with {xxx}In a scene

The condition of lookup is fliter score_a > 0.9

In another scenario

The condition of lookup is fliter score_b > 1

In Flink, at present, lookup join can use on to pass key values, such as

select score_a ... left join ... source_table.u_id=dim_u_score.u_id

If so, what should I do?

If not, can I say that I can create multiple tables with conditions to use when it comes?

such as

create table dim_u_score_filter_a( u_id bigint, varchar, score_a double, score_b double) 
with{
"filter_condition"="score_a > 0.9 "
}
create table dim_u_score_filter_b( u_id bigint, varchar, score_a double, score_b double)
with {
"filter_condition"="fliter score_b > 1 "
}

Then, in the process of lookup, push down to the specific execution engine to complete the lookup filter.