Hi forideal,
Currently, dimension table join requires at least one join key. That is the `u_id` in your example. The join key will be used as lookup key.
If you have some additional filters on dimension table, that's fine, Flink will help to filter the fetched data.
That means Flink supports following dimension join query:
select score_a ... left join ... source_table.u_id=dim_u_score.u_id where dim_u_score.score_b > 1;
At present, dimension table join doesn't pushdown filters into source, so if the data associated to the given lookup key is very large, it will have a high IO cost.
However, filter pushdown into lookup table is in the roadmap.
Best,
Jark
Hello, my friend.
I have a dimension table.
create table dim_u_score( u_id bigint, varchar, score_a double, score_b double) with {xxx}In a scene
The condition of lookup is fliter score_a > 0.9
In another scenario
The condition of lookup is fliter score_b > 1
In Flink, at present, lookup join can use on to pass key values, such as
select score_a ... left join ... source_table.u_id=dim_u_score.u_id
If so, what should I do?
If not, can I say that I can create multiple tables with conditions to use when it comes?
such as
create table dim_u_score_filter_a( u_id bigint, varchar, score_a double, score_b double)
with{
"filter_condition"="score_a > 0.9 "
}
create table dim_u_score_filter_b( u_id bigint, varchar, score_a double, score_b double)
with {
"filter_condition"="fliter score_b > 1 "
}
Then, in the process of lookup, push down to the specific execution engine to complete the lookup filter.