Hi team,
We have a use case that joins multiple data sources to produce a continuously updated view. We defined a primary key constraint on every input source, and each key is a subset of the fields in the corresponding join condition. All joins are left joins.
In our case, the join of the first two inputs gets a JoinKeyContainsUniqueKey input spec, which is good and performant. The third input source, however, is joined with the intermediate output of the first two input tables, and that intermediate table carries no key constraint information (even though the third source table does), so the join gets a NoUniqueKey input spec.
Given that NoUniqueKey inputs have dramatic performance implications per the Force Join Unique Key email thread, we want to know whether there is any mitigation for this.
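To make the situation concrete, here is a simplified Python model (not Flink's planner code) that classifies a join input side from its known unique keys and the join key. The column name `id`, the table names, the `JoinInput` class, the `input_side_spec` function and the middle category name `HasUniqueKey` are all hypothetical. The model only mirrors the categories discussed above. It takes as given our observation that the intermediate result carries no key.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified model of join input-side specs; NOT Flink code.
@dataclass(frozen=True)
class JoinInput:
    name: str
    # Each unique key is a set of column names; an empty tuple means no key is known.
    unique_keys: tuple[frozenset[str], ...] = ()


def input_side_spec(inp: JoinInput, join_key: frozenset[str]) -> str:
    """Classify a join input by how its known unique keys relate to the join key."""
    if any(uk <= join_key for uk in inp.unique_keys):
        # Each join key value identifies at most one row on this side.
        return "JoinKeyContainsUniqueKey"
    if inp.unique_keys:
        # Rows are unique, but several rows may share one join key value.
        return "HasUniqueKey"
    # No key known: every row has to be tracked individually.
    return "NoUniqueKey"


# Hypothetical scenario: all three sources have primary key `id` and join on `id`.
join_key = frozenset({"id"})
a = JoinInput("source_a", (frozenset({"id"}),))
b = JoinInput("source_b", (frozenset({"id"}),))
c = JoinInput("source_c", (frozenset({"id"}),))
# Intermediate result of `a LEFT JOIN b`: no key information carried (as observed).
ab = JoinInput("a_left_join_b", ())
# Same intermediate result after materializing it with a declared key (the Kafka idea).
ab_keyed = JoinInput("a_left_join_b_materialized", (frozenset({"id"}),))

for side in (a, b, ab, c, ab_keyed):
    print(side.name, input_side_spec(side, join_key))
```

In this model, the only thing separating the slow case from the fast one is whether the intermediate side has a known key. That is why materializing it with a declared key would move it back to the JoinKeyContainsUniqueKey category.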
One solution I can think of is to write the intermediate result to an external system such as Kafka with a unique key constraint and then join it with the third source, but that requires extra resources. Are there any other suggestions? Thanks.