Join Bottleneck

Rex Fenley
Hello,

I have a job that's a series of joins, group-bys, and aggregations, and it's bottlenecked in one of the joins. That join has ~300 million rows on the left and ~200 million rows on the right, all with unique keys. This is what I see in the plan for the bottlenecked join:

Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, group_id, user_id, uuid, owner, id0, deleted_at], leftInputSpec=[HasUniqueKey], rightInputSpec=[JoinKeyContainsUniqueKey])

The join condition is basically `left.user_id === right.id`, so `id0` must be `right.id` here.

My first question is: what is the difference between `leftInputSpec=[HasUniqueKey]` and `rightInputSpec=[JoinKeyContainsUniqueKey]`?

Is the left side hashing on its primary key `id` instead of on the join key, which would perform poorly?

Is there anything else about this that stands out?

Thanks!

--

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com

Re: Join Bottleneck

Till Rohrmann
Hi Rex,

"HasUniqueKey" means that the left input has a unique key. "JoinKeyContainsUniqueKey" means that the join key of the right side contains the unique key of this relation. Hence, it looks normal to me.
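To make the distinction concrete, here is a rough sketch of the rule in Python. It is not Flink's planner code: the function `classify_input` and the enum are hypothetical names used only for illustration, and it assumes the left table's unique key is `id` (the thread only says that all rows have unique keys).

```python
from enum import Enum


class InputSideSpec(Enum):
    # Labels as they appear in the plan output; the enum itself is illustrative.
    NO_UNIQUE_KEY = "NoUniqueKey"
    HAS_UNIQUE_KEY = "HasUniqueKey"
    JOIN_KEY_CONTAINS_UNIQUE_KEY = "JoinKeyContainsUniqueKey"


def classify_input(unique_key, join_key):
    """Classify one join input from its unique-key columns and join-key columns.

    unique_key: set of column names forming a unique key, or None/empty.
    join_key:   set of column names this input contributes to the join condition.
    """
    if not unique_key:
        return InputSideSpec.NO_UNIQUE_KEY
    if set(unique_key) <= set(join_key):
        # Every unique-key column is part of the join key,
        # so at most one row of this input matches a given join-key value.
        return InputSideSpec.JOIN_KEY_CONTAINS_UNIQUE_KEY
    # The input has a unique key, but it is joined on other columns,
    # so several rows may share the same join-key value.
    return InputSideSpec.HAS_UNIQUE_KEY


# Left side: assumed unique key `id`, joined on `user_id`.
left = classify_input(unique_key={"id"}, join_key={"user_id"})
# Right side: unique key `id`, joined on `id` (shown as `id0` in the plan).
right = classify_input(unique_key={"id"}, join_key={"id"})

print(left.value, right.value)  # HasUniqueKey JoinKeyContainsUniqueKey
```

Under this reading, the two specs describe how each side's unique key relates to the join key, not which column is used for hashing. As far as I understand, both inputs are still partitioned by the join key.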

Cheers,
Till

On Fri, Nov 6, 2020 at 7:29 PM Rex Fenley <[hidden email]> wrote:
[...]

Re: Join Bottleneck

Rex Fenley
Thank you for the clarification.

On Sat, Nov 7, 2020 at 7:37 AM Till Rohrmann <[hidden email]> wrote:
[...]


