Optimizing Flink joins

classic Classic list List threaded Threaded
5 messages Options
Dan
Reply | Threaded
Open this post in threaded view
|

Optimizing Flink joins

Dan
Hi!  I was curious if there are docs on how to optimize Flink joins.  I looked around and on the Flink docs and didn't see much.  I see a little on the Configuration page.

E.g. one of my jobs has an interval join.  Does left vs right matter for interval join?
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Flink joins

Timo Walther
Hi Dan,

the order of all joins depends on the order in the SQL query by default.

You can also check the following example (not interval joins though) and
swap e.g. b and c:

env.createTemporaryView("a", env.fromValues(1, 2, 3));
env.createTemporaryView("b", env.fromValues(4, 5, 6));
env.createTemporaryView("c", env.fromValues(7, 8, 9));

System.out.println(env.sqlQuery("SELECT * FROM c, b, a").explain());

So you can reorder the tables in the query if that improves performance.
For interval joins, we currently don't provide additional algorithms or
options.

Regards,
Timo

On 11.02.21 05:04, Dan Hill wrote:
> Hi!  I was curious if there are docs on how to optimize Flink joins.  I
> looked around and on the Flink docs and didn't see much.  I see a little
> on the Configuration page.
>
> E.g. one of my jobs has an interval join.  Does left vs right matter for
> interval join?

Dan
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Flink joins

Dan
Hi Timo!  I'm moving away from SQL to DataStream.

On Thu, Feb 11, 2021 at 9:11 AM Timo Walther <[hidden email]> wrote:
Hi Dan,

the order of all joins depends on the order in the SQL query by default.

You can also check the following example (not interval joins though) and
swap e.g. b and c:

env.createTemporaryView("a", env.fromValues(1, 2, 3));
env.createTemporaryView("b", env.fromValues(4, 5, 6));
env.createTemporaryView("c", env.fromValues(7, 8, 9));

System.out.println(env.sqlQuery("SELECT * FROM c, b, a").explain());

So you can reorder the tables in the query if that improves performance.
For interval joins, we currently don't provide additional algorithms or
options.

Regards,
Timo

On 11.02.21 05:04, Dan Hill wrote:
> Hi!  I was curious if there are docs on how to optimize Flink joins.  I
> looked around and on the Flink docs and didn't see much.  I see a little
> on the Configuration page.
>
> E.g. one of my jobs has an interval join.  Does left vs right matter for
> interval join?

Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Flink joins

Timo Walther
Hi Dan,

thanks for letting us know. Could you give us some feedback what is
missing in SQL for this use case? Are you looking for some broadcast
joining or which kind of algorithm would help you?

Regards,
Timo

On 11.02.21 20:32, Dan Hill wrote:

> Hi Timo!  I'm moving away from SQL to DataStream.
>
> On Thu, Feb 11, 2021 at 9:11 AM Timo Walther <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hi Dan,
>
>     the order of all joins depends on the order in the SQL query by default.
>
>     You can also check the following example (not interval joins though)
>     and
>     swap e.g. b and c:
>
>     env.createTemporaryView("a", env.fromValues(1, 2, 3));
>     env.createTemporaryView("b", env.fromValues(4, 5, 6));
>     env.createTemporaryView("c", env.fromValues(7, 8, 9));
>
>     System.out.println(env.sqlQuery("SELECT * FROM c, b, a").explain());
>
>     So you can reorder the tables in the query if that improves
>     performance.
>     For interval joins, we currently don't provide additional algorithms or
>     options.
>
>     Regards,
>     Timo
>
>     On 11.02.21 05:04, Dan Hill wrote:
>      > Hi!  I was curious if there are docs on how to optimize Flink
>     joins.  I
>      > looked around and on the Flink docs and didn't see much.  I see a
>     little
>      > on the Configuration page.
>      >
>      > E.g. one of my jobs has an interval join.  Does left vs right
>     matter for
>      > interval join?
>

Dan
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Flink joins

Dan
Flink SQL is missing a reliable, commonly-used way of evolving a job (e.g. savepointing and checkpointing).

The other concepts I heard about are not shared publicly enough to rely on (e.g. roll off).  I wasn't able to find anything useful on this.

On Fri, Feb 12, 2021, 02:05 Timo Walther <[hidden email]> wrote:
Hi Dan,

thanks for letting us know. Could you give us some feedback what is
missing in SQL for this use case? Are you looking for some broadcast
joining or which kind of algorithm would help you?

Regards,
Timo

On 11.02.21 20:32, Dan Hill wrote:
> Hi Timo!  I'm moving away from SQL to DataStream.
>
> On Thu, Feb 11, 2021 at 9:11 AM Timo Walther <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hi Dan,
>
>     the order of all joins depends on the order in the SQL query by default.
>
>     You can also check the following example (not interval joins though)
>     and
>     swap e.g. b and c:
>
>     env.createTemporaryView("a", env.fromValues(1, 2, 3));
>     env.createTemporaryView("b", env.fromValues(4, 5, 6));
>     env.createTemporaryView("c", env.fromValues(7, 8, 9));
>
>     System.out.println(env.sqlQuery("SELECT * FROM c, b, a").explain());
>
>     So you can reorder the tables in the query if that improves
>     performance.
>     For interval joins, we currently don't provide additional algorithms or
>     options.
>
>     Regards,
>     Timo
>
>     On 11.02.21 05:04, Dan Hill wrote:
>      > Hi!  I was curious if there are docs on how to optimize Flink
>     joins.  I
>      > looked around and on the Flink docs and didn't see much.  I see a
>     little
>      > on the Configuration page.
>      >
>      > E.g. one of my jobs has an interval join.  Does left vs right
>     matter for
>      > interval join?
>