Re: Handling Large Broadcast States

Posted by Timo Walther on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Handling-Large-Broadcast-States-tp44493p44573.html

Hi Rion,

As far as I know, we don't support broadcast streaming joins in
Table API/SQL either.

Are you sure that you need a broadcast pattern? Or would a regular hash
join using connect() with a CoProcessFunction also work for you? Maybe
with an artificial key to spread the load more evenly?
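
To make the artificial-key idea a bit more concrete, here is a minimal sketch in plain Java, independent of the Flink API. The class SaltedKeys, its method names, and the "tenant#salt" key format are hypothetical and only for illustration. The assumption is that each event is routed to one of N buckets per tenant, while every enrichment record is replicated to all N buckets, so that both sides of a keyed connect() still meet on the same key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Flink): builds "tenant#salt" keys so that
// one hot tenant is spread over several key groups instead of a single one.
final class SaltedKeys {
    private SaltedKeys() {}

    // Event side: pick exactly one bucket, derived deterministically from the event id.
    static String keyForEvent(String tenantId, String eventId, int buckets) {
        int salt = Math.floorMod(eventId.hashCode(), buckets);
        return tenantId + "#" + salt;
    }

    // Enrichment side: emit the record once per bucket, so every bucket
    // holds a copy of the tenant's enrichment data.
    static List<String> keysForEnrichment(String tenantId, int buckets) {
        List<String> keys = new ArrayList<>(buckets);
        for (int salt = 0; salt < buckets; salt++) {
            keys.add(tenantId + "#" + salt);
        }
        return keys;
    }

    public static void main(String[] args) {
        int buckets = 4;
        List<String> enrichmentKeys = keysForEnrichment("tenant-42", buckets);
        String eventKey = keyForEvent("tenant-42", "event-7", buckets);
        // The event's key is always one of the replicated enrichment keys.
        System.out.println(enrichmentKeys.contains(eventKey)); // prints true
    }
}
```

The trade-off is that the enrichment data for a tenant is stored once per bucket. Since it then lives in regular keyed state rather than broadcast state, it can be held by the RocksDB state backend.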

What we do support, in both Table API/SQL and the DataStream API, are
async lookups to external sources. For example, the Table API JDBC
connector has lookup functionality.
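
As a rough illustration of the lookup pattern, here is a small plain-Java sketch of an asynchronous lookup with an in-memory cache in front of it. It does not use any Flink classes. CachedAsyncLookup and the loader function are hypothetical names, and the loader stands in for whatever non-blocking client talks to the external source. The cache is unbounded here, which a real job would want to replace with something that evicts entries.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper (not part of Flink): caches in-flight and completed
// lookups so that each key hits the external source at most once.
final class CachedAsyncLookup<K, V> {
    private final Map<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> loader;

    // The loader must return quickly and do the actual I/O asynchronously.
    CachedAsyncLookup(Function<K, CompletableFuture<V>> loader) {
        this.loader = loader;
    }

    CompletableFuture<V> lookup(K key) {
        CompletableFuture<V> future = cache.computeIfAbsent(key, loader);
        // Drop failed lookups so that a later call can retry them.
        future.whenComplete((value, error) -> {
            if (error != null) {
                cache.remove(key, future);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for an external source; a real loader would query a database or service.
        CachedAsyncLookup<String, String> users = new CachedAsyncLookup<>(id -> {
            calls.incrementAndGet();
            return CompletableFuture.supplyAsync(() -> "user-" + id);
        });
        System.out.println(users.lookup("17").join()); // prints user-17
        System.out.println(users.lookup("17").join()); // prints user-17 (served from the cache)
        System.out.println(calls.get());               // prints 1
    }
}
```

In a DataStream job, a call like lookup() would typically sit inside the asyncInvoke method of an AsyncFunction, with the result handed to the ResultFuture once the returned future completes.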

I hope this helps a bit.

Regards,
Timo



On 18.06.21 14:00, Piotr Nowojski wrote:

> Hi,
>
> As far as I know there are no plans to support other state backends with
> BroadcastState. I don't know about any particular technical limitation,
> it probably just hasn't been done. Also I don't know how much effort
> that would be. Probably it wouldn't be easy.
>
> Timo, can you chime in on how, for example, Table API/SQL is solving this
> problem? I'm pretty sure Table API is using broadcast joins after all?
>
> Best,
> Piotrek
>
> On Thu, 17 Jun 2021 at 02:53, Rion Williams <[hidden email]> wrote:
>
>     Hey Flink folks,
>
>     I was discussing the use of the Broadcast Pattern with some
>     colleagues today for a potential enrichment use-case and noticed
>     that it wasn’t currently backed by RocksDB. This seems to indicate
>     that it would be solely limited to the memory allocated, which might
>     not support a large enrichment data set that our use case might run
>     into (thousands of tenants with users and various other entities to
>     enrich by).
>
>     Are there any plans to eventually add support for BroadcastState to
>     be backed by a non-memory source? Or perhaps some technical
>     limitations that might not make that possible? If the latter is
>     true, is there a preferred pattern for handling enrichment/lookups
>     for a very large set of data that may not be memory-bound?
>
>     Any advice or thoughts would be welcome!
>
>     Rion
>