Flink streaming. Broadcast reference data map across nodes

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink streaming. Broadcast reference data map across nodes

Vadim Vararu
Hi all,


I would like to do something similar to Spark's broadcast mechanism.

Basically, i have a big dictionary of reference data that has to be
accessible from all the nodes (in order to do some joins of log line
with reference line).

I did not find yet a way to do it.


Any ideas?

Reply | Threaded
Open this post in threaded view
|

Re: Flink streaming. Broadcast reference data map across nodes

Ufuk Celebi
On Tue, Feb 21, 2017 at 2:35 PM, Vadim Vararu <[hidden email]> wrote:
> Basically, i have a big dictionary of reference data that has to be
> accessible from all the nodes (in order to do some joins of log line with
> reference line).

If the dictionary is small you can make it part of the closures that
are send to the task managers. Just make it part of your function.

If it is large, I'm not sure what the best way is to do it is right
now. I've CC'd Aljoscha who can probably help...
Reply | Threaded
Open this post in threaded view
|

Re: Flink streaming. Broadcast reference data map across nodes

Aljoscha Krettek
Hi,
what Ufuk said is valid. In addition, you can make your function a RichFunction and load the static data in the open() method.

In the future, you might be able to handle this use case with a feature called side inputs that we're currently working on: https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit

Best,
Aljoscha

On Tue, 21 Feb 2017 at 15:50 Ufuk Celebi <[hidden email]> wrote:
On Tue, Feb 21, 2017 at 2:35 PM, Vadim Vararu <[hidden email]> wrote:
> Basically, i have a big dictionary of reference data that has to be
> accessible from all the nodes (in order to do some joins of log line with
> reference line).

If the dictionary is small you can make it part of the closures that
are send to the task managers. Just make it part of your function.

If it is large, I'm not sure what the best way is to do it is right
now. I've CC'd Aljoscha who can probably help...