Re: using updating shared data

Posted by Till Rohrmann on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/using-updating-shared-data-tp25278p25302.html

Yes, exactly, Avi.

Cheers,
Till

On Wed, Jan 2, 2019 at 5:42 PM Avi Levi <[hidden email]> wrote:
Thanks Till, I will definitely check it out. Just to make sure I got you correctly: you are suggesting that the list I want to broadcast will be broadcast via the control stream and then kept in the relevant operator's state, and updates (CRUD) on that list will be performed via the control stream. Correct?
BR
Avi

On Wed, Jan 2, 2019 at 4:28 PM Till Rohrmann <[hidden email]> wrote:
Hi Avi,

you could use Flink's broadcast state pattern [1]. You would need to use the DataStream API, but it allows you to have two streams (an input and a control stream) where the control stream is broadcast to all subtasks. So by ingesting messages into the control stream you can send model updates to all subtasks.
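
A minimal sketch of that wiring with the DataStream API could look like the following; the socket sources, the "ADD:"/"REMOVE:" command format, and the class/state names are illustrative assumptions, not something discussed in this thread:

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastListJob {

    // Descriptor for the broadcast state that holds the shared list entries.
    static final MapStateDescriptor<String, Boolean> LIST_STATE =
        new MapStateDescriptor<>(
            "shared-list",
            BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.BOOLEAN_TYPE_INFO);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Main input stream (placeholder source for the sketch).
        DataStream<String> input = env.socketTextStream("localhost", 9000);

        // Control stream carrying updates for the shared list,
        // e.g. "ADD:foo" or "REMOVE:foo" (the format is an assumption).
        DataStream<String> control = env.socketTextStream("localhost", 9001);

        // Broadcast the control stream to all subtasks.
        BroadcastStream<String> broadcastControl = control.broadcast(LIST_STATE);

        input.connect(broadcastControl)
             .process(new BroadcastProcessFunction<String, String, String>() {

                 @Override
                 public void processElement(String value, ReadOnlyContext ctx,
                                            Collector<String> out) throws Exception {
                     // Read-only access to the broadcast state on the data side.
                     if (ctx.getBroadcastState(LIST_STATE).contains(value)) {
                         out.collect(value);
                     }
                 }

                 @Override
                 public void processBroadcastElement(String command, Context ctx,
                                                     Collector<String> out) throws Exception {
                     // Every parallel subtask receives each control message and
                     // applies the update to its copy of the broadcast state.
                     String[] parts = command.split(":", 2);
                     if ("ADD".equals(parts[0])) {
                         ctx.getBroadcastState(LIST_STATE).put(parts[1], true);
                     } else if ("REMOVE".equals(parts[0])) {
                         ctx.getBroadcastState(LIST_STATE).remove(parts[1]);
                     }
                 }
             })
             .print();

        env.execute("broadcast state sketch");
    }
}

Because the control stream is broadcast, each subtask keeps its own copy of the state, so updates arriving on the control stream are visible to every parallel instance of the operator without restarting the job.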


Cheers,
Till

On Tue, Jan 1, 2019 at 6:49 PM miki haiat <[hidden email]> wrote:
I'm trying to understand your use case.
What is the source of the data? FS, Kafka, or something else?


On Tue, Jan 1, 2019 at 6:29 PM Avi Levi <[hidden email]> wrote:
Hi,
I have a list (a couple of thousand text lines) that I need to use in my map function. I read this article about broadcasting variables or using the distributed cache; however, I need to update this list from time to time, and if I understood correctly that is not possible with broadcast variables or the cache without restarting the job. Is there an idiomatic way to achieve this? A DB seems like overkill for that, and I want to keep IO/network calls to a minimum.

Cheers 
Avi