Share broadcast state between multiple operators

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Share broadcast state between multiple operators

Richard Deurwaarder
Hi All,

Due to the way our code is structured, we would like to use the broadcast state at multiple points of our pipeline. So not only share it between multiple instances of the same operator but also between multiple operators. See the image below for a simplified example.

Flink does not seem to have any problems with this at runtime but I wonder:
  • Is this a good pattern and was it designed with something like this in mind?
  • If we use the same MapStateDescriptor in both operators, does the state only get stored once? And does it also only get written once?

broadcast-state.png

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Share broadcast state between multiple operators

Till Rohrmann
Hi Richard,

Flink does not support to share state between multiple operators. Technically also the broadcast state is not shared but replicated between subtasks belonging to the same operator. So what you can do is to send the broadcast input to different operators, but they will all keep their own copy of the state.

Cheers,
Till

On Mon, Feb 25, 2019 at 11:45 AM Richard Deurwaarder <[hidden email]> wrote:
Hi All,

Due to the way our code is structured, we would like to use the broadcast state at multiple points of our pipeline. So not only share it between multiple instances of the same operator but also between multiple operators. See the image below for a simplified example.

Flink does not seem to have any problems with this at runtime but I wonder:
  • Is this a good pattern and was it designed with something like this in mind?
  • If we use the same MapStateDescriptor in both operators, does the state only get stored once? And does it also only get written once?

broadcast-state.png

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Share broadcast state between multiple operators

Richard Deurwaarder
Hello Till,

So if I understand correctly, when messages get broadcast to multiple operators, each operator will execute the processBroadcast() function and store the state under a sort of operator scope? Even if they use the same MapStateDescriptor?

And if it replicates the state between operators is what makes the broadcast state different from an Operator state with Union redistribution?

Thanks for any clarification, very interesting to learn about :)

Richard

On Tue, Feb 26, 2019 at 11:57 AM Till Rohrmann <[hidden email]> wrote:
Hi Richard,

Flink does not support to share state between multiple operators. Technically also the broadcast state is not shared but replicated between subtasks belonging to the same operator. So what you can do is to send the broadcast input to different operators, but they will all keep their own copy of the state.

Cheers,
Till

On Mon, Feb 25, 2019 at 11:45 AM Richard Deurwaarder <[hidden email]> wrote:
Hi All,

Due to the way our code is structured, we would like to use the broadcast state at multiple points of our pipeline. So not only share it between multiple instances of the same operator but also between multiple operators. See the image below for a simplified example.

Flink does not seem to have any problems with this at runtime but I wonder:
  • Is this a good pattern and was it designed with something like this in mind?
  • If we use the same MapStateDescriptor in both operators, does the state only get stored once? And does it also only get written once?

broadcast-state.png

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Share broadcast state between multiple operators

Till Rohrmann
On Tue, Feb 26, 2019 at 3:10 PM Richard Deurwaarder <[hidden email]> wrote:
Hello Till,

So if I understand correctly, when messages get broadcast to multiple operators, each operator will execute the processBroadcast() function and store the state under a sort of operator scope? Even if they use the same MapStateDescriptor?
Yes. 


And if it replicates the state between operators is what makes the broadcast state different from an Operator state with Union redistribution?
The union redistribution is relevant in case of a restart where the every operator receives the state from all other operators. The individual operator states can be different. In case of broadcast state every operator's state will be the same. So there is no union redistribution needed.

Cheers,
Till
 

Thanks for any clarification, very interesting to learn about :)

Richard

On Tue, Feb 26, 2019 at 11:57 AM Till Rohrmann <[hidden email]> wrote:
Hi Richard,

Flink does not support to share state between multiple operators. Technically also the broadcast state is not shared but replicated between subtasks belonging to the same operator. So what you can do is to send the broadcast input to different operators, but they will all keep their own copy of the state.

Cheers,
Till

On Mon, Feb 25, 2019 at 11:45 AM Richard Deurwaarder <[hidden email]> wrote:
Hi All,

Due to the way our code is structured, we would like to use the broadcast state at multiple points of our pipeline. So not only share it between multiple instances of the same operator but also between multiple operators. See the image below for a simplified example.

Flink does not seem to have any problems with this at runtime but I wonder:
  • Is this a good pattern and was it designed with something like this in mind?
  • If we use the same MapStateDescriptor in both operators, does the state only get stored once? And does it also only get written once?

broadcast-state.png

Thanks!