Apache Flink - Question about broadcast state pattern usage

Apache Flink - Question about broadcast state pattern usage

M Singh
Hi Flink folks:

I am reading the documentation on the broadcast state pattern (https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/broadcast_state.html) and have the following questions:

1. Point number 2 - '2. it is only available to specific operators that have as inputs a broadcasted stream and a non-broadcasted one,'. From what I understand, it can be used with connected streams. Is there any other operator where it can be used?

2. Point number 3 - '3. such an operator can have multiple broadcast states with different names.'. Is there any additional documentation or example on how to implement and use multiple broadcast states?

Thanks

Mans

Re: Apache Flink - Question about broadcast state pattern usage

Guowei Ma
Hi
1. I think you could use "Using Managed Operator State"[1] (context.getOperatorStateStore().getBroadcastState()) to use BroadcastState. But you must use it very carefully and guarantee the semantics of broadcast state yourself. I think "The Broadcast State Pattern"[2] describes the best practice for using broadcast state.
2. The broadcast method takes varargs, so you can pass multiple MapStateDescriptors to it.


M Singh <[hidden email]> wrote on Sunday, April 7, 2019 at 10:17 PM:
Hi Flink folks:

I am reading the documentation on the broadcast state pattern (https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/broadcast_state.html) and have the following questions:

1. Point number 2 - '2. it is only available to specific operators that have as inputs a broadcasted stream and a non-broadcasted one,'. From what I understand, it can be used with connected streams. Is there any other operator where it can be used?

2. Point number 3 - '3. such an operator can have multiple broadcast states with different names.'. Is there any additional documentation or example on how to implement and use multiple broadcast states?

Thanks

Mans

Re: Apache Flink - Question about broadcast state pattern usage

M Singh
Hi Guowei;

Thanks for your answer.

Do you have an example that illustrates using broadcast with multiple descriptors?

Thanks



On Sunday, April 7, 2019, 10:10:15 PM EDT, Guowei Ma <[hidden email]> wrote:


Hi
1. I think you could use "Using Managed Operator State"[1] (context.getOperatorStateStore().getBroadcastState()) to use BroadcastState. But you must use it very carefully and guarantee the semantics of broadcast state yourself. I think "The Broadcast State Pattern"[2] describes the best practice for using broadcast state.
2. The broadcast method takes varargs, so you can pass multiple MapStateDescriptors to it.



Re: Apache Flink - Question about broadcast state pattern usage

Fabian Hueske-2
Hi,

you would simply pass multiple MapStateDescriptors to the broadcast method:

MapStateDescriptor<A, B> bcState1 = ...
MapStateDescriptor<C, D> bcState2 = ...

DataStream<X> stream = ...
BroadcastStream<X> bcStream = stream.broadcast(bcState1, bcState2);

Best,
Fabian


On Wed, Apr 10, 2019 at 19:44, M Singh <[hidden email]> wrote:
Hi Guowei;

Thanks for your answer.

Do you have an example that illustrates using broadcast with multiple descriptors?

Thanks




Re: Apache Flink - Question about broadcast state pattern usage

M Singh
Hi Fabian:  Thanks for your answer.

From my understanding (please correct me), in the example above we are passing multiple map state descriptors to the same broadcast stream, so the elements in that stream are the same for every descriptor. The only difference is that in the processBroadcastElement method of the KeyedBroadcastProcessFunction implementation, we could add different mappings of a broadcast element (from the same broadcasted stream) to different map states. I am looking at the documentation example (https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html) and am still not sure how that helps.

Thanks for your help.

Mans





On Thursday, April 11, 2019, 3:53:59 AM EDT, Fabian Hueske <[hidden email]> wrote:


Hi,

you would simply pass multiple MapStateDescriptors to the broadcast method:

MapStateDescriptor<A, B> bcState1 = ...
MapStateDescriptor<C, D> bcState2 = ...

DataStream<X> stream = ...
BroadcastStream<X> bcStream = stream.broadcast(bcState1, bcState2);

Best,
Fabian



Re: Apache Flink - Question about broadcast state pattern usage

Fabian Hueske-2
Hi,

I think your understanding is correct.
Having multiple map states for a broadcasted stream gives more flexibility.
You can have states of different key and value types and store completely different information in them.

Fabian
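
The behaviour confirmed above can be sketched in plain Java. This is only a model of the semantics and uses no Flink classes: `Rule`, `ParallelInstance`, `run`, and the two map names are hypothetical stand-ins invented for illustration. It assumes that every parallel instance receives every broadcast element, and shows one element being written into two map states with different key and value types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (NOT Flink API) of one broadcast stream feeding two map states. */
final class MultiBroadcastStateModel {

    /** Hypothetical broadcast element type, standing in for X in the snippet earlier in the thread. */
    static final class Rule {
        final String name;
        final int threshold;

        Rule(String name, int threshold) {
            this.name = name;
            this.threshold = threshold;
        }
    }

    /** Models one parallel instance of the operator with two differently typed states. */
    static final class ParallelInstance {
        // Models the state behind a first descriptor: rule name -> latest rule with that name.
        final Map<String, Rule> rulesByName = new HashMap<>();
        // Models the state behind a second descriptor: threshold -> number of elements seen with it.
        final Map<Integer, Long> elementsPerThreshold = new HashMap<>();

        /** Models processBroadcastElement: the same element updates both states. */
        void processBroadcastElement(Rule rule) {
            rulesByName.put(rule.name, rule);
            elementsPerThreshold.merge(rule.threshold, 1L, Long::sum);
        }
    }

    /** Delivers every broadcast element to every instance (the assumed broadcast semantics). */
    static List<ParallelInstance> run(int parallelism, List<Rule> broadcastElements) {
        List<ParallelInstance> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ParallelInstance());
        }
        for (Rule rule : broadcastElements) {
            for (ParallelInstance instance : instances) {
                instance.processBroadcastElement(rule);
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(new Rule("a", 10), new Rule("b", 10), new Rule("c", 20));
        // Every instance ends up with the same contents in both states.
        for (ParallelInstance instance : run(3, rules)) {
            System.out.println(instance.rulesByName.keySet() + " " + instance.elementsPerThreshold);
        }
    }
}
```

In Flink itself, each of the two maps would correspond to one MapStateDescriptor passed to broadcast(...), and processBroadcastElement would obtain each state with ctx.getBroadcastState(descriptor) before writing to it.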



On Fri, Apr 12, 2019 at 20:30, M Singh <[hidden email]> wrote:
Hi Fabian:  Thanks for your answer.

From my understanding (please correct me), in the example above we are passing multiple map state descriptors to the same broadcast stream, so the elements in that stream are the same for every descriptor. The only difference is that in the processBroadcastElement method of the KeyedBroadcastProcessFunction implementation, we could add different mappings of a broadcast element (from the same broadcasted stream) to different map states. I am looking at the documentation example (https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html) and am still not sure how that helps.

Thanks for your help.

Mans




