Hi Flink folks:

I am reading the documentation on the broadcast state pattern (https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/broadcast_state.html) and have the following questions:

1. Point number 2: 'it is only available to specific operators that have as inputs a broadcasted stream and a non-broadcasted one'. From what I understand, it can be used with connected streams. Is there any other operator where it can be used?
2. Point number 3: 'such an operator can have multiple broadcast states with different names'. Is there any additional documentation or example on how to implement and use multiple broadcast states?

Thanks
Mans
Hi,

1. I think you could access broadcast state through "Using Managed Operator State"[1] (context.getOperatorStateStore().getBroadcastState()). But you must use it very carefully and guarantee the semantics of broadcast state yourself. I think "The Broadcast State Pattern"[2] is the best practice for using broadcast state.
2. The broadcast method takes varargs, so you can pass multiple MapStateDescriptors to it.

On Sun, Apr 7, 2019 at 10:17 PM, M Singh <[hidden email]> wrote:
Hi Guowei, thanks for your answer. Do you have an example that illustrates using broadcast with multiple descriptors? Thanks
On Sunday, April 7, 2019, 10:10:15 PM EDT, Guowei Ma <[hidden email]> wrote:
Hi, you would simply pass multiple MapStateDescriptors to the broadcast method:

    MapStateDescriptor<A, B> bcState1 = ...
    MapStateDescriptor<C, D> bcState2 = ...
    DataStream<X> stream = ...
    BroadcastStream<X> bcStream = stream.broadcast(bcState1, bcState2);

Best, Fabian

On Wed, Apr 10, 2019 at 7:44 PM, M Singh <[hidden email]> wrote:
Hi Fabian, thanks for your answer. From my understanding (please correct me), in the example above we are passing multiple map state descriptors to the same broadcast stream, so the elements in that stream will be the same for every state. The only difference would be that, in the processBroadcastElement method of the KeyedBroadcastProcessFunction implementation, we could write different mappings of a broadcast element (from the same broadcasted stream) to different map states. I am looking at the documentation example (https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html) and am still not sure how that will help. Thanks for your help. Mans
On Thursday, April 11, 2019, 3:53:59 AM EDT, Fabian Hueske <[hidden email]> wrote:
Hi, I think your understanding is correct. Having multiple map states for a broadcasted stream gives you more flexibility. You can have states with different key and value types and store completely different information in them. Fabian

On Fri, Apr 12, 2019 at 8:30 PM, M Singh <[hidden email]> wrote:
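To make the point about "different key and value types" concrete, here is a minimal sketch in plain Java. It is only a model of the idea and does not use the Flink API: the two HashMaps stand in for the broadcast states registered under bcState1 and bcState2, and the Rule type, its fields, and the class and method bodies are hypothetical names chosen for illustration. The method names mirror processBroadcastElement and processElement from the thread, but the signatures here are simplified assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model, NOT the Flink API: one broadcast element updates two
// named states that have different key and value types.
class MultiBroadcastStateModel {

    // Hypothetical broadcast element type (stands in for X in the thread).
    static final class Rule {
        final int id;
        final String name;
        final double threshold;

        Rule(int id, String name, double threshold) {
            this.id = id;
            this.name = name;
            this.threshold = threshold;
        }
    }

    // Stand-in for the state behind bcState1: rule name -> threshold.
    final Map<String, Double> thresholdByName = new HashMap<>();

    // Stand-in for the state behind bcState2: rule id -> rule name.
    final Map<Integer, String> nameById = new HashMap<>();

    // Broadcast side: the same element is written to both states,
    // each time projected into a different key/value shape.
    void processBroadcastElement(Rule rule) {
        thresholdByName.put(rule.name, rule.threshold);
        nameById.put(rule.id, rule.name);
    }

    // Non-broadcast side: only reads the state; returns true when the
    // value exceeds the threshold of a known rule.
    boolean processElement(String ruleName, double value) {
        Double threshold = thresholdByName.get(ruleName);
        return threshold != null && value > threshold;
    }

    public static void main(String[] args) {
        MultiBroadcastStateModel model = new MultiBroadcastStateModel();
        model.processBroadcastElement(new Rule(1, "cpu", 0.9));
        System.out.println(model.processElement("cpu", 0.95));
        System.out.println(model.nameById.get(1));
    }
}
```

In an actual Flink job, each map would presumably be obtained inside the function from the context using the corresponding MapStateDescriptor, and the broadcast side would be the only place that writes to it.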