Hi,
I have a single broadcast message that contains configuration data consumed by different operators. For example: config = { "config1" : 1, "config2" : 2, "config3" : 3 }. Operator 1 will consume config1 only, operator 2 will consume config2 only, etc.
I was wondering which approach would be best performance-wise. I don't really have the time to implement both and compare, so perhaps someone here already knows whether one approach is better or both perform similarly. FWIW, the config stream is very sporadic compared to the event stream.
Thank you,
Manas Kale
Hi Manas,
The approaches you described look the same:
> each operator only stores what it needs.
> each downstream operator will "strip off" the config parameter that it needs.
Can you please explain the difference?
Regards,
Roman
On Mon, May 11, 2020 at 8:07 AM Manas Kale <[hidden email]> wrote:
Sure. Apologies for not making this clear enough.

> each operator only stores what it needs.

Let's imagine this setup:

[diagram: BROADCAST STREAM]

In this scenario, all 3 operators will be BroadcastProcessFunctions. Each of them will receive the whole config message in its processBroadcastElement method, but each one will only store what it needs in its state store. So even though operator1 will receive config = { "config1" : 1, "config2" : 2, "config3" : 3 }, it will only store config1.

> each downstream operator will "strip off" the config parameter that it needs.

[diagram: BROADCAST STREAM]

In this case, the enricher operator will store the whole config message. When an event message arrives, this operator will append config1, config2 and config3 to it. Operator 1 will extract and use config1, and output a message that has config1 stripped off.

I hope that helps! Perhaps I am being too pedantic, but I would like to know whether these two methods differ in performance and, if so, which one would be preferred.

On Mon, May 11, 2020 at 11:46 PM Khachatryan Roman <[hidden email]> wrote:
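For readers following along, the first setup (each operator receives the whole config but keeps only its own key) can be sketched in plain Java. This is only an illustration and deliberately uses no Flink API: the class `SelectiveConfigOperator` and its methods `onConfig` and `onEvent` are hypothetical stand-ins for a BroadcastProcessFunction's processBroadcastElement and processElement, and a plain field stands in for the state store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java stand-in for an operator in the first setup.
// It is NOT Flink code; it only mimics the "store what you need" idea.
class SelectiveConfigOperator {
    private final String wantedKey;   // the single config key this operator uses
    private Integer storedValue;      // stand-in for the operator's state store

    SelectiveConfigOperator(String wantedKey) {
        this.wantedKey = wantedKey;
    }

    // Stand-in for processBroadcastElement: sees the full config message,
    // but keeps only the value for its own key.
    void onConfig(Map<String, Integer> fullConfig) {
        Integer value = fullConfig.get(wantedKey);
        if (value != null) {
            storedValue = value;
        }
    }

    // Stand-in for processElement: the event itself carries no config,
    // the operator reads the value it stored earlier.
    String onEvent(String event) {
        return event + " processed with " + wantedKey + "=" + storedValue;
    }

    Integer getStoredValue() {
        return storedValue;
    }

    public static void main(String[] args) {
        Map<String, Integer> config = new HashMap<>();
        config.put("config1", 1);
        config.put("config2", 2);
        config.put("config3", 3);

        SelectiveConfigOperator operator1 = new SelectiveConfigOperator("config1");
        SelectiveConfigOperator operator2 = new SelectiveConfigOperator("config2");

        // Both operators receive the same broadcast message.
        operator1.onConfig(config);
        operator2.onConfig(config);

        System.out.println(operator1.onEvent("event-A"));
        System.out.println(operator2.onEvent("event-A"));
    }
}
```

The point of the sketch is that events pass between operators unchanged, and the config only moves when a (sporadic) broadcast message arrives.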
Thanks for the clarification.
The second option (with the enricher) creates more load because it adds the configuration to every event. Unless events are much bigger than the configuration, this will significantly increase network, memory, and CPU usage.
Btw, I think you don't need a broadcast in the 2nd option, since the interested subtask will receive the configuration anyway.
Regards,
Roman
On Tue, May 12, 2020 at 5:57 AM Manas Kale <[hidden email]> wrote:
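To make the size argument concrete, here is a rough back-of-the-envelope sketch in plain Java. It compares the UTF-8 size of an event with and without the config appended. The class `EnrichmentOverheadSketch`, the method `overheadFactor`, and the sample event payloads are all made up for illustration; real serialized sizes in Flink depend on the serializer and record framing, which this ignores.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for estimating how much an appended config inflates an event.
// It measures raw UTF-8 bytes only and ignores any serializer or network framing.
class EnrichmentOverheadSketch {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Returns (event bytes + config bytes) / event bytes.
    // A factor of 1.0 means no growth; 2.0 means the enriched event is twice as large.
    static double overheadFactor(String event, String config) {
        int plain = utf8Length(event);
        if (plain == 0) {
            throw new IllegalArgumentException("event must not be empty");
        }
        int enriched = plain + utf8Length(config);
        return (double) enriched / plain;
    }

    public static void main(String[] args) {
        String config = "{\"config1\":1,\"config2\":2,\"config3\":3}";

        // Made-up payloads: one tiny event and one much larger event.
        String smallEvent = "{\"id\":42}";
        StringBuilder large = new StringBuilder("{\"id\":42,\"payload\":\"");
        for (int i = 0; i < 1000; i++) {
            large.append('x');
        }
        large.append("\"}");

        System.out.println("small event factor: " + overheadFactor(smallEvent, config));
        System.out.println("large event factor: " + overheadFactor(large.toString(), config));
    }
}
```

With a small event the factor is well above 1, while with a large event it stays close to 1, which matches the caveat "unless events are much bigger than the configuration".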
I see, thank you Roman!
On Tue, May 12, 2020 at 4:59 PM Khachatryan Roman <[hidden email]> wrote: