Hi,

I am pretty new to Flink and I'm trying to implement what seems to me a pretty basic pattern.

Say I have a single stream of Price objects, each encapsulating a price value and a symbol (for example A to Z). They are emitted at very random intervals throughout the day: it could be 10,000 a day or once a week. For example:

..(price= .22, symbol = 'B')...(price= .12, symbol = 'C')..(price= .12, symbol = 'A')..(price= .22, symbol = 'Z')...

I want to define an operator that emits a single object only once ALL symbols have been received (not before); the object will include all the received prices. If a price is received again for a symbol that is already present, the operator should re-emit the same object with that price updated, while all the other prices stay the same.

If the stream is keyed by symbol, I understand that using MapState will not help, because that state is local to each key/partition, yet each task would need to know that all symbols have been received. Basically I'm looking for a global state for a single keyed stream, i.e. a state accessible by all parallel tasks. I have used the broadcast state pattern before, but I understand it applies when connecting two streams. Is there a way to do this without forcing the parallelism to 1?

Thanks in advance for your assistance,
Adam
Hi Adam,
sorry for the late reply.

Introducing a global state is something that should be avoided, as it introduces bottlenecks and/or concurrency and ordering issues. Broadcasting state between the different subtasks would also cost performance, since every state change has to be shared with every other subtask.

Ideally, you might be able to reconsider the design of your pipeline. Is there a specific reason that prevents you from doing the merging on a single instance? (One way to do that is sketched below.)

Best,
Matthias
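For illustration, one possible way to do the merging on a single instance is to route every event to the same constant key, so that one keyed state instance sees all symbols. This is only a rough sketch under that assumption; it reuses the Price type from above, and names such as AllSymbolsFunction are made up:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: all Price events are routed to a single key, so one keyed MapState
// holds the latest price per symbol. A snapshot of all prices is emitted once
// every expected symbol has been seen at least once, and again on each update.
public class AllSymbolsFunction
        extends KeyedProcessFunction<String, Price, Map<String, Double>> {

    private final Set<String> expectedSymbols;            // e.g. "A" .. "Z"
    private transient MapState<String, Double> latestPrices;

    public AllSymbolsFunction(Set<String> expectedSymbols) {
        this.expectedSymbols = expectedSymbols;
    }

    @Override
    public void open(Configuration parameters) {
        latestPrices = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("latestPrices", String.class, Double.class));
    }

    @Override
    public void processElement(Price price, Context ctx, Collector<Map<String, Double>> out)
            throws Exception {
        latestPrices.put(price.symbol, price.value);

        // Copy the state into a snapshot and emit it only once all symbols are present.
        Map<String, Double> snapshot = new HashMap<>();
        for (Map.Entry<String, Double> e : latestPrices.entries()) {
            snapshot.put(e.getKey(), e.getValue());
        }
        if (snapshot.keySet().containsAll(expectedSymbols)) {
            out.collect(snapshot);
        }
    }
}

// Usage sketch: everything goes to one constant key, so the merging happens on a single subtask:
// prices.keyBy(p -> "ALL")
//       .process(new AllSymbolsFunction(expectedSymbols));

Note that this is exactly the single-instance merging mentioned above: the constant key funnels all events through one subtask, so that operator effectively runs with parallelism 1 even if the job's parallelism is higher.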