Hello Flink Community! We have a flink streaming application with a particular use case where a closure variable Set<String> is used in a filter function. Currently, the variable is set at startup time. It’s populated from an S3 location, where several files exist (we consume the one with the last updated timestamp). Is it possible to periodically update (say once every 24 hours) this closure variable? My initial research indicates that we cannot update closure variables and expect them to show up at the workers. There seems to be something called BrodcastStream in Flink.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html Is that the right approach? I would like some kind of a confirmation before I go deeper into it. cheers Kumar |
Hi Senthil, I think you are right that you cannot update closure variables directly and expect them to show up at the workers. If the variable values are read from S3 files, I think currently you will need to define a source explicitly to read the latest value of the file. Whether to use BroadcastedStream should depends on how you want to access the set of string: if you want to broadcast the same strings to all the tasks, then broadcast stream is the solution and if you want to distribute the set of strings in other methods, you could also use more generic connect streams like: streamA.connect(streamB.keyBy()).process(xx). [1] Best, Yun
|
Free forum by Nabble | Edit this page |