I am new to Flink and trying to understand the keyBy and KeyedStream. From the short doc description I expected it to partition the data such that the following flatMap would only see elements with the same key. That events with different keys would be presented to different instances of FlatMapFunction. But, I am seeing it present all events in the stream to the same FlatMapFunction.
Michael |
Hi,
KeyBy operation partition the data on given key and make sure same slot will get all future data belonging to same key. In default implementation, it can also map subset of keys in your DataStream to same slot. Assuming you have number of keys equal to number running slot then you may specify your custom keyBy operation to the achieve the same. Could you specify your case. -- Thanks Amit -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Amit is correct. keyBy() ensures that all records with the same key are processed by the same paralllel instance of a function. I had a look at the docs [1]. I agree that "Logically partitions a stream into disjoint partitions, each partition containing elements of the same key." can be easily interpreted as you did. I've pushed a commit to clarify the description. The docs should be updated soon. 2018-04-05 6:21 GMT+02:00 Amit Jain <[hidden email]>: Hi, |
Thanks for the clarification. I was just trying to understand the intended behavior. It would have been nice if Flink tracked state for downstream operators by key, but I can do that with a map in the downstream functions.
Michael
|
Hi, I think Flink is exactly doing what you are looking for.[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html#keyed-state 2018-04-05 14:08 GMT+02:00 Michael Latta <[hidden email]>:
|
Yes. It took a bit of digging in the website to find RichFlatMapFunction to get managed state.
Michael
|
I have a question related to KeyedStream, asking it here instead of starting a new thread. If I assign timestamps on a keyed stream, the resulting stream is not keyed. So essentially I would need to apply the key by operator again after the assign timestamps operator.Shailesh On Fri, Apr 6, 2018 at 4:31 PM, Michael Latta <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |