Can you please confirm that my understanding is correct? I am looking at the documentation on low-level joins https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#low-level-joins and the example there. When we do keyBy and then process, Flink maintains an instance per key and makes sure that, for a given key, the instance for that key is used. Correct? This means that the value state for a given key is maintained by Flink, and in my code I do not need to worry about the key value. In my code I can use ValueState and assume that Flink will keep track of it on a per-key basis. |
Yes, that is correct. You can treat keyed ValueState like a distributed hashmap, and Flink routes all state accesses to the entry for the key of the current record.
2018-01-13 17:07 GMT+01:00 Boris Lublinsky <[hidden email]>:
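To make the hashmap analogy concrete, here is a minimal sketch in plain Java. It is a toy model, not Flink's implementation or API: `ToyValueState`, `dispatch`, and the `String` key type are hypothetical names chosen for illustration. The point it tries to show is that the runtime sets the "current key" before calling the user function, so the user code reads and updates state without ever naming a key.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of keyed state. Assumption: this only mimics the behavior described above. */
final class KeyedStateSketch {

    /** Hypothetical stand-in for a keyed ValueState<Long>. */
    static final class ToyValueState {
        private final Map<String, Long> backend = new HashMap<>(); // one entry per key
        private String currentKey;                                  // set by the "runtime"

        void setCurrentKey(String key) { currentKey = key; }
        Long value() { return backend.get(currentKey); }            // null if nothing stored yet
        void update(Long v) { backend.put(currentKey, v); }
    }

    private final ToyValueState count = new ToyValueState();

    /** "User code": counts records and never mentions the key when touching state. */
    long processElement(String record) {
        Long current = count.value();
        long next = (current == null) ? 1L : current + 1L;
        count.update(next);
        return next;
    }

    /** "Runtime side": scope the state to the record's key, then call the user function. */
    long dispatch(String key, String record) {
        count.setCurrentKey(key);
        return processElement(record);
    }

    public static void main(String[] args) {
        KeyedStateSketch op = new KeyedStateSketch();
        System.out.println(op.dispatch("a", "r1")); // 1
        System.out.println(op.dispatch("b", "r2")); // 1 (key "b" has its own entry)
        System.out.println(op.dispatch("a", "r3")); // 2
    }
}
```

In this model a single function object serves many keys; only the state entry is per key.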
|
Thanks Fabian
|
Thanks Fabian
Can you also explain the threading model? How is processing parallelized across multiple keys? Is it hash based? Also, are processElement1 and processElement2 executed on different threads? More specifically, if processElement1 is an order of magnitude slower than processElement2, will it impact processElement2?
|
Sure. A CoProcessFunction is executed in parallel by running multiple instances of the CoProcessFunction. Each instance runs in a separate TaskManager slot and is responsible for a subset of all keys. Keys are assigned by hash partitioning to function instances. All calls to methods of an individual CoProcessFunction instance are synchronized, i.e., processElement1, processElement2, and onTimer are never concurrently called. A long-running processElement1 method will block all method calls for all keys that are assigned to the same instance (not just method calls for the same key).
2018-01-13 20:33 GMT+01:00 Boris Lublinsky <[hidden email]>:
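The behavior described above can be illustrated with a small simulation in plain Java. This is a sketch under stated assumptions, not Flink code: each parallel instance is modeled as a single-threaded executor, and keys are mapped to instances with a simple `hashCode` modulo (Flink's real assignment is more involved, going through key groups as far as I know). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model: one single-threaded executor per parallel function instance. */
final class InstanceThreadingSketch {
    private final int parallelism;
    private final List<ExecutorService> instances = new ArrayList<>();
    private final List<List<String>> logs = new ArrayList<>(); // one call log per instance

    InstanceThreadingSketch(int parallelism) {
        this.parallelism = parallelism;
        for (int i = 0; i < parallelism; i++) {
            instances.add(Executors.newSingleThreadExecutor());
            logs.add(new ArrayList<>());
        }
    }

    /** Simplified hash partitioning (assumption, not Flink's exact scheme). */
    int instanceFor(String key) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** A slow call on the first input; it occupies the whole instance while it runs. */
    void processElement1(String key, long workMillis) {
        int i = instanceFor(key);
        instances.get(i).execute(() -> {
            logs.get(i).add("start1:" + key);
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            logs.get(i).add("end1:" + key);
        });
    }

    /** A fast call on the second input; it queues behind earlier calls on the same instance. */
    void processElement2(String key) {
        int i = instanceFor(key);
        instances.get(i).execute(() -> logs.get(i).add("call2:" + key));
    }

    /** Waits for all queued calls to finish, then returns the per-instance logs. */
    List<List<String>> shutdownAndGetLogs() {
        for (ExecutorService e : instances) {
            e.shutdown();
            try {
                e.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
        return logs;
    }

    public static void main(String[] args) {
        InstanceThreadingSketch s = new InstanceThreadingSketch(2);
        s.processElement1("a", 200); // "a" and "c" hash to instance 1 here
        s.processElement2("c");      // waits for the slow call on "a", although the key differs
        s.processElement2("b");      // "b" hashes to instance 0 and is not blocked
        System.out.println(s.shutdownAndGetLogs());
        // prints [[call2:b], [start1:a, end1:a, call2:c]]
    }
}
```

In this model the call for key "c" is delayed by the slow call for key "a" only because both keys land on the same instance; key "b" on the other instance proceeds independently.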
|