Hi: I have a few question about how state is shared in processors in Flink. 1. If I have a processor instantiated in the Flink app, and apply use in multiple times in the Flink - (a) if the tasks are in the same slot - do they share the same processor on the taskmanager ? (b) if the tasks are on same node but different slots - do they share the same processor on the taskmanager ? 2. If I instantiate a single processor with local state and use it in multiple times in Flink (a) if the tasks are in the same slot - do they share the same processor and state on the taskmanager ? (b) if the tasks are on same node but different slots - do they share the same processor and state on the taskmanager ? 3. If I instantiate a multiple processors with shared collection and use it in multiple times in Flink (a) if the tasks are in the same slot - do they share the state on the taskmanager ? (b) if the tasks are on same node but different slots - do they share the state on the taskmanager ? 4. How do the above scenarios affect sharing (a) operator state (b) keyed state 5. If I have have a parallelism of > 1, and use keyBy - is each key handled by only one instance of the processor ? I believe so, but wanted to confirm. Thanks Mans |
Hi Mans
What's the meaning of 'processor' you defined here? A user defined function?
When talking about share state, I'm afraid it's not so easy to implement in Flink. As no matter keyed state or operator state, they're both instantiated, used and only thread-safe in operator scope. The only way to read read-only state during runtime is via
queryable state[1]
For the question of keyBy, the message would only sent to one of task in downstream according to the hashcode [2].
Best
Yun Tang
From: M Singh <[hidden email]>
Sent: Friday, January 10, 2020 23:29 To: User <[hidden email]> Subject: Apache Flink - Sharing state in processors Hi:
I have a few question about how state is shared in processors in Flink. 1. If I have a processor instantiated in the Flink app, and apply use in multiple times in the Flink -
(a) if the tasks are in the same slot - do they share the same processor
on the taskmanager ?
(b) if the tasks are on same node but different slots - do they share the same processor on the taskmanager ?
2. If I instantiate a single processor with local state and use it in multiple times in Flink
(a) if the tasks are in the same slot - do they share the same processor and state
on the taskmanager ?
(b) if the tasks are on same node but different slots - do they share the same processor and state
on the taskmanager ?
3. If I instantiate a multiple processors with shared collection and use it in multiple times in Flink
(a) if the tasks are in the same slot - do they share the state on the taskmanager ?
(b) if the tasks are on same node but different slots - do they share the state
on the taskmanager ?
4. How do the above scenarios affect sharing
(a) operator state
(b) keyed state
5. If I have have a parallelism of > 1, and use keyBy - is each key handled by only one instance of the processor ? I believe so, but wanted to confirm.
Thanks
Mans
|
Thanks Yun for your answers. By processor I did mean user defined processor function. Keeping that in view, do you have any advice on how the shared state - ie, the parameters passed to the processor as mentioned above (not the key state or operator state) will be affected in a distributed runtime env ? Mans
On Sunday, January 12, 2020, 09:51:10 PM EST, Yun Tang <[hidden email]> wrote:
Hi Mans
What's the meaning of 'processor' you defined here? A user defined function?
When talking about share state, I'm afraid it's not so easy to implement in Flink. As no matter keyed state or operator state, they're both instantiated, used and only thread-safe in operator scope. The only way to read read-only state during runtime is via
queryable state[1]
For the question of keyBy, the message would only sent to one of task in downstream according to the hashcode [2].
Best
Yun Tang
From: M Singh <[hidden email]>
Sent: Friday, January 10, 2020 23:29 To: User <[hidden email]> Subject: Apache Flink - Sharing state in processors Hi:
I have a few question about how state is shared in processors in Flink. 1. If I have a processor instantiated in the Flink app, and apply use in multiple times in the Flink -
(a) if the tasks are in the same slot - do they share the same processor
on the taskmanager ?
(b) if the tasks are on same node but different slots - do they share the same processor on the taskmanager ?
2. If I instantiate a single processor with local state and use it in multiple times in Flink
(a) if the tasks are in the same slot - do they share the same processor and state
on the taskmanager ?
(b) if the tasks are on same node but different slots - do they share the same processor and state
on the taskmanager ?
3. If I instantiate a multiple processors with shared collection and use it in multiple times in Flink
(a) if the tasks are in the same slot - do they share the state on the taskmanager ?
(b) if the tasks are on same node but different slots - do they share the state
on the taskmanager ?
4. How do the above scenarios affect sharing
(a) operator state
(b) keyed state
5. If I have have a parallelism of > 1, and use keyBy - is each key handled by only one instance of the processor ? I believe so, but wanted to confirm.
Thanks
Mans
|
1. a/b) No, they are
deserialized into separate instances in any case and are
independent afterwards.
2. a/b) No, see 1).
3. a/b) No, as individual tasks are isolated by
different class-loaders.
On 23/01/2020 09:25, M Singh wrote:
|
Free forum by Nabble | Edit this page |