(DEPRECATED) Apache Flink User Mailing List archive.

Keyed State

Classic

List

Threaded

5 messages Options

Boris Lublinsky

Keyed State

Can you, please confirm that my understanding is correct?

I am looking at the documentation on low level joins https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#low-level-joins

And the example there.

When we are doing KeyBy and then Process, Flink maintains an instance per key and makes sure that that for a given key an instance for this key is used. Correct?

It mean that the value state for a given key is maintained by Flink and in my code I do not need to worry about a key value.

In my code I can use ValueState and assume that Flink will keep track of it on per key fashion.

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

Fabian Hueske-2

Re: Keyed State

Yes, that is correct.

You can treat keyed ValueState like a distributed hashmap and Flink routes all state accesses to the entry for the key of the current record.

2018-01-13 17:07 GMT+01:00 Boris Lublinsky <[hidden email]>:

Can you, please confirm that my understanding is correct?
I am looking at the documentation on low level joins https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#low-level-joins
And the example there.
When we are doing KeyBy and then Process, Flink maintains an instance per key and makes sure that that for a given key an instance for this key is used. Correct?
It mean that the value state for a given key is maintained by Flink and in my code I do not need to worry about a key value.
In my code I can use ValueState and assume that Flink will keep track of it on per key fashion.

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

Boris Lublinsky

Re: Keyed State

Thanks Fabian

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

On Jan 13, 2018, at 11:06 AM, Fabian Hueske <[hidden email]> wrote:

Yes, that is correct.
You can treat keyed ValueState like a distributed hashmap and Flink routes all state accesses to the entry for the key of the current record.

2018-01-13 17:07 GMT+01:00 Boris Lublinsky <[hidden email]>:
Can you, please confirm that my understanding is correct?
I am looking at the documentation on low level joins https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#low-level-joins
And the example there.
When we are doing KeyBy and then Process, Flink maintains an instance per key and makes sure that that for a given key an instance for this key is used. Correct?
It mean that the value state for a given key is maintained by Flink and in my code I do not need to worry about a key value.
In my code I can use ValueState and assume that Flink will keep track of it on per key fashion.

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

Boris Lublinsky

Re: Keyed State

In reply to this post by Fabian Hueske-2

Thanks Fabian

Can you also explain a thread model?

What is the paralelization between multiple keys? Is it hash based?

And also are processElement 1 and 2 are executed on different threads?

More specifically if processElement is an order of magnitude slower then 2, will it impact processElement 2?

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

On Jan 13, 2018, at 11:06 AM, Fabian Hueske <[hidden email]> wrote:

Yes, that is correct.
You can treat keyed ValueState like a distributed hashmap and Flink routes all state accesses to the entry for the key of the current record.

2018-01-13 17:07 GMT+01:00 Boris Lublinsky <[hidden email]>:
Can you, please confirm that my understanding is correct?
I am looking at the documentation on low level joins https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#low-level-joins
And the example there.
When we are doing KeyBy and then Process, Flink maintains an instance per key and makes sure that that for a given key an instance for this key is used. Correct?
It mean that the value state for a given key is maintained by Flink and in my code I do not need to worry about a key value.
In my code I can use ValueState and assume that Flink will keep track of it on per key fashion.

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

Fabian Hueske-2

Re: Keyed State

Sure.

A CoProcessFunction is executed in parallel by running multiple instances of the CoProcessFunction. Each instance runs in a separate TaskManager slot and is responsible for a subset of all keys. Keys are assigned by hash partitioning to function instances.

All calls to methods of an individual CoProcessFunction instance are synchronized, i.e., processElement1, processElement2, and onTimer are never concurrently called. A long running processElement1 method will block all method calls for all keys that are assigned to the same instance (not just method calls for the same key).

2018-01-13 20:33 GMT+01:00 Boris Lublinsky <[hidden email]>:

Thanks Fabian
Can you also explain a thread model?
What is the paralelization between multiple keys? Is it hash based?
And also are processElement 1 and 2 are executed on different threads?
More specifically if processElement is an order of magnitude slower then 2, will it impact processElement 2?

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

On Jan 13, 2018, at 11:06 AM, Fabian Hueske <[hidden email]> wrote:

Yes, that is correct.
You can treat keyed ValueState like a distributed hashmap and Flink routes all state accesses to the entry for the key of the current record.

2018-01-13 17:07 GMT+01:00 Boris Lublinsky <[hidden email]>:
Can you, please confirm that my understanding is correct?
I am looking at the documentation on low level joins https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#low-level-joins
And the example there.
When we are doing KeyBy and then Process, Flink maintains an instance per key and makes sure that that for a given key an instance for this key is used. Correct?
It mean that the value state for a given key is maintained by Flink and in my code I do not need to worry about a key value.
In my code I can use ValueState and assume that Flink will keep track of it on per key fashion.

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/