(DEPRECATED) Apache Flink User Mailing List archive.

Production readiness

avilevi

Production readiness

Looking at the production readiness checklist: is there any rule of thumb for determining the maximum parallelism? We have a stateful pipeline with high throughput (4k requests/sec) running on Google Cloud (YARN).

As I understand it, if we do not set it, the default is 128, though that may change in the future; but once we set it, it cannot be changed later. Is that correct?

Is there any way to get information about the state (RocksDB), e.g. the number of keys or a list of keys?

Regards

Avi

Andrey Zagrebin-3

Re: Production readiness

Hi Avi,

The maximum parallelism is not an easy parameter to change once a job has been started.
The job's checkpoints/savepoints would need to be migrated so that the keyed state entries are rehashed into the new number of key groups (the unit of keyed state storage). You can try the Bravo tool for this [1].
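
To illustrate why changing the maximum parallelism forces a rehash while changing the parallelism does not, here is a minimal Python sketch of the key group idea. It is not Flink's code: the function names are made up for this example, and CRC32 is only a stand-in for the hash that Flink applies to a key. The range arithmetic follows what KeyGroupRangeAssignment appears to do, so treat it as an assumption and check it against your Flink version.

```python
import zlib


def key_group_for(key: str, max_parallelism: int) -> int:
    """Map a key to a key group; there are max_parallelism key groups.

    CRC32 is a stand-in hash chosen for this sketch, not Flink's hash.
    """
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index_for(key_group: int, parallelism: int, max_parallelism: int) -> int:
    """Subtask index that owns the given key group."""
    return key_group * parallelism // max_parallelism


def key_group_range(operator_index: int, parallelism: int, max_parallelism: int):
    """Inclusive (start, end) range of key groups owned by one subtask."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return start, end


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]

    # Changing the parallelism only hands whole key groups to other subtasks;
    # the key -> key group mapping depends on max parallelism alone.
    for parallelism in (4, 8):
        ranges = [key_group_range(i, parallelism, 128) for i in range(parallelism)]
        print(f"parallelism={parallelism}: {ranges}")

    # Changing max parallelism changes the key group of many keys, so the
    # stored state would have to be rehashed into the new key groups.
    moved = sum(1 for k in keys if key_group_for(k, 128) != key_group_for(k, 256))
    print(f"{moved} of {len(keys)} keys change key group when going from 128 to 256")
```

Because state is stored per key group, a rescale within the same maximum parallelism can move key groups between subtasks as whole units, which is why only the maximum parallelism is hard to change.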

As for the number of keys, you can try enabling the RocksDB metrics in Flink [2]; they have been available since Flink 1.7.
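
As a sketch, the option behind link [2] could be switched on in flink-conf.yaml like this. The option name is taken from the anchor of that link; the file name assumes a standard Flink setup. The reported value is an estimate from RocksDB, not an exact count, and it does not give you a list of keys. Enabling native RocksDB metrics may also add some overhead, so it is worth testing before rolling it out.

```yaml
# flink-conf.yaml (assumed location of the cluster configuration)
# Expose RocksDB's estimated number of keys as a Flink metric.
state.backend.rocksdb.metrics.estimate-num-keys: true
```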

Best,

Andrey

[1] https://github.com/king/bravo
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#state-backend-rocksdb-metrics-estimate-num-keys

In reply to the post by Avi Levi of Wed, Feb 13, 2019 at 4:58 PM.

aitozi

Re: Production readiness

Hi Andrey,

I have another question: if I do not set the maximum parallelism at first
(so it defaults to 128) and later rescale to a parallelism greater than
128, will the state be lost?

Thanks,
Aitozi

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Till Rohrmann

Re: Production readiness

Hi Aitozi,

Resuming a job with a parallelism higher than the initially defined max parallelism (in your case 128) is not possible. To do that, one would need to rewrite the savepoint information (basically rehash the keys), as Andrey said.
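
For context on where the 128 comes from: as far as I know, when no max parallelism is set, Flink derives one from the operator parallelism, and 128 is the lower bound of that derivation rather than a fixed value. The Python sketch below mirrors that rule as an assumption (round 1.5 times the parallelism up to a power of two, then clamp to between 128 and 32768); the function names are made up for this example, so please verify the rule against your Flink version.

```python
def round_up_to_power_of_two(x: int) -> int:
    """Smallest power of two that is >= x (for x >= 1)."""
    return 1 if x <= 1 else 1 << (x - 1).bit_length()


def default_max_parallelism(operator_parallelism: int) -> int:
    """Assumed default: power of two above 1.5 * parallelism, clamped to [128, 32768]."""
    candidate = round_up_to_power_of_two(
        operator_parallelism + operator_parallelism // 2
    )
    return min(max(candidate, 128), 32768)


if __name__ == "__main__":
    for p in (4, 85, 86, 100, 30000):
        print(p, default_max_parallelism(p))
```

Under this assumption, any job first started with a small parallelism ends up with 128 key groups, which is the situation described in this thread.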

Cheers,

Till

In reply to the post by aitozi of Thu, Feb 14, 2019 at 3:50 AM.

Andrey Zagrebin-3

Re: Production readiness

In reply to this post by aitozi

Hi Aitozi,

Flink checks this when the job starts and fails if
- parallelism > max parallelism (KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex), or
- max parallelism of the savepoint > max parallelism of the restored job (Checkpoints.loadAndValidateCheckpoint).

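Theoretically, running with a parallelism above the max parallelism would be possible without migration or state loss, but the subtasks beyond the number of key groups would have no keyed state to work on and would only waste resources, which does not make sense.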
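
The two checks and the wasted-resources argument can be sketched in Python as follows. This is an illustration only, not Flink's implementation: validate_restore and idle_subtasks are hypothetical names, the first check is read as rejecting a parallelism above the max parallelism, and the range arithmetic is an assumption modelled on KeyGroupRangeAssignment.

```python
def key_group_range(operator_index: int, parallelism: int, max_parallelism: int):
    """Inclusive (start, end) key group range of one subtask (assumed formula)."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return start, end


def validate_restore(parallelism: int, max_parallelism: int,
                     savepoint_max_parallelism: int) -> None:
    """Raise ValueError for the two failure cases described in this thread."""
    if parallelism > max_parallelism:
        raise ValueError(
            f"parallelism {parallelism} exceeds max parallelism {max_parallelism}"
        )
    if savepoint_max_parallelism > max_parallelism:
        raise ValueError(
            f"savepoint max parallelism {savepoint_max_parallelism} exceeds "
            f"job max parallelism {max_parallelism}"
        )


def idle_subtasks(parallelism: int, max_parallelism: int):
    """Subtasks whose key group range would be empty if the check were skipped."""
    idle = []
    for i in range(parallelism):
        start, end = key_group_range(i, parallelism, max_parallelism)
        if start > end:  # empty range: this subtask owns no key group
            idle.append(i)
    return idle


if __name__ == "__main__":
    validate_restore(parallelism=4, max_parallelism=128, savepoint_max_parallelism=128)
    try:
        validate_restore(parallelism=200, max_parallelism=128,
                         savepoint_max_parallelism=128)
    except ValueError as err:
        print("rejected:", err)
    # With 130 subtasks but only 128 key groups, some subtasks get nothing.
    print("idle subtasks:", idle_subtasks(130, 128))
```

With only 128 key groups to hand out, any subtask beyond the 128th cannot receive keyed state, which is the resource waste mentioned above.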

Best,

Andrey
