Production readiness


avilevi
Hi,
Looking at the production readiness checklist: is there any rule of thumb for determining the maximum parallelism? We have a stateful pipeline with high throughput (4k requests/sec) running on Google Cloud (YARN).
I understand that if we do not set it, the default is 128, which may change in the future, but that once we set it, it cannot be changed later. Is that correct?

Is there any way to get information about the state (RocksDB), e.g. the number of keys or a list of keys?

Regards 
Avi

Re: Production readiness

Andrey Zagrebin-3
Hi Avi,

The maximum parallelism is not an easy parameter to change once the job has been started.
The job's checkpoints/savepoints would need to be migrated so that the keyed state entries are rehashed into the new number of key groups (the unit of keyed state storage). You can try the Bravo tool for this [1].
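
To illustrate why a different number of key groups forces a rehash, here is a minimal sketch of key-group assignment. It is not Flink's code: the class and method names are hypothetical, and the hash step is a simplified stand-in (Flink, as far as I know, applies a murmur hash to `key.hashCode()` before taking the modulo). The only point is that the key group of a key depends on the max parallelism, so state written under one value cannot simply be read under another.

```java
import java.util.List;

/** Illustrative only; hypothetical names, not Flink's actual classes. */
class KeyGroupSketch {

    // Stand-in for the hash step (assumption: Flink uses a murmur hash here;
    // floorMod on the raw hash code just keeps the sketch short).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Subtask that owns a given key group at the current parallelism.
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        for (int key : List.of(7, 200, 1000)) {
            int groupAt128 = assignToKeyGroup(key, 128);
            int groupAt256 = assignToKeyGroup(key, 256);
            System.out.printf(
                "key=%d keyGroup(max=128)=%d keyGroup(max=256)=%d subtask(max=128)=%d%n",
                key, groupAt128, groupAt256,
                operatorIndexForKeyGroup(128, parallelism, groupAt128));
        }
    }
}
```

Changing only the parallelism moves whole key groups between subtasks, which is cheap. Changing the max parallelism can move individual keys between key groups, which is why the stored state has to be rewritten.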

As for the number of keys, you can try enabling the RocksDB metrics in Flink [2], which are available since 1.7.
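
As a rough sketch, this is how it might look in flink-conf.yaml. The option name below is my recollection of the RocksDB native metrics options, so treat it as an assumption and check it against the documentation for your Flink version. The value reported is RocksDB's own estimate per state and subtask, not an exact count, and it does not list the keys.

```yaml
# flink-conf.yaml (assumed option name; verify for your Flink version)
# Exposes RocksDB's estimated number of keys as a Flink metric.
state.backend.rocksdb.metrics.estimate-num-keys: true
```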

Best,


Re: Production readiness

aitozi
Hi, Andrey

I have another question: if I do not set the maximum parallelism first (so it is set to 128 by default) and then rescale to a parallelism greater than 128, will the state be lost?

Thanks,
Aitozi



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Production readiness

Till Rohrmann
Hi Aitozi,

Resuming a job with a parallelism higher than the initially defined max parallelism (in your case 128) is not possible. For this, one would need to rewrite the savepoint (basically rehash the keys), as Andrey said.
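
For context on where the 128 comes from, here is a minimal sketch of how the default max parallelism is derived when none is set explicitly. It reflects my understanding of `KeyGroupRangeAssignment.computeDefaultMaxParallelism`, not a copy of it: the class name and constant names are hypothetical, and the bounds (128 and 32768) are stated from memory.

```java
/** Illustrative only; hypothetical names, bounds assumed to be 2^7 and 2^15. */
class DefaultMaxParallelismSketch {
    static final int LOWER_BOUND = 128;   // assumed default lower bound (1 << 7)
    static final int UPPER_BOUND = 32768; // assumed upper bound (1 << 15)

    // Smallest power of two that is >= x, for x >= 1.
    static int roundUpToPowerOfTwo(int x) {
        int highest = Integer.highestOneBit(x);
        return highest == x ? x : highest << 1;
    }

    // Roughly 1.5x the parallelism, rounded up to a power of two, then clamped.
    static int defaultMaxParallelism(int parallelism) {
        if (parallelism <= 0 || parallelism > UPPER_BOUND) {
            throw new IllegalArgumentException("parallelism out of range: " + parallelism);
        }
        int candidate = roundUpToPowerOfTwo(parallelism + parallelism / 2);
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        for (int p : new int[] {4, 1000, 32768}) {
            System.out.println("parallelism=" + p + " -> default max parallelism="
                + defaultMaxParallelism(p));
        }
    }
}
```

Whatever value the first run ends up with is recorded in the savepoint, so a default that was fine for a small initial parallelism becomes the ceiling for later rescaling. Setting the max parallelism explicitly before the first production run avoids that surprise.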

Cheers,
Till


Re: Production readiness

Andrey Zagrebin-3
In reply to this post by aitozi
Hi Aitozi,

Flink will check upon job start and fail if
- parallelism > max parallelism (KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex) or
- max parallelism of savepoint > max parallelism of restored job (Checkpoints.loadAndValidateCheckpoint).

In theory, running with a parallelism above the max parallelism would be possible without migration or state loss, but the additional subtasks would only waste resources, which does not make sense.
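
The first check can be sketched as follows. This is a simplified illustration of how I understand the key-group range computation, with a hypothetical class name and plain `int[]` ranges instead of Flink's own types.

```java
/** Illustrative only; hypothetical names, not Flink's actual classes. */
class KeyGroupRangeSketch {

    // Returns {start, end} (both inclusive) of the key groups owned by one subtask.
    static int[] keyGroupRange(int maxParallelism, int parallelism, int operatorIndex) {
        if (maxParallelism < parallelism) {
            throw new IllegalArgumentException(
                "Maximum parallelism must not be smaller than parallelism.");
        }
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // 128 key groups spread over 4 subtasks.
        for (int i = 0; i < 4; i++) {
            int[] r = keyGroupRange(128, 4, i);
            System.out.println("subtask " + i + " owns key groups " + r[0] + ".." + r[1]);
        }
        // Asking for more subtasks than key groups is rejected up front.
        try {
            keyGroupRange(128, 200, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With 128 key groups and, say, 200 subtasks, there are not enough key groups to go around, so some subtasks would own none and sit idle. That is the wasted resource mentioned above.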

Best,
Andrey
