Flink Standalone cluster - production settings


simpleusr
I know this may seem like a silly question, but I am trying to figure out the
optimal setup for our Flink jobs.
We are using a standalone cluster with 5 jobs. Each job has 3 async operators
backed by Executors with thread counts of 20, 20, and 100. The source is Kafka,
and there are Cassandra and REST sinks.
Currently we are using parallelism = 1. So, at max load, a single job spawns
at least 140 threads. We are also using Netty-based libraries for the Cassandra
and REST calls. (As I can see in the thread dump, Flink also uses a Netty server.)

What we see is that the total thread count adds up to ~500 for a single job.
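To see where a total like ~500 comes from, it can help to group live threads by name prefix. The following is a minimal sketch using only the JDK; the class name ThreadCensus and the grouping rule (strip a trailing number) are illustrative assumptions, not anything Flink provides. It has to run inside the JVM being inspected, or the same grouping can be applied to the thread names in a thread dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts threads per name group, e.g. all
// "pool-1-thread-N" threads are reported under "pool-1-thread".
class ThreadCensus {

    // Drop a trailing number (and any '-', '#', or whitespace before it).
    static String groupKey(String threadName) {
        return threadName.replaceAll("[-#\\s]*\\d+$", "");
    }

    // Count how many names fall into each group, sorted by group name.
    static Map<String, Integer> countByGroup(Iterable<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            counts.merge(groupKey(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Collect the names of all live threads in this JVM.
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        countByGroup(names).forEach(
            (group, count) -> System.out.println(count + "\t" + group));
        System.out.println("total\t" + names.size());
    }
}
```

Groups with large counts (executor pools, Netty event loops, and so on) show which component contributes most to the total.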

The issue we faced is that, all of a sudden, all jobs began to fail in
production, and we saw that it was mainly due to the ulimit on user processes.
All jobs had started on one server in the cluster (I do not know why, as it is
a cluster with 3 members).

It was set to around 1500 on that server. We then set a higher value and the
problems seem to have gone away.
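The arithmetic behind the failure can be sketched as follows, using the figures from this post (5 jobs, ~500 threads per job, a limit of about 1500). The class and method names are hypothetical, and the estimate ignores any other threads owned by the same user, so treat it as a rough lower bound rather than an exact accounting.

```java
// Hypothetical back-of-the-envelope check of threads against a per-user limit.
class ThreadBudget {

    // Worst case: every job on the host runs at its full thread count.
    static long worstCaseThreads(int jobsOnHost, int threadsPerJob) {
        return (long) jobsOnHost * threadsPerJob;
    }

    // True if the worst case stays within the per-user limit.
    static boolean fits(int jobsOnHost, int threadsPerJob, long limit) {
        return worstCaseThreads(jobsOnHost, threadsPerJob) <= limit;
    }

    // Largest number of such jobs one host can carry under the limit.
    static long maxJobs(int threadsPerJob, long limit) {
        return limit / threadsPerJob;
    }

    public static void main(String[] args) {
        int jobs = 5;            // all jobs landed on one server
        int threadsPerJob = 500; // observed total per job (approximate)
        long limit = 1500;       // approximate limit on that server

        System.out.println(worstCaseThreads(jobs, threadsPerJob)); // prints 2500
        System.out.println(fits(jobs, threadsPerJob, limit));      // prints false
        System.out.println(maxJobs(threadsPerJob, limit));         // prints 3
    }
}
```

With all 5 jobs on one machine, the worst case of about 2500 threads exceeds the limit of about 1500, which is consistent with the failures going away once the limit was raised.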

Can you recommend optimal production settings for a standalone cluster? Or
should there be a max limit on the threads spawned by a single job?
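One way to put a hard cap on the threads an Executor can create is a fixed-size pool with a bounded queue. The sketch below uses only java.util.concurrent and is not a Flink setting; the class name BoundedPool, the pool size, and the queue capacity are illustrative assumptions. When the queue is full, CallerRunsPolicy makes the submitting thread run the task itself, which slows the producer down instead of adding threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: an executor that never exceeds maxThreads workers.
class BoundedPool {

    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,                      // fixed upper bound on workers
            30L, TimeUnit.SECONDS,                       // idle workers exit after 30 s
            new ArrayBlockingQueue<>(queueCapacity),     // bounded backlog
            new ThreadPoolExecutor.CallerRunsPolicy());  // caller runs task when full
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits short tasks and reports the peak number of worker threads used.
    static int largestPoolSizeAfter(int taskCount, int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = create(maxThreads, queueCapacity);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(5); // stand-in for a blocking call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        System.out.println("largest pool size: " + largestPoolSizeAfter(200, 4, 16));
    }
}
```

The peak pool size never exceeds the configured maximum, no matter how many tasks are submitted, so the thread count per operator becomes predictable.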

Regards



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Flink Standalone cluster - production settings

Hung
/ Each job has 3 asynch operators
with Executors with thread counts of 20,20,100/

Flink handles parallelism for you. If you want a higher parallelism for an
operator, you can call setParallelism(),
for example:

flatMap(new Mapper1()).setParallelism(20)
flatMap(new Mapper2()).setParallelism(20)
flatMap(new Mapper3()).setParallelism(100)

You can check the official documentation here:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/parallel.html#setting-the-parallelism

/Currently we are using parallelism = 1/
I guess you set the job-level parallelism.

I would suggest replacing the Executors with Flink parallelism. It would be
more efficient, since you would not create another thread pool on top of the
one that Flink already provides (I may not be describing this concept
correctly).

Cheers,

Sendoh





Re: Flink Standalone cluster - production settings

Padarn Wilson-2
Can you give some detail on the cases in which you might be better off setting a higher (or lower) parallelism for an operator?

On Thu, Feb 21, 2019 at 9:54 PM Hung <[hidden email]> wrote:
[Quoted text of the previous message omitted.]