I know this may seem a silly question, but I am trying to figure out the
optimal setup for our Flink jobs.
We are using a standalone cluster with 5 jobs. Each job has 3 async operators,
each with its own Executor, with thread counts of 20, 20 and 100. The source
is Kafka, and the sinks are Cassandra and REST.
Currently we are using parallelism = 1, so at max load a single job spawns
at least 140 threads (20 + 20 + 100) from these executors alone. We are also
using Netty-based libraries for the Cassandra and REST calls. (As I can see in
the thread dump, Flink also uses a Netty server.)
What we see is that the total thread count adds up to ~500 for a single job.
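
For reference, a breakdown of where those threads come from can be taken from
inside the JVM as well as from a thread dump. The following is only a minimal
sketch: the class name ThreadCensus and the grouping rule (strip the trailing
numbering from thread names) are my own assumptions, and it only sees threads
of the JVM it runs in, so it would have to be called from code running in the
TaskManager.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink: groups live JVM threads by name.
public class ThreadCensus {

    // Strip trailing digits and separators so that "pool-1-thread-7" and
    // "pool-1-thread-8" end up in the same group ("pool-1-thread").
    public static String groupKey(String threadName) {
        return threadName.replaceAll("[-#\\s\\d]+$", "");
    }

    // Count the live threads of this JVM, grouped by normalized name.
    public static Map<String, Integer> countByGroup() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(groupKey(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Total live threads (daemon and non-daemon) in this JVM.
        int live = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.println("live threads: " + live);
        countByGroup().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Logging this periodically should show whether the growth comes from our own
executors, the Netty event loops of the client libraries, or Flink itself.
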
Suddenly all jobs began to fail in production, and we saw that it was mainly
due to the ulimit on user processes. All jobs had started on one server in the
cluster (I do not know why, as it is a cluster with 3 members).
The limit was set to around 1500 on that server. We then set a higher value
and the problems seem to have gone away.
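
To make the arithmetic explicit, here is a rough sketch of the numbers above.
As far as I understand, on Linux the "max user processes" limit counts threads
as well as processes. The class and method names are made up for illustration,
and it assumes that every one of the 5 jobs reaches roughly the same ~500
threads, which I have only verified for a single job.

```java
// Back-of-the-envelope check of the thread counts from this post.
public class ThreadBudget {

    // Sum of the configured executor pool sizes for one job.
    public static int executorThreads(int... poolSizes) {
        int sum = 0;
        for (int size : poolSizes) {
            sum += size;
        }
        return sum;
    }

    // Threads on one host if `jobs` jobs with `threadsPerJob` threads each land on it.
    public static int hostThreads(int threadsPerJob, int jobs) {
        return threadsPerJob * jobs;
    }

    public static void main(String[] args) {
        int fromExecutors = executorThreads(20, 20, 100); // 140
        int observedPerJob = 500; // approximate, taken from the thread dump
        int jobs = 5;             // assumption: all jobs behave alike
        int limit = 1500;         // approximate ulimit on that server

        int total = hostThreads(observedPerJob, jobs);    // 2500
        System.out.println("executor threads per job: " + fromExecutors);
        System.out.println("threads on one host:      " + total);
        System.out.println("exceeds limit:            " + (total > limit));
    }
}
```

So with all 5 jobs on one machine, the estimate is about 2500 threads against
a limit of about 1500, which matches what we saw.
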
Can you recommend an optimal prod setting for a standalone cluster? Or should
there be a max limit on threads spawned by a single job?
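
To illustrate what I mean by a max limit, something like the following is what
I have in mind on our side. It is only a sketch using plain
java.util.concurrent, not our actual code and not a Flink API; the class name
BoundedPool and the sizes are placeholders.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for an executor with a hard upper bound on threads.
public class BoundedPool {

    // At most maxThreads threads. Extra work waits in a bounded queue, and
    // when the queue is full the task runs on the submitting thread, which
    // slows the submitter down instead of creating more threads.
    public static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
        pool.allowCoreThreadTimeOut(true); // idle threads exit after 30 seconds
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4, 16); // placeholder sizes
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(5); // stand-in for a blocking REST or Cassandra call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("largest pool size: " + pool.getLargestPoolSize());
    }
}
```

This would only cap the threads we create ourselves. It would not limit the
Netty threads of the client libraries or of Flink itself, which is why I am
asking whether there is a recommended limit at the job or cluster level.
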
Regards
--
Sent from:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/