Hi all,
In trying out different settings for performance, I ran into a job failure case that puzzles me.

I’d done a run with a parallelism of 20 (-p 20 via CLI), and the job ran successfully on a cluster with 40 slots.

I then tried with -p 15, and it failed with:

NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism…

But the change was to reduce parallelism, so why would that now cause this problem?

Thanks,

— Ken

--------------------------
Ken Krugler
+1 530-210-6378
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
Hi,

Is this a streaming or batch job? If it is a batch job, are you using either collect() or print() on a DataSet?

Cheers,
Aljoscha

On Thu, 28 Apr 2016 at 00:52 Ken Krugler wrote:
In reply to this post by Ken Krugler
Hey Ken!
That should not happen. Can you check the web interface for two things:

- How many available slots are advertised on the landing page (localhost:8081) when you submit your job?
- What is the actual parallelism of the submitted job (it should appear as a FAILED job in the web frontend)? Is it really 15?

– Ufuk

On Thu, Apr 28, 2016 at 12:52 AM, Ken Krugler wrote:
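Ufuk's "that should not happen" follows from Flink's default slot sharing: when all operators stay in the default slot sharing group, a job needs as many slots as the highest parallelism of any operator (not the sum), so -p 15 should need fewer slots than -p 20 did. The Python sketch below makes that check explicit. It assumes the web frontend serves a JSON `/overview` endpoint with `slots-total` and `slots-available` fields; field names may differ between Flink versions, so verify them against your own cluster. The operator names, the parallelism values, and the `required_slots`/`check_fit` helpers are hypothetical. Operators given an explicit `setParallelism(...)`, or placed in separate slot sharing groups, would change the count.

```python
import json
import urllib.request


def fetch_overview(base_url="http://localhost:8081"):
    """Fetch the cluster overview JSON from the Flink web frontend.

    Assumption: the endpoint is /overview and it reports slot counts as
    "slots-total" / "slots-available"; check this against your Flink version.
    """
    with urllib.request.urlopen(base_url + "/overview") as resp:
        return json.load(resp)


def required_slots(operator_parallelism):
    """Slots needed when every operator is in the default slot sharing group:
    the highest parallelism of any single operator, not the sum."""
    return max(operator_parallelism.values())


def check_fit(overview, operator_parallelism):
    """Return (needed, available, fits) for a planned job."""
    needed = required_slots(operator_parallelism)
    available = overview["slots-available"]
    return needed, available, needed <= available


# Hypothetical job plan submitted with -p 15 (illustration only).
plan = {"source": 15, "map": 15, "sink": 15}
# Stand-in for fetch_overview() on a 40-slot cluster with nothing running.
overview = {"slots-total": 40, "slots-available": 40}
print(check_fit(overview, plan))  # (15, 40, True)
```

Comparing `needed` against `slots-available` at submission time uses exactly the two numbers Ufuk asks about: if the job really runs at parallelism 15, a failure would point to fewer slots being available than the 40 the cluster should have.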
In reply to this post by Aljoscha Krettek
— Ken
In reply to this post by Ufuk Celebi
Hi Ufuk,
I’m running this on YARN, so I don’t believe the web UI shows up until the Flink AppManager has been started, which means I don’t know the advertised number of available slots before the job is running.
Same as above: the Flink web UI is gone once the job has failed. Any suggestions for how to check the actual parallelism in this type of transient YARN environment?

Thanks,

— Ken