Hi,
I'm executing a program on a Flink cluster. I tried the same program on a local node in Eclipse and it worked fine. Following the Flink recommendations, on the cluster I set numberOfTaskSlots equal to the number of CPU cores (2), while I set the parallelism to 1. Unfortunately, when I try to execute the job I get:

Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #1 (Source: Custom Source -> Timestamps/Watermarks (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < SlotSharingGroup [e883208d19e3c34f8aaf2a3168a63337, 9dd63673dd41ea021b896d5203f3ba7c, cbc357ccb763df2852fee8c4fc7d55f2] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0

As you can see, it says I have 0 available slots... how is this possible? I set no chains or slot sharing groups in the code.
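For reference, the relevant flink-conf.yaml entries look roughly like this on my setup (a sketch based on the description above, not the exact file):

    taskmanager.numberOfTaskSlots: 2   # one slot per CPU core, as recommended
    parallelism.default: 1             # default parallelism for the job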
The error message says that the total number of slots is 0, so it is very likely that no TaskManager is connected to the JobManager. How exactly are you starting the cluster?
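One quick way to check (a sketch; the paths assume the standard standalone layout used in this thread) is to verify that a TaskManager process is running and has actually registered with the JobManager:

    # is a TaskManager JVM running on the node?
    jps | grep TaskManager

    # did it register? (log file name depends on user/host)
    grep "Registered TaskManager" flink-1.3.2/log/flink-*-jobmanager-*.log

The web UI at http://localhost:8081 should also show the number of registered task managers and available slots.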
Update.
The previous error was probably caused by the fact that I didn't restart the cluster before re-executing (maybe). Then I tried to execute the program on a single-node cluster on my laptop and, after solving some small issues, everything worked fine. Now I'm trying to deploy the same jar on the real cluster. Initially everything seems to work correctly:

giordano@giordano-2-2-100-1:~$ ./flink-1.3.2/bin/flink run flink-java-project-0.1.jar
Cluster configuration: Standalone cluster with JobManager at localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
Submitting job with JobID: 161d91dda7c7012c8f48fa8a104a1662. Waiting for job completion.
Connected to JobManager at Actor[akka.tcp://flink@localhost:6123/user/jobmanager#430534598] with leader session id 00000000-0000-0000-0000-000000000000.
09/14/2017 22:05:00 Job execution switched to status RUNNING.
09/14/2017 22:05:00 Source: Custom Source -> Timestamps/Watermarks(1/1) switched to SCHEDULED
09/14/2017 22:05:00 Map -> Sink: Unnamed(1/1) switched to SCHEDULED
09/14/2017 22:05:00 Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)(1/1) switched to SCHEDULED
09/14/2017 22:05:00 Source: Custom Source -> Timestamps/Watermarks(1/1) switched to DEPLOYING
09/14/2017 22:05:00 Map -> Sink: Unnamed(1/1) switched to DEPLOYING
09/14/2017 22:05:00 Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)(1/1) switched to DEPLOYING
09/14/2017 22:05:01 Source: Custom Source -> Timestamps/Watermarks(1/1) switched to RUNNING
09/14/2017 22:05:01 Map -> Sink: Unnamed(1/1) switched to RUNNING
09/14/2017 22:05:01 Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)(1/1) switched to RUNNING

Unfortunately, after about a minute, the job fails:

09/14/2017 22:06:53 Map -> Sink: Unnamed(1/1) switched to FAILED
java.lang.Exception: TaskManager was lost/killed: 413eda6bf77223085c59e104680259bc @ giordano-2-2-100-1 (dataPort=36498)
    at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
    at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
    at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
    at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
    at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1228)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1131)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:125)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
    at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
    at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
    at akka.actor.ActorCell.invoke(ActorCell.scala:486)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09/14/2017 22:06:53 Job execution switched to status FAILING.
java.lang.Exception: TaskManager was lost/killed: 413eda6bf77223085c59e104680259bc @ giordano-2-2-100-1 (dataPort=36498)
    at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
    at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
    at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
    at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
    at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1228)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1131)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:125)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
    at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
    at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
    at akka.actor.ActorCell.invoke(ActorCell.scala:486)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09/14/2017 22:06:53 Source: Custom Source -> Timestamps/Watermarks(1/1) switched to CANCELING
09/14/2017 22:06:53 Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)(1/1) switched to CANCELING
09/14/2017 22:06:53 Source: Custom Source -> Timestamps/Watermarks(1/1) switched to CANCELED
09/14/2017 22:06:53 Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)(1/1) switched to CANCELED

Then the job is restarted, but it shows the NoResourceAvailableException again.

I start the cluster using the start-cluster.sh script and the task managers on the other nodes start fine as well. On every node I set the number of task slots equal to the number of cores (2), while the parallelism key is commented out. On the master node (which acts as both JobManager and TaskManager) I set

jobmanager.heap.mb: 756
taskmanager.heap.mb: 756

(it has 2 GB of RAM), while on the other two nodes I set

taskmanager.heap.mb: 1512

(they have 2 GB of RAM each).

Hints?
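To make the layout concrete, the configuration described above corresponds roughly to the following (a sketch; the worker hostnames in conf/slaves are placeholders, since I haven't listed the real ones here):

    # flink-conf.yaml on the master (JobManager + TaskManager, 2 GB RAM)
    jobmanager.heap.mb: 756
    taskmanager.heap.mb: 756
    taskmanager.numberOfTaskSlots: 2
    # parallelism.default is left commented out

    # flink-conf.yaml on the two worker nodes (2 GB RAM each)
    taskmanager.heap.mb: 1512
    taskmanager.numberOfTaskSlots: 2

    # conf/slaves on the master lists every host that should run a TaskManager
    giordano-2-2-100-1
    <worker-node-2>
    <worker-node-3>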
P.S.: I tried on my laptop with the same JobManager/TaskManager configuration (RAM, slots, parallelism, etc.) and it works perfectly.
Hi,
Can you check in the TaskManager logs whether there is any message that indicates why the TaskManager was lost? Also, there might be information in your machine logs, e.g. "dmesg" or /var/log/messages or some such.

Best,
Aljoscha
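A common reason for a TaskManager silently disappearing on a small machine is the kernel OOM killer; a quick way to check for that (a sketch, assuming a Linux host) is:

    # look for the kernel killing the TaskManager JVM
    dmesg | grep -i -E "killed process|out of memory"
    grep -i oom /var/log/messages /var/log/syslog 2>/dev/null

If nothing shows up there, the TaskManager's own log under log/flink-*-taskmanager-*.log is the next place to look.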
This is the TaskManager log:
2017-09-15 12:47:49,143 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classe$ 2017-09-15 12:47:49,257 INFO org.apache.flink.runtime.taskmanager.TaskManager - -------------------------------------------------------------------------------- 2017-09-15 12:47:49,257 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC) 2017-09-15 12:47:49,257 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: giordano 2017-09-15 12:47:49,257 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.131-b11 2017-09-15 12:47:49,258 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 502 MiBytes 2017-09-15 12:47:49,258 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /usr/lib/jvm/java-8-oracle 2017-09-15 12:47:49,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.2 2017-09-15 12:47:49,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options: 2017-09-15 12:47:49,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:+UseG1GC 2017-09-15 12:47:49,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog.file=/home/giordano/flink-1.3.2/log/flink-giordano-taskmanager-0-giordano$ 2017-09-15 12:47:49,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/home/giordano/flink-1.3.2/conf/log4j.properties 2017-09-15 12:47:49,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/home/giordano/flink-1.3.2/conf/logback.xml 2017-09-15 12:47:49,262 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments: 2017-09-15 12:47:49,262 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir 2017-09-15 12:47:49,262 INFO org.apache.flink.runtime.taskmanager.TaskManager - /home/giordano/flink-1.3.2/conf 2017-09-15 12:47:49,262 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /home/giordano/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar:/home/giorda$ 2017-09-15 12:47:49,262 INFO org.apache.flink.runtime.taskmanager.TaskManager - -------------------------------------------------------------------------------- 2017-09-15 12:47:49,263 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT] 2017-09-15 12:47:49,269 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 65536 2017-09-15 12:47:49,295 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /home/giordano/flink-1.3.2/conf 2017-09-15 12:47:49,298 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.home, /usr/lib/jvm/java-8-oracle 2017-09-15 12:47:49,299 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2017-09-15 12:47:49,299 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-09-15 12:47:49,299 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 512 2017-09-15 12:47:49,299 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2 2017-09-15 12:47:49,299 INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-09-15 12:47:49,300 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-09-15 12:47:49,300 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem 2017-09-15 12:47:49,300 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, file:///home/flink-$ 2017-09-15 12:47:49,311 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.home, /usr/lib/jvm/java-8-oracle 2017-09-15 12:47:49,312 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2017-09-15 12:47:49,312 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-09-15 12:47:49,312 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 512 2017-09-15 12:47:49,312 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2 2017-09-15 12:47:49,312 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-09-15 12:47:49,313 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-09-15 12:47:49,313 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem 2017-09-15 12:47:49,313 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, file:///home/flink-$ 2017-09-15 12:47:49,386 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to giordano (auth:SIMPLE) 2017-09-15 12:47:49,463 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the lead$ 2017-09-15 12:47:49,463 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuri$ 2017-09-15 12:47:49,466 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address localhost/127.0.0.1:6123. 2017-09-15 12:47:49,477 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address 'giordano-2-2-100-1' (192.168.11.56) for comm$ 2017-09-15 12:47:49,478 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager 2017-09-15 12:47:49,479 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at giordano-2-2-100-1:0. 
2017-09-15 12:47:49,989 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 2017-09-15 12:47:50,053 INFO Remoting - Starting remoting 2017-09-15 12:47:50,290 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@giordano-2-2-100-1:3512$ 2017-09-15 12:47:50,301 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor 2017-09-15 12:47:50,323 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: giordano-2-2-100-1/192.168.11.56, server port: 0, ssl $ 2017-09-15 12:47:50,331 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms 2017-09-15 12:47:50,338 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 99 GB, usable 95 GB (95.96% usable) 2017-09-15 12:47:50,534 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per$ 2017-09-15 12:47:50,800 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components. 2017-09-15 12:47:50,816 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful initialization (took 4 ms). 2017-09-15 12:47:53,827 WARN io.netty.util.internal.ThreadLocalRandom - Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy? 2017-09-15 12:47:53,866 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 3049 ms). Listening on SocketAddress /192.168.11.56$ 2017-09-15 12:47:53,977 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 0.7 of the currently free heap space (301 MB), memory wi$ 2017-09-15 12:47:53,986 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-75aed96d-28e9-4bcb-8d9b-4de0e734890d for s$ 2017-09-15 12:47:53,998 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported. 2017-09-15 12:47:54,114 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-5d60853e-9225-4438-9b07-ce6db2$ 2017-09-15 12:47:54,128 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-c00e60e1-3786-45ef-b1bc-570df5$ 2017-09-15 12:47:54,140 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#523808577. 2017-09-15 12:47:54,141 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: cf04d1390ff86aba4d1702ef1a0d2b67 @ giordan$ 2017-09-15 12:47:54,141 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 2 task slot(s). 2017-09-15 12:47:54,143 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 74/197/502 MB, NON HEAP: 33/34/-1 MB (used/committed/max$ 2017-09-15 12:47:54,148 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@localhost:6123/user/jobmanager (a$ 2017-09-15 12:47:54,430 INFO org.apache.flink.runtime.taskmanager.TaskManager - Successful registration at JobManager (akka.tcp://flink@localhost:6123/user/jobmana$ 2017-09-15 12:47:54,440 INFO org.apache.flink.runtime.taskmanager.TaskManager - Determined BLOB server address to be localhost/127.0.0.1:39682. Starting BLOB cache. 
2017-09-15 12:47:54,448 INFO org.apache.flink.runtime.blob.BlobCache - Created BLOB cache storage directory /tmp/blobStore-1c075944-0152-42aa-b64a-607b931$ 2017-09-15 12:48:02,066 INFO org.apache.flink.runtime.taskmanager.TaskManager - Received task Source: Custom Source -> Timestamps/Watermarks (1/1) 2017-09-15 12:48:02,081 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> Timestamps/Watermarks (1/1) (bc0e95e951deb6680cff372a95495$ 2017-09-15 12:48:02,081 INFO org.apache.flink.runtime.taskmanager.Task - Creating FileSystem stream leak safety net for task Source: Custom Source -> Timest$ 2017-09-15 12:48:02,085 INFO org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task Source: Custom Source -> Timestamps/Watermarks (1/1) (bc$ 2017-09-15 12:48:02,086 INFO org.apache.flink.runtime.blob.BlobCache - Downloading 180bc9dfc19f185ab48acbc5bec0568e15a41665 from localhost/127.0.0.1:39682 2017-09-15 12:48:02,088 INFO org.apache.flink.runtime.taskmanager.TaskManager - Received task Map -> Sink: Unnamed (1/1) 2017-09-15 12:48:02,103 INFO org.apache.flink.runtime.taskmanager.TaskManager - Received task Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra $ 2017-09-15 12:48:02,107 INFO org.apache.flink.runtime.taskmanager.Task - Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from CREATED$ 2017-09-15 12:48:02,106 INFO org.apache.flink.runtime.taskmanager.Task - Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) (d$ 2017-09-15 12:48:02,115 INFO org.apache.flink.runtime.taskmanager.Task - Creating FileSystem stream leak safety net for task Map -> Sink: Unnamed (1/1) (7ec$ 2017-09-15 12:48:02,116 INFO org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b$ 2017-09-15 12:48:02,166 INFO org.apache.flink.runtime.taskmanager.Task - Creating FileSystem stream leak safety net for task Learn -> Select -> Process -> ($ 2017-09-15 12:48:02,171 INFO org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task Learn -> Select -> Process -> (Sink: Cassandra Sink, Sin$ 2017-09-15 12:48:02,988 INFO org.apache.flink.runtime.taskmanager.Task - Registering task at network: Learn -> Select -> Process -> (Sink: Cassandra Sink, S$ 2017-09-15 12:48:02,988 INFO org.apache.flink.runtime.taskmanager.Task - Registering task at network: Source: Custom Source -> Timestamps/Watermarks (1/1) ($ 2017-09-15 12:48:02,991 INFO org.apache.flink.runtime.taskmanager.Task - Registering task at network: Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce9954$ 2017-09-15 12:48:02,997 INFO org.apache.flink.runtime.taskmanager.Task - Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) (d$ 2017-09-15 12:48:03,005 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> Timestamps/Watermarks (1/1) (bc0e95e951deb6680cff372a95495$ 2017-09-15 12:48:03,008 INFO org.apache.flink.runtime.taskmanager.Task - Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from DEPLOYI$ 2017-09-15 12:48:03,078 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set to heap memory (checkpoints to filesystem "file:/home/flink-ch$ 2017-09-15 12:48:03,078 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set to heap memory (checkpoints to filesystem "file:/home/flink-ch$ 2017-09-15 12:48:03,082 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set 
to heap memory (checkpoints to filesystem "file:/home/flink-ch$ 2017-09-15 12:48:03,427 INFO org.apache.flink.runtime.state.heap.HeapKeyedStateBackend - Initializing heap keyed state backend with stream factory. 2017-09-15 12:48:03,439 WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'latency'. Metric wil$ 2017-09-15 12:48:03,443 INFO org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - No restore state for FlinkKafkaConsumer. 2017-09-15 12:48:03,519 INFO org.apache.flink.runtime.state.heap.HeapKeyedStateBackend - Initializing heap keyed state backend with stream factory. 2017-09-15 12:48:03,545 INFO org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values: metric.reporters = [] metadata.max.age.ms = 300000 partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor] reconnect.backoff.ms = 50 sasl.kerberos.ticket.renew.window.factor = 0.8 max.partition.fetch.bytes = 1048576 bootstrap.servers = [147.83.29.146:55091] ssl.keystore.type = JKS enable.auto.commit = true sasl.mechanism = GSSAPI interceptor.classes = null exclude.internal.topics = true ssl.truststore.password = null client.id = ssl.endpoint.identification.algorithm = null max.poll.records = 2147483647 check.crcs = true request.timeout.ms = 40000 heartbeat.interval.ms = 3000 auto.commit.interval.ms = 5000 receive.buffer.bytes = 65536 ssl.truststore.type = JKS ssl.truststore.location = null ssl.keystore.password = null fetch.min.bytes = 1 send.buffer.bytes = 131072 value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer group.id = groupId retry.backoff.ms = 100 sasl.kerberos.kinit.cmd = /usr/bin/kinit sasl.kerberos.service.name = null sasl.kerberos.ticket.renew.jitter = 0.05 ssl.trustmanager.algorithm = PKIX ssl.key.password = null fetch.max.wait.ms = 500 sasl.kerberos.min.time.before.relogin = 60000 connections.max.idle.ms = 540000 session.timeout.ms = 30000 metrics.num.samples = 2 key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer ssl.protocol = TLS ssl.provider = null ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1] ssl.keystore.location = null ssl.cipher.suites = null security.protocol = PLAINTEXT ssl.keymanager.algorithm = SunX509 metrics.sample.window.ms = 30000 auto.offset.reset = latest 2017-09-15 12:48:03,642 INFO org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values: metric.reporters = [] metadata.max.age.ms = 300000 partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor] reconnect.backoff.ms = 50 sasl.kerberos.ticket.renew.window.factor = 0.8 max.partition.fetch.bytes = 1048576 bootstrap.servers = [147.83.29.146:55091] ssl.keystore.type = JKS enable.auto.commit = true sasl.mechanism = GSSAPI interceptor.classes = null exclude.internal.topics = true ssl.truststore.password = null client.id = consumer-1 ssl.endpoint.identification.algorithm = null max.poll.records = 2147483647 check.crcs = true request.timeout.ms = 40000 heartbeat.interval.ms = 3000 auto.commit.interval.ms = 5000 receive.buffer.bytes = 65536 ssl.truststore.type = JKS ssl.truststore.location = null ssl.keystore.password = null fetch.min.bytes = 1 send.buffer.bytes = 131072 value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer group.id = groupId retry.backoff.ms = 100 sasl.kerberos.kinit.cmd = /usr/bin/kinit sasl.kerberos.service.name = null sasl.kerberos.ticket.renew.jitter = 0.05 ssl.trustmanager.algorithm = 
PKIX ssl.key.password = null fetch.max.wait.ms = 500 sasl.kerberos.min.time.before.relogin = 60000 connections.max.idle.ms = 540000 session.timeout.ms = 30000 metrics.num.samples = 2 key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer ssl.protocol = TLS ssl.provider = null ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1] ssl.keystore.location = null ssl.cipher.suites = null security.protocol = PLAINTEXT ssl.keymanager.algorithm = SunX509 metrics.sample.window.ms = 30000 auto.offset.reset = latest 2017-09-15 12:48:03,696 INFO com.datastax.driver.core.GuavaCompatibility - Detected Guava < 19 in the classpath, using legacy compatibility layer 2017-09-15 12:48:03,716 INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 0.10.0.1 2017-09-15 12:48:03,718 INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : a7a17cdec9eaa6c5 2017-09-15 12:48:03,998 INFO org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09 - Got 10 partitions from these topics: [LCacc] 2017-09-15 12:48:03,999 INFO org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09 - Consumer is going to read the following topics (with number of partitions): LCa$ 2017-09-15 12:48:04,099 INFO org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values: metric.reporters = [] metadata.max.age.ms = 300000 partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor] reconnect.backoff.ms = 50 sasl.kerberos.ticket.renew.window.factor = 0.8 max.partition.fetch.bytes = 1048576 bootstrap.servers = [147.83.29.146:55091] ssl.keystore.type = JKS enable.auto.commit = true sasl.mechanism = GSSAPI interceptor.classes = null exclude.internal.topics = true ssl.truststore.password = null client.id = ssl.endpoint.identification.algorithm = null max.poll.records = 2147483647 check.crcs = true request.timeout.ms = 40000 heartbeat.interval.ms = 3000 auto.commit.interval.ms = 5000 receive.buffer.bytes = 65536 ssl.truststore.type = JKS ssl.truststore.location = null ssl.keystore.password = null fetch.min.bytes = 1 send.buffer.bytes = 131072 value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer group.id = groupId retry.backoff.ms = 100 sasl.kerberos.kinit.cmd = /usr/bin/kinit sasl.kerberos.service.name = null sasl.kerberos.ticket.renew.jitter = 0.05 ssl.trustmanager.algorithm = PKIX ssl.key.password = null fetch.max.wait.ms = 500 sasl.kerberos.min.time.before.relogin = 60000 connections.max.idle.ms = 540000 session.timeout.ms = 30000 metrics.num.samples = 2 key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer ssl.protocol = TLS ssl.provider = null ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1] ssl.keystore.location = null ssl.cipher.suites = null security.protocol = PLAINTEXT ssl.keymanager.algorithm = SunX509 metrics.sample.window.ms = 30000 auto.offset.reset = latest 2017-09-15 12:48:04,111 INFO org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values: metric.reporters = [] metadata.max.age.ms = 300000 partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor] reconnect.backoff.ms = 50 sasl.kerberos.ticket.renew.window.factor = 0.8 max.partition.fetch.bytes = 1048576 bootstrap.servers = [147.83.29.146:55091] ssl.keystore.type = JKS enable.auto.commit = true sasl.mechanism = GSSAPI interceptor.classes = null exclude.internal.topics = true ssl.truststore.password = null client.id = consumer-2 ssl.endpoint.identification.algorithm = null 
max.poll.records = 2147483647 check.crcs = true request.timeout.ms = 40000 heartbeat.interval.ms = 3000 auto.commit.interval.ms = 5000 receive.buffer.bytes = 65536 ssl.truststore.type = JKS ssl.truststore.location = null ssl.keystore.password = null fetch.min.bytes = 1 send.buffer.bytes = 131072 value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer group.id = groupId retry.backoff.ms = 100 sasl.kerberos.kinit.cmd = /usr/bin/kinit sasl.kerberos.service.name = null sasl.kerberos.ticket.renew.jitter = 0.05 ssl.trustmanager.algorithm = PKIX ssl.key.password = null fetch.max.wait.ms = 500 sasl.kerberos.min.time.before.relogin = 60000 connections.max.idle.ms = 540000 session.timeout.ms = 30000 metrics.num.samples = 2 key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer ssl.protocol = TLS ssl.provider = null ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1] ssl.keystore.location = null ssl.cipher.suites = null security.protocol = PLAINTEXT ssl.keymanager.algorithm = SunX509 metrics.sample.window.ms = 30000 auto.offset.reset = latest 2017-09-15 12:48:04,127 INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 0.10.0.1 2017-09-15 12:48:04,127 INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : a7a17cdec9eaa6c5 2017-09-15 12:48:04,302 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Discovered coordinator 147.83.29.146:55091 (id: 2147483647 rack: null) for group$ 2017-09-15 12:48:04,371 INFO com.datastax.driver.core.ClockFactory - Using native clock to generate timestamps. 2017-09-15 12:48:04,445 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-0 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,476 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-1 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,509 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-2 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,540 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-3 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,565 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-4 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,596 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-5 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,619 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-6 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,655 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-7 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,688 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-8 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,720 INFO org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Partition LCacc-9 has no initial offset; the consumer has position 0, so the$ 2017-09-15 12:48:04,896 INFO com.datastax.driver.core.NettyUtil - Found Netty's native epoll transport in the classpath, using it 2017-09-15 12:48:05,557 INFO 
com.datastax.driver.core.policies.DCAwareRoundRobinPolicy - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorr
2017-09-15 12:48:05,559 INFO com.datastax.driver.core.Cluster - New Cassandra host /147.83.29.146:55092 added
2017-09-15 12:48:05,681 INFO com.datastax.driver.core.ClockFactory - Using native clock to generate timestamps.
2017-09-15 12:48:05,819 INFO com.datastax.driver.core.policies.DCAwareRoundRobinPolicy - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the$
2017-09-15 12:48:05,819 INFO com.datastax.driver.core.Cluster - New Cassandra host /147.83.29.146:55092 added
2017-09-15 12:48:10,630 INFO org.numenta.nupic.flink.streaming.api.operator.AbstractHTMInferenceOperator - Created HTM network DayDemoNetwork
2017-09-15 12:48:10,675 WARN org.numenta.nupic.network.Layer - The number of Input Dimensions (1) != number of Column Dimensions (1) --OR-- Encoder width (2350) != produ$
2017-09-15 12:48:10,678 INFO org.numenta.nupic.network.Layer - Input dimension fix successful!
2017-09-15 12:48:10,678 INFO org.numenta.nupic.network.Layer - Using calculated input dimensions: [2350]
2017-09-15 12:48:10,679 INFO org.numenta.nupic.network.Layer - Classifying "value" input field with CLAClassifier

There is nothing to show in dmesg. After the execution starts, no errors of that kind appear. Anyway, in the time before the crash the job doesn't really make any progress and the CPU is at about 100% (verified with the top command).
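One way to see what the TaskManager JVM is doing while the CPU is pegged is to grab a few thread dumps with the standard JDK tools (a sketch; the grep pattern is only an assumption about the process name):

    # find the TaskManager JVM and dump its threads while the CPU is at 100%
    TM_PID=$(jps -l | grep -i taskmanager | awk '{print $1}')
    jstack $TM_PID > tm-threads-$(date +%s).txt

Comparing a couple of these dumps shows where the worker threads are spending their time.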
The JobManager log is probably more interesting:
2017-09-15 12:47:45,420 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2017-09-15 12:47:45,650 INFO org.apache.flink.runtime.jobmanager.JobManager - -------------------------------------------------------------------------------- 2017-09-15 12:47:45,650 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC) 2017-09-15 12:47:45,650 INFO org.apache.flink.runtime.jobmanager.JobManager - Current user: giordano 2017-09-15 12:47:45,651 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.131-b11 2017-09-15 12:47:45,652 INFO org.apache.flink.runtime.jobmanager.JobManager - Maximum heap size: 491 MiBytes 2017-09-15 12:47:45,652 INFO org.apache.flink.runtime.jobmanager.JobManager - JAVA_HOME: /usr/lib/jvm/java-8-oracle 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - Hadoop version: 2.7.2 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options: 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms512m 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx512m 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog.file=/home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-giordano-2-2-100-1.log 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog4j.configuration=file:/home/giordano/flink-1.3.2/conf/log4j.properties 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlogback.configurationFile=file:/home/giordano/flink-1.3.2/conf/logback.xml 2017-09-15 12:47:45,658 INFO org.apache.flink.runtime.jobmanager.JobManager - Program Arguments: 2017-09-15 12:47:45,659 INFO org.apache.flink.runtime.jobmanager.JobManager - --configDir 2017-09-15 12:47:45,659 INFO org.apache.flink.runtime.jobmanager.JobManager - /home/giordano/flink-1.3.2/conf 2017-09-15 12:47:45,659 INFO org.apache.flink.runtime.jobmanager.JobManager - --executionMode 2017-09-15 12:47:45,659 INFO org.apache.flink.runtime.jobmanager.JobManager - cluster 2017-09-15 12:47:45,659 INFO org.apache.flink.runtime.jobmanager.JobManager - Classpath: /home/giordano/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar:/home/giordano/flink-1.3.2/lib/flin$ 2017-09-15 12:47:45,659 INFO org.apache.flink.runtime.jobmanager.JobManager - -------------------------------------------------------------------------------- 2017-09-15 12:47:45,661 INFO org.apache.flink.runtime.jobmanager.JobManager - Registered UNIX signal handlers for [TERM, HUP, INT] 2017-09-15 12:47:45,947 INFO org.apache.flink.runtime.jobmanager.JobManager - Loading configuration from /home/giordano/flink-1.3.2/conf 2017-09-15 12:47:45,953 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.home, /usr/lib/jvm/java-8-oracle 2017-09-15 12:47:45,953 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2017-09-15 12:47:45,953 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-09-15 12:47:45,954 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 512 2017-09-15 12:47:45,954 INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2 2017-09-15 12:47:45,954 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-09-15 12:47:45,955 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-09-15 12:47:45,956 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem 2017-09-15 12:47:45,956 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, file:///home/flink-checkpoints 2017-09-15 12:47:45,970 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager without high-availability 2017-09-15 12:47:45,973 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager on localhost:6123 with execution mode CLUSTER 2017-09-15 12:47:45,993 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.home, /usr/lib/jvm/java-8-oracle 2017-09-15 12:47:45,995 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2017-09-15 12:47:45,995 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-09-15 12:47:45,995 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 512 2017-09-15 12:47:45,995 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2 2017-09-15 12:47:45,996 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-09-15 12:47:45,996 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-09-15 12:47:45,996 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem 2017-09-15 12:47:45,996 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, file:///home/flink-checkpoints 2017-09-15 12:47:46,045 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to giordano (auth:SIMPLE) 2017-09-15 12:47:46,209 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system reachable at localhost:6123 2017-09-15 12:47:46,878 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 2017-09-15 12:47:46,982 INFO Remoting - Starting remoting 2017-09-15 12:47:47,392 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@localhost:6123] 2017-09-15 12:47:47,423 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager web frontend 2017-09-15 12:47:47,433 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of JobManager log file: /home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-gio$ 2017-09-15 12:47:47,434 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of JobManager stdout file: /home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-$ 2017-09-15 12:47:47,434 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-f9816186-7918-4475-8359-c3cafb63559a for the web interface files 2017-09-15 12:47:47,435 
INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-75aba058-8587-4181-bf16-ee2e68d9b70c for web frontend JAR file uploads 2017-09-15 12:47:47,853 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend listening at 0:0:0:0:0:0:0:0:8081 2017-09-15 12:47:47,854 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor 2017-09-15 12:47:47,876 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-9ad4e807-069e-4fb7-88b3-410fbdcb5eb0 2017-09-15 12:47:47,879 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:39682 - max concurrent requests: 50 - max backlog: 1000 2017-09-15 12:47:47,896 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported. 2017-09-15 12:47:47,906 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive 2017-09-15 12:47:47,914 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka.tcp://flink@localhost:6123/user/jobmanager. 2017-09-15 12:47:47,929 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting with JobManager akka.tcp://flink@localhost:6123/user/jobmanager on port 8081 2017-09-15 12:47:47,930 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@localhost:6123/user/jobmanager:00000000-0000-0000-0000-0000000$ 2017-09-15 12:47:47,970 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@localhost:6123/user/jobmanag$ 2017-09-15 12:47:48,010 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@localhost:6123/user/jobmanager was granted leadership with leader session ID S$ 2017-09-15 12:47:48,034 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#$ 2017-09-15 12:47:51,071 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:51,383 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:51,718 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:52,109 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:52,414 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:53,130 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:54,365 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - TaskManager cf04d1390ff86aba4d1702ef1a0d2b67 has started. 
2017-09-15 12:47:54,368 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at giordano-2-2-100-1 (akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager) $ 2017-09-15 12:47:54,434 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:55,150 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:58,455 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:47:59,172 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$ 2017-09-15 12:48:01,650 INFO org.apache.flink.runtime.jobmanager.JobManager - Submitting job df1f60ca168364759c69dbe078544346 (Flink Streaming Job). 2017-09-15 12:48:01,768 INFO org.apache.flink.runtime.jobmanager.JobManager - Using restart strategy FixedDelayRestartStrategy(maxNumberRestartAttempts=1, delayBetweenRestartAttempts=0$ 2017-09-15 12:48:01,782 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart 2017-09-15 12:48:01,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Running initialization on master for job Flink Streaming Job (df1f60ca168364759c69dbe078544346). 2017-09-15 12:48:01,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Successfully ran initialization on master in 0 ms. 2017-09-15 12:48:01,830 INFO org.apache.flink.runtime.jobmanager.JobManager - State backend is set to heap memory (checkpoints to filesystem "file:/home/flink-checkpoints") 2017-09-15 12:48:01,843 INFO org.apache.flink.runtime.jobmanager.JobManager - Scheduling job df1f60ca168364759c69dbe078544346 (Flink Streaming Job). 2017-09-15 12:48:01,844 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Streaming Job (df1f60ca168364759c69dbe078544346) switched from state CREATED to RUNNING. 2017-09-15 12:48:01,861 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Timestamps/Watermarks (1/1) (bc0e95e951deb6680cff372a954950d2) switched from CREA$ 2017-09-15 12:48:01,882 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from CREATED to SCHEDULED. 2017-09-15 12:48:01,883 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) (d24f111f9720c8e4df77f67a$ 2017-09-15 12:48:01,895 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Timestamps/Watermarks (1/1) (bc0e95e951deb6680cff372a954950d2) switched from SCHE$ 2017-09-15 12:48:01,912 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Custom Source -> Timestamps/Watermarks (1/1) (attempt #0) to giordano-2-2-100-1 2017-09-15 12:48:01,933 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from SCHEDULED to DEPLOYING. 
2017-09-15 12:48:01,933 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Map -> Sink: Unnamed (1/1) (attempt #0) to giordano-2-2-100-1
2017-09-15 12:48:01,941 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) (d24f111f9720c8e4df77f67a$
2017-09-15 12:48:01,941 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) (attempt #0) to$
2017-09-15 12:48:03,018 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) (d24f111f9720c8e4df77f67a$
2017-09-15 12:48:03,038 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Timestamps/Watermarks (1/1) (bc0e95e951deb6680cff372a954950d2) switched from DEPL$
2017-09-15 12:48:03,046 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from DEPLOYING to RUNNING.
2017-09-15 12:48:06,474 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
2017-09-15 12:48:07,189 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
2017-09-15 12:48:22,495 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
$flink@giordano-2-2-100-1:6123/]] arriving at [akka.tcp://flink@giordano-2-2-100-1:6123] inbound addresses are [akka.tcp://flink@localhost:6123]
2017-09-15 12:48:52,506 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
2017-09-15 12:48:53,212 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:22,524 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:23,230 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:52,544 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:59,410 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@giordano-2-2-100-1:35127]
2017-09-15 12:49:59,445 INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager terminated.
2017-09-15 12:49:59,446 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Timestamps/Watermarks (1/1) (bc0e95e951deb6680cff372a954950d2) switched from RUNN$
java.lang.Exception: TaskManager was lost/killed: cf04d1390ff86aba4d1702ef1a0d2b67 @ giordano-2-2-100-1 (dataPort=36806)

It tags the TaskManager as unreachable, even though it resides on the same node...
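As a side note, the repeated "dropping message ... for non-local recipient" errors say that messages addressed to akka.tcp://flink@giordano-2-2-100-1:6123 arrive at a JobManager whose inbound address is akka.tcp://flink@localhost:6123. If that mismatch turned out to matter, the usual change (a sketch, assuming the master's hostname is giordano-2-2-100-1) would be to bind the JobManager to the real hostname instead of localhost in flink-conf.yaml on all nodes:

    jobmanager.rpc.address: giordano-2-2-100-1

and then restart the cluster with stop-cluster.sh / start-cluster.sh.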
I think it might be that the computation is too CPU heavy, which makes the TaskManager unresponsive to any JobManager messages, and so the JobManager thinks that the TaskManager is lost.
@Till, do you have another idea about what could be going on?
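If that is what is happening, one thing that can be tuned in this Flink version is the Akka death-watch / timeout configuration, so that a busy TaskManager is not declared dead quite so quickly. A sketch of the relevant flink-conf.yaml entries (the values are purely illustrative, not recommendations):

    akka.ask.timeout: 60 s
    akka.lookup.timeout: 30 s
    akka.watch.heartbeat.interval: 10 s
    akka.watch.heartbeat.pause: 120 s

The akka.watch.* settings control how long a TaskManager may stay silent before the JobManager marks it as unreachable.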
org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: jobmanager.rpc.port, 6123 > 2017-09-15 12:47:45,954 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: jobmanager.heap.mb, 512 > 2017-09-15 12:47:45,954 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: taskmanager.numberOfTaskSlots, 2 > 2017-09-15 12:47:45,954 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: taskmanager.memory.preallocate, false > 2017-09-15 12:47:45,955 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: jobmanager.web.port, 8081 > 2017-09-15 12:47:45,956 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: state.backend, filesystem > 2017-09-15 12:47:45,956 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: state.backend.fs.checkpointdir, > file:///home/flink-checkpoints > 2017-09-15 12:47:45,970 INFO org.apache.flink.runtime.jobmanager.JobManager > - Starting JobManager without high-availability > 2017-09-15 12:47:45,973 INFO org.apache.flink.runtime.jobmanager.JobManager > - Starting JobManager on localhost:6123 with execution mode CLUSTER > 2017-09-15 12:47:45,993 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: env.java.home, /usr/lib/jvm/java-8-oracle > 2017-09-15 12:47:45,995 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: jobmanager.rpc.address, localhost > 2017-09-15 12:47:45,995 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: jobmanager.rpc.port, 6123 > 2017-09-15 12:47:45,995 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: jobmanager.heap.mb, 512 > 2017-09-15 12:47:45,995 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: taskmanager.numberOfTaskSlots, 2 > 2017-09-15 12:47:45,996 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: taskmanager.memory.preallocate, false > 2017-09-15 12:47:45,996 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: jobmanager.web.port, 8081 > 2017-09-15 12:47:45,996 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: state.backend, filesystem > 2017-09-15 12:47:45,996 INFO > org.apache.flink.configuration.GlobalConfiguration - Loading > configuration property: state.backend.fs.checkpointdir, > file:///home/flink-checkpoints > 2017-09-15 12:47:46,045 INFO > org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user > set to giordano (auth:SIMPLE) > 2017-09-15 12:47:46,209 INFO org.apache.flink.runtime.jobmanager.JobManager > - Starting JobManager actor system reachable at localhost:6123 > 2017-09-15 12:47:46,878 INFO akka.event.slf4j.Slf4jLogger > - Slf4jLogger started > 2017-09-15 12:47:46,982 INFO Remoting > - Starting remoting > 2017-09-15 12:47:47,392 INFO Remoting > - Remoting started; listening on addresses > :[akka.tcp://flink@localhost:6123] > 2017-09-15 12:47:47,423 INFO org.apache.flink.runtime.jobmanager.JobManager > - Starting JobManager web frontend > 2017-09-15 12:47:47,433 INFO > org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined > location of JobManager log file: > 
/home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-gio$ > 2017-09-15 12:47:47,434 INFO > org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined > location of JobManager stdout file: > /home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-$ > 2017-09-15 12:47:47,434 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using > directory /tmp/flink-web-f9816186-7918-4475-8359-c3cafb63559a for the web > interface files > 2017-09-15 12:47:47,435 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using > directory /tmp/flink-web-75aba058-8587-4181-bf16-ee2e68d9b70c for web > frontend JAR file uploads > 2017-09-15 12:47:47,853 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend > listening at 0:0:0:0:0:0:0:0:8081 > 2017-09-15 12:47:47,854 INFO org.apache.flink.runtime.jobmanager.JobManager > - Starting JobManager actor > 2017-09-15 12:47:47,876 INFO org.apache.flink.runtime.blob.BlobServer > - Created BLOB server storage directory > /tmp/blobStore-9ad4e807-069e-4fb7-88b3-410fbdcb5eb0 > 2017-09-15 12:47:47,879 INFO org.apache.flink.runtime.blob.BlobServer > - Started BLOB server at 0.0.0.0:39682 - max concurrent requests: 50 - max > backlog: 1000 > 2017-09-15 12:47:47,896 INFO > org.apache.flink.runtime.metrics.MetricRegistry - No metrics > reporter configured, no metrics will be exposed/reported. > 2017-09-15 12:47:47,906 INFO > org.apache.flink.runtime.jobmanager.MemoryArchivist - Started > memory archivist akka://flink/user/archive > 2017-09-15 12:47:47,914 INFO org.apache.flink.runtime.jobmanager.JobManager > - Starting JobManager at akka.tcp://flink@localhost:6123/user/jobmanager. > 2017-09-15 12:47:47,929 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting > with JobManager akka.tcp://flink@localhost:6123/user/jobmanager on port 8081 > 2017-09-15 12:47:47,930 INFO > org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader > reachable under > akka.tcp://flink@localhost:6123/user/jobmanager:00000000-0000-0000-0000-0000000$ > 2017-09-15 12:47:47,970 INFO > org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager > - Trying to associate with JobManager leader > akka.tcp://flink@localhost:6123/user/jobmanag$ > 2017-09-15 12:47:48,010 INFO org.apache.flink.runtime.jobmanager.JobManager > - JobManager akka.tcp://flink@localhost:6123/user/jobmanager was granted > leadership with leader session ID S$ > 2017-09-15 12:47:48,034 INFO > org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager > - Resource Manager associating with leading JobManager > Actor[akka://flink/user/jobmanager#$ > 2017-09-15 12:47:51,071 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:51,383 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:51,718 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:52,109 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:52,414 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:53,130 
ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:54,365 INFO > org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager > - TaskManager cf04d1390ff86aba4d1702ef1a0d2b67 has started. > 2017-09-15 12:47:54,368 INFO > org.apache.flink.runtime.instance.InstanceManager - Registered > TaskManager at giordano-2-2-100-1 > (akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager) $ > 2017-09-15 12:47:54,434 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:55,150 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:58,455 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:47:59,172 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:48:01,650 INFO org.apache.flink.runtime.jobmanager.JobManager > - Submitting job df1f60ca168364759c69dbe078544346 (Flink Streaming Job). > 2017-09-15 12:48:01,768 INFO org.apache.flink.runtime.jobmanager.JobManager > - Using restart strategy > FixedDelayRestartStrategy(maxNumberRestartAttempts=1, > delayBetweenRestartAttempts=0$ > 2017-09-15 12:48:01,782 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers > via failover strategy: full graph restart > 2017-09-15 12:48:01,796 INFO org.apache.flink.runtime.jobmanager.JobManager > - Running initialization on master for job Flink Streaming Job > (df1f60ca168364759c69dbe078544346). > 2017-09-15 12:48:01,796 INFO org.apache.flink.runtime.jobmanager.JobManager > - Successfully ran initialization on master in 0 ms. > 2017-09-15 12:48:01,830 INFO org.apache.flink.runtime.jobmanager.JobManager > - State backend is set to heap memory (checkpoints to filesystem > "file:/home/flink-checkpoints") > 2017-09-15 12:48:01,843 INFO org.apache.flink.runtime.jobmanager.JobManager > - Scheduling job df1f60ca168364759c69dbe078544346 (Flink Streaming Job). > 2017-09-15 12:48:01,844 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink > Streaming Job (df1f60ca168364759c69dbe078544346) switched from state CREATED > to RUNNING. > 2017-09-15 12:48:01,861 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: > Custom Source -> Timestamps/Watermarks (1/1) > (bc0e95e951deb6680cff372a954950d2) switched from CREA$ > 2017-09-15 12:48:01,882 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Map -> Sink: > Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from CREATED to > SCHEDULED. 
> 2017-09-15 12:48:01,883 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Learn -> > Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) > (d24f111f9720c8e4df77f67a$ > 2017-09-15 12:48:01,895 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: > Custom Source -> Timestamps/Watermarks (1/1) > (bc0e95e951deb6680cff372a954950d2) switched from SCHE$ > 2017-09-15 12:48:01,912 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying > Source: Custom Source -> Timestamps/Watermarks (1/1) (attempt #0) to > giordano-2-2-100-1 > 2017-09-15 12:48:01,933 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Map -> Sink: > Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from SCHEDULED to > DEPLOYING. > 2017-09-15 12:48:01,933 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying > Map -> Sink: Unnamed (1/1) (attempt #0) to giordano-2-2-100-1 > 2017-09-15 12:48:01,941 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Learn -> > Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) > (d24f111f9720c8e4df77f67a$ > 2017-09-15 12:48:01,941 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying > Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) > (1/1) (attempt #0) to$ > 2017-09-15 12:48:03,018 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Learn -> > Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1) > (d24f111f9720c8e4df77f67a$ > 2017-09-15 12:48:03,038 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: > Custom Source -> Timestamps/Watermarks (1/1) > (bc0e95e951deb6680cff372a954950d2) switched from DEPL$ > 2017-09-15 12:48:03,046 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Map -> Sink: > Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from DEPLOYING to > RUNNING. 
> 2017-09-15 12:48:06,474 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:48:07,189 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:48:22,495 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > $flink@giordano-2-2-100-1:6123/]] arriving at > [akka.tcp://flink@giordano-2-2-100-1:6123] inbound addresses are > [akka.tcp://flink@localhost:6123] > 2017-09-15 12:48:52,506 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:48:53,212 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:49:22,524 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:49:23,230 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:49:52,544 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@$ > 2017-09-15 12:49:59,410 WARN akka.remote.RemoteWatcher > - Detected unreachable: [akka.tcp://flink@giordano-2-2-100-1:35127] > 2017-09-15 12:49:59,445 INFO org.apache.flink.runtime.jobmanager.JobManager > - Task manager akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager > terminated. > 2017-09-15 12:49:59,446 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: > Custom Source -> Timestamps/Watermarks (1/1) > (bc0e95e951deb6680cff372a954950d2) switched from RUNN$ > java.lang.Exception: TaskManager was lost/killed: > cf04d1390ff86aba4d1702ef1a0d2b67 @ giordano-2-2-100-1 (dataPort=36806) > > it tag as unreachable the task manager (who reside on the same node...) > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Moreover, the CPU usage of the other task manager is about 0%.
It seems the problem resides in the job manager, which is not able to distribute the work across the cluster. |
I also tried placing only the job manager on the first node and reconfiguring
the cluster to admit just two task managers. In this way I immediately obtain a NoResourceAvailableException. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Update:
Following other discussions I even tried reducing memory.fraction to 10%, without success. How can I set G1 as the garbage collector? The key is env.java.opts, but what is the value? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi,
You can enable G1 GC with the following setting. In my example it applies only to the taskmanager, but env.java.opts should work in the same way.

env.java.opts.taskmanager: -XX:+UseG1GC |
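For completeness, a minimal flink-conf.yaml sketch of how these options could be laid out; the jobmanager-specific key is assumed to mirror the taskmanager one, and the flag is just the G1 switch already shown above:

# flink-conf.yaml (sketch)
env.java.opts: -XX:+UseG1GC              # applied to all Flink JVM processes
env.java.opts.taskmanager: -XX:+UseG1GC  # taskmanager only
env.java.opts.jobmanager: -XX:+UseG1GC   # assumed analogous key for the jobmanager
|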
Thank you, unfortunately it had no effect.
As I add more load to the computation, the "TaskManager was lost/killed" error appears on the node in use, without other nodes being called in to sustain the computation. I also increased

akka.watch.heartbeat.interval
akka.watch.heartbeat.pause
akka.transport.heartbeat.interval
akka.transport.heartbeat.pause

obtaining just a (very) delayed error. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
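For reference, these settings would go into flink-conf.yaml roughly as in the sketch below; the durations are illustrative placeholders chosen for the example, not values taken from this thread:

# flink-conf.yaml (example values only)
akka.watch.heartbeat.interval: 10 s
akka.watch.heartbeat.pause: 100 s
akka.transport.heartbeat.interval: 1000 s
akka.transport.heartbeat.pause: 6000 s
|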
Btw, what load are you putting on the cluster, i.e. what is your computation? If you don't have load, the cluster and job just keep on running, right?
Best, Aljoscha > On 19. Sep 2017, at 12:00, AndreaKinn <[hidden email]> wrote: > > Thank you, unfortunately it had no effects. > > As I add more load on the computation appears the error taskmanager killed > on the node on use, without calling other nodes to sustain the computation. > I also increased > > akka.watch.heartbeat.interval > akka.watch.heartbeat.pause > akka.transport.heartbeat.interval > akka.transport.heartbeat.pause > > obtaining just a (very ) delayed error. > > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
the program is composed by:
6 Kafka /source/ connector with custom timestamp and watermark /extractor/ and /map/ function each. then I use 6 instance of an external library called flink-htm (quite heavy) moreover I have 6 /process/ method and 2 /union/ method to merge result streams. Finally I have 2 Cassandra /sinks/. Data which arriving to kafka are 1 kb strings about each 20ms. I'm absolutely sure that the flink-htm library is heavy but I hoped flink managed them distributing the load (which are independent) through the cluster... instead it seems like just one node suffers all the load crashing. If can help I can share my code. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
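To make the shape of one such branch concrete, here is a rough Java sketch against the Flink 1.3 DataStream API. The topic name, broker address, timestamp parsing and the map body are placeholders invented for illustration; the heavy flink-htm operators, the process/union steps and the Cassandra sinks are only indicated in comments, since their details are not shown in this thread:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class PipelineSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address

        // One of the six Kafka sources, with a custom timestamp/watermark extractor.
        DataStream<String> source = env
            .addSource(new FlinkKafkaConsumer010<>("sensor-topic", new SimpleStringSchema(), props))
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(5)) {
                    @Override
                    public long extractTimestamp(String element) {
                        // placeholder: parse the event time out of the ~1 KB record
                        return System.currentTimeMillis();
                    }
                });

        // Placeholder map; in the real job this branch feeds a (heavy) flink-htm
        // operator, whose output is processed, unioned with the other branches
        // and written to the two Cassandra sinks.
        DataStream<String> mapped = source.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.trim();
            }
        });

        mapped.print(); // stands in for the Cassandra sinks in this sketch

        env.execute("Pipeline sketch");
    }
}

Note that with the default slot sharing and a parallelism of 1, Flink may place all of these subtasks into a single slot on one TaskManager, which would match the behaviour described above.
|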