NoResourceAvailable exception


NoResourceAvailable exception

AndreaKinn
Hi,
I'm executing a program on a Flink cluster. I tried the same program on a local node with Eclipse and it worked fine.

To start, following the Flink recommendations, on the cluster I set numberOfTaskSlots equal to the number of CPU cores (2), while I set parallelism to 1. Unfortunately, when I try to execute the job I get:


Caused by:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Not enough free slots available to run the job. You can decrease the
operator parallelism or increase the number of slots per TaskManager in the
configuration. Task to schedule: < Attempt #1 (Source: Custom Source ->
Timestamps/Watermarks (1/1)) @ (unassigned) - [SCHEDULED] > with groupID <
cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < SlotSharingGroup
[e883208d19e3c34f8aaf2a3168a63337, 9dd63673dd41ea021b896d5203f3ba7c,
cbc357ccb763df2852fee8c4fc7d55f2] >. Resources available to scheduler:
Number of instances=0, total number of slots=0, available slots=0

       
As you can see, it says I have 0 available slots... how is this possible? I haven't set any chaining or slotSharingGroup options in the code.
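
For context, this is roughly how the job is set up (a simplified sketch with illustrative operator names, not my real code): parallelism is set once on the environment, and there are no slotSharingGroup() or disableChaining() calls anywhere.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotUsageSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Global parallelism 1: with the default slot sharing group the whole
        // pipeline should fit into a single task slot.
        env.setParallelism(1);

        // Stand-in for the real Kafka source with timestamps/watermarks.
        DataStream<String> source = env.fromElements("a", "b", "c");

        source.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.toUpperCase();
            }
        }).print(); // stand-in for the real Cassandra sinks

        env.execute("Slot usage sketch");
    }
}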




Re: NoResourceAvailable exception

Chesnay Schepler
The error message says that the total number of slots is 0; it is therefore very likely that no TaskManager is connected to the JobManager.

How exactly are you starting the cluster?


Re: NoResourceAvailable exception

AndreaKinn
Update: the previous error was probably caused by the fact that I didn't restart the cluster before re-executing the job (maybe).

Then I tried to execute the program on a single-node cluster on my laptop and, after solving some small issues, everything works fine.

Now I'm trying to deploy the same jar on the real cluster. Initially everything seems to work correctly:

giordano@giordano-2-2-100-1:~$ ./flink-1.3.2/bin/flink run
flink-java-project-0.1.jar
Cluster configuration: Standalone cluster with JobManager at
localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
Submitting job with JobID: 161d91dda7c7012c8f48fa8a104a1662. Waiting for job
completion.
Connected to JobManager at
Actor[akka.tcp://flink@localhost:6123/user/jobmanager#430534598] with leader
session id 00000000-0000-0000-0000-000000000000.
09/14/2017 22:05:00 Job execution switched to status RUNNING.
09/14/2017 22:05:00 Source: Custom Source -> Timestamps/Watermarks(1/1)
switched to SCHEDULED
09/14/2017 22:05:00 Map -> Sink: Unnamed(1/1) switched to SCHEDULED
09/14/2017 22:05:00 Learn -> Select -> Process -> (Sink: Cassandra Sink,
Sink: Cassandra Sink)(1/1) switched to SCHEDULED
09/14/2017 22:05:00 Source: Custom Source -> Timestamps/Watermarks(1/1)
switched to DEPLOYING
09/14/2017 22:05:00 Map -> Sink: Unnamed(1/1) switched to DEPLOYING
09/14/2017 22:05:00 Learn -> Select -> Process -> (Sink: Cassandra Sink,
Sink: Cassandra Sink)(1/1) switched to DEPLOYING
09/14/2017 22:05:01 Source: Custom Source -> Timestamps/Watermarks(1/1)
switched to RUNNING
09/14/2017 22:05:01 Map -> Sink: Unnamed(1/1) switched to RUNNING
09/14/2017 22:05:01 Learn -> Select -> Process -> (Sink: Cassandra Sink,
Sink: Cassandra Sink)(1/1) switched to RUNNING


Unfortunately, after about a minute, the job fails:


09/14/2017 22:06:53 Map -> Sink: Unnamed(1/1) switched to FAILED
java.lang.Exception: TaskManager was lost/killed:
413eda6bf77223085c59e104680259bc @ giordano-2-2-100-1 (dataPort=36498)
        at
org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
        at
org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
        at
org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
        at
org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
        at
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1228)
        at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1131)
        at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
        at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:125)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at
akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09/14/2017 22:06:53 Job execution switched to status FAILING.
java.lang.Exception: TaskManager was lost/killed:
413eda6bf77223085c59e104680259bc @ giordano-2-2-100-1 (dataPort=36498)
        at
org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
        at
org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
        at
org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
        at
org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
        at
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1228)
        at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1131)
        at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
        at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:125)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at
akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
09/14/2017 22:06:53 Source: Custom Source -> Timestamps/Watermarks(1/1)
switched to CANCELING
09/14/2017 22:06:53 Learn -> Select -> Process -> (Sink: Cassandra Sink,
Sink: Cassandra Sink)(1/1) switched to CANCELING
09/14/2017 22:06:53 Source: Custom Source -> Timestamps/Watermarks(1/1)
switched to CANCELED
09/14/2017 22:06:53 Learn -> Select -> Process -> (Sink: Cassandra Sink,
Sink: Cassandra Sink)(1/1) switched to CANCELED

Then the job is restarted, but it fails again with the NoResourceAvailableException.
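
For reference, a single immediate restart attempt like this is what a fixed-delay restart strategy with one attempt and zero delay produces: the job is rescheduled before the lost TaskManager has had a chance to re-register, so the scheduler sees zero slots again. A minimal sketch of how such a strategy would be configured programmatically (an assumption on my part; it could equally come from the flink-conf.yaml defaults):

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // One restart attempt with no delay: if the lost TaskManager has not
        // re-registered yet, the retried attempt finds 0 slots and fails with
        // NoResourceAvailableException.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 0));

        env.fromElements(1, 2, 3).print(); // trivial pipeline, stands in for the real job
        env.execute("Restart strategy sketch");
    }
}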


I start the cluster using the start-cluster.sh script and everything works fine: the task managers on the other nodes are started as well.
On every node I set the number of task slots equal to the number of cores (2), while the parallelism key is left commented out.
On the master node (which acts as both JobManager and TaskManager) I set:

jobmanager.heap.mb: 756
taskmanager.heap.mb: 756

(it has 2 GB of RAM), while on the other two nodes:

taskmanager.heap.mb: 1512

(they have 2 GB of RAM each).

Hints?





Re: NoResourceAvailable exception

AndreaKinn
P.S.: I tried on my laptop with the same JobManager/TaskManager configuration (RAM, slots, parallelism, etc.) and it works perfectly.




Re: NoResourceAvailable exception

Aljoscha Krettek
Hi,

Can you check in the TaskManager logs whether there is any message that indicates why the TaskManager was lost? There might also be information in your machine logs, e.g. "dmesg" or /var/log/messages.

Best,
Aljoscha


Re: NoResourceAvailable exception

AndreaKinn
This is the TaskManager log:

2017-09-15 12:47:49,143 WARN  org.apache.hadoop.util.NativeCodeLoader                      
- Unable to load native-hadoop library for your platform... using
builtin-java classe$
2017-09-15 12:47:49,257 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -
--------------------------------------------------------------------------------
2017-09-15 12:47:49,257 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  Starting
TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC)
2017-09-15 12:47:49,257 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  Current
user: giordano
2017-09-15 12:47:49,257 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  JVM: Java
HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.131-b11
2017-09-15 12:47:49,258 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  Maximum
heap size: 502 MiBytes
2017-09-15 12:47:49,258 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  JAVA_HOME:
/usr/lib/jvm/java-8-oracle
2017-09-15 12:47:49,261 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  Hadoop
version: 2.7.2
2017-09-15 12:47:49,261 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  JVM
Options:
2017-09-15 12:47:49,261 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -    
-XX:+UseG1GC
2017-09-15 12:47:49,261 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -    
-Dlog.file=/home/giordano/flink-1.3.2/log/flink-giordano-taskmanager-0-giordano$
2017-09-15 12:47:49,261 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -    
-Dlog4j.configuration=file:/home/giordano/flink-1.3.2/conf/log4j.properties
2017-09-15 12:47:49,261 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -    
-Dlogback.configurationFile=file:/home/giordano/flink-1.3.2/conf/logback.xml
2017-09-15 12:47:49,262 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  Program
Arguments:
2017-09-15 12:47:49,262 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -    
--configDir
2017-09-15 12:47:49,262 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -    
/home/giordano/flink-1.3.2/conf
2017-09-15 12:47:49,262 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -  Classpath:
/home/giordano/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar:/home/giorda$
2017-09-15 12:47:49,262 INFO
org.apache.flink.runtime.taskmanager.TaskManager              -
--------------------------------------------------------------------------------
2017-09-15 12:47:49,263 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Registered
UNIX signal handlers for [TERM, HUP, INT]
2017-09-15 12:47:49,269 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Maximum
number of open file descriptors is 65536
2017-09-15 12:47:49,295 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Loading
configuration from /home/giordano/flink-1.3.2/conf
2017-09-15 12:47:49,298 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: env.java.home, /usr/lib/jvm/java-8-oracle
2017-09-15 12:47:49,299 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.address, localhost
2017-09-15 12:47:49,299 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.port, 6123
2017-09-15 12:47:49,299 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.heap.mb, 512
2017-09-15 12:47:49,299 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.numberOfTaskSlots, 2
2017-09-15 12:47:49,299 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.memory.preallocate, false
2017-09-15 12:47:49,300 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.web.port, 8081
2017-09-15 12:47:49,300 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend, filesystem
2017-09-15 12:47:49,300 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend.fs.checkpointdir, file:///home/flink-$
2017-09-15 12:47:49,311 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: env.java.home, /usr/lib/jvm/java-8-oracle
2017-09-15 12:47:49,312 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.address, localhost
2017-09-15 12:47:49,312 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.port, 6123
2017-09-15 12:47:49,312 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.heap.mb, 512
2017-09-15 12:47:49,312 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.numberOfTaskSlots, 2
2017-09-15 12:47:49,312 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.memory.preallocate, false
2017-09-15 12:47:49,313 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.web.port, 8081
2017-09-15 12:47:49,313 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend, filesystem
2017-09-15 12:47:49,313 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend.fs.checkpointdir, file:///home/flink-$
2017-09-15 12:47:49,386 INFO
org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user
set to giordano (auth:SIMPLE)
2017-09-15 12:47:49,463 INFO
org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to
select the network interface and address to use by connecting to the lead$
2017-09-15 12:47:49,463 INFO
org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager
will try to connect for 10000 milliseconds before falling back to heuri$
2017-09-15 12:47:49,466 INFO  org.apache.flink.runtime.net.ConnectionUtils                
- Retrieved new target address localhost/127.0.0.1:6123.
2017-09-15 12:47:49,477 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager
will use hostname/address 'giordano-2-2-100-1' (192.168.11.56) for comm$
2017-09-15 12:47:49,478 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Starting
TaskManager
2017-09-15 12:47:49,479 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Starting
TaskManager actor system at giordano-2-2-100-1:0.
2017-09-15 12:47:49,989 INFO  akka.event.slf4j.Slf4jLogger                                
- Slf4jLogger started
2017-09-15 12:47:50,053 INFO  Remoting                                                    
- Starting remoting
2017-09-15 12:47:50,290 INFO  Remoting                                                    
- Remoting started; listening on addresses
:[akka.tcp://flink@giordano-2-2-100-1:3512$
2017-09-15 12:47:50,301 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Starting
TaskManager actor
2017-09-15 12:47:50,323 INFO
org.apache.flink.runtime.io.network.netty.NettyConfig         - NettyConfig
[server address: giordano-2-2-100-1/192.168.11.56, server port: 0, ssl $
2017-09-15 12:47:50,331 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  - Messages
have a max timeout of 10000 ms
2017-09-15 12:47:50,338 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Temporary
file directory '/tmp': total 99 GB, usable 95 GB (95.96% usable)
2017-09-15 12:47:50,534 INFO
org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 64
MB for network buffer pool (number of memory segments: 2048, bytes per$
2017-09-15 12:47:50,800 INFO
org.apache.flink.runtime.io.network.NetworkEnvironment        - Starting the
network environment and its components.
2017-09-15 12:47:50,816 INFO
org.apache.flink.runtime.io.network.netty.NettyClient         - Successful
initialization (took 4 ms).
2017-09-15 12:47:53,827 WARN  io.netty.util.internal.ThreadLocalRandom                    
- Failed to generate a seed from SecureRandom within 3 seconds. Not enough
entrophy?
2017-09-15 12:47:53,866 INFO
org.apache.flink.runtime.io.network.netty.NettyServer         - Successful
initialization (took 3049 ms). Listening on SocketAddress /192.168.11.56$
2017-09-15 12:47:53,977 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Limiting
managed memory to 0.7 of the currently free heap space (301 MB), memory wi$
2017-09-15 12:47:53,986 INFO
org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager
uses directory /tmp/flink-io-75aed96d-28e9-4bcb-8d9b-4de0e734890d for s$
2017-09-15 12:47:53,998 INFO
org.apache.flink.runtime.metrics.MetricRegistry               - No metrics
reporter configured, no metrics will be exposed/reported.
2017-09-15 12:47:54,114 INFO  org.apache.flink.runtime.filecache.FileCache                
- User file cache uses directory
/tmp/flink-dist-cache-5d60853e-9225-4438-9b07-ce6db2$
2017-09-15 12:47:54,128 INFO  org.apache.flink.runtime.filecache.FileCache                
- User file cache uses directory
/tmp/flink-dist-cache-c00e60e1-3786-45ef-b1bc-570df5$
2017-09-15 12:47:54,140 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Starting
TaskManager actor at akka://flink/user/taskmanager#523808577.
2017-09-15 12:47:54,141 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager
data connection information: cf04d1390ff86aba4d1702ef1a0d2b67 @ giordan$
2017-09-15 12:47:54,141 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager
has 2 task slot(s).
2017-09-15 12:47:54,143 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Memory usage
stats: [HEAP: 74/197/502 MB, NON HEAP: 33/34/-1 MB (used/committed/max$
2017-09-15 12:47:54,148 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Trying to
register at JobManager akka.tcp://flink@localhost:6123/user/jobmanager (a$
2017-09-15 12:47:54,430 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Successful
registration at JobManager (akka.tcp://flink@localhost:6123/user/jobmana$
2017-09-15 12:47:54,440 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Determined
BLOB server address to be localhost/127.0.0.1:39682. Starting BLOB cache.
2017-09-15 12:47:54,448 INFO  org.apache.flink.runtime.blob.BlobCache                      
- Created BLOB cache storage directory
/tmp/blobStore-1c075944-0152-42aa-b64a-607b931$
2017-09-15 12:48:02,066 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Received
task Source: Custom Source -> Timestamps/Watermarks (1/1)
2017-09-15 12:48:02,081 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Source: Custom Source -> Timestamps/Watermarks (1/1)
(bc0e95e951deb6680cff372a95495$
2017-09-15 12:48:02,081 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Creating FileSystem stream leak safety net for task Source: Custom Source
-> Timest$
2017-09-15 12:48:02,085 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Loading JAR files for task Source: Custom Source -> Timestamps/Watermarks
(1/1) (bc$
2017-09-15 12:48:02,086 INFO  org.apache.flink.runtime.blob.BlobCache                      
- Downloading 180bc9dfc19f185ab48acbc5bec0568e15a41665 from
localhost/127.0.0.1:39682
2017-09-15 12:48:02,088 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Received
task Map -> Sink: Unnamed (1/1)
2017-09-15 12:48:02,103 INFO
org.apache.flink.runtime.taskmanager.TaskManager              - Received
task Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra $
2017-09-15 12:48:02,107 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched
from CREATED$

2017-09-15 12:48:02,106 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)
(1/1) (d$
2017-09-15 12:48:02,115 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Creating FileSystem stream leak safety net for task Map -> Sink: Unnamed
(1/1) (7ec$
2017-09-15 12:48:02,116 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Loading JAR files for task Map -> Sink: Unnamed (1/1)
(7ecc9cea7b0132f9604bce99545b$
2017-09-15 12:48:02,166 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Creating FileSystem stream leak safety net for task Learn -> Select ->
Process -> ($
2017-09-15 12:48:02,171 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Loading JAR files for task Learn -> Select -> Process -> (Sink: Cassandra
Sink, Sin$
2017-09-15 12:48:02,988 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Registering task at network: Learn -> Select -> Process -> (Sink:
Cassandra Sink, S$
2017-09-15 12:48:02,988 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Registering task at network: Source: Custom Source ->
Timestamps/Watermarks (1/1) ($
2017-09-15 12:48:02,991 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Registering task at network: Map -> Sink: Unnamed (1/1)
(7ecc9cea7b0132f9604bce9954$
2017-09-15 12:48:02,997 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)
(1/1) (d$
2017-09-15 12:48:03,005 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Source: Custom Source -> Timestamps/Watermarks (1/1)
(bc0e95e951deb6680cff372a95495$
2017-09-15 12:48:03,008 INFO  org.apache.flink.runtime.taskmanager.Task                    
- Map -> Sink: Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched
from DEPLOYI$
2017-09-15 12:48:03,078 INFO
org.apache.flink.streaming.runtime.tasks.StreamTask           - State
backend is set to heap memory (checkpoints to filesystem
"file:/home/flink-ch$
2017-09-15 12:48:03,078 INFO
org.apache.flink.streaming.runtime.tasks.StreamTask           - State
backend is set to heap memory (checkpoints to filesystem
"file:/home/flink-ch$
2017-09-15 12:48:03,082 INFO
org.apache.flink.streaming.runtime.tasks.StreamTask           - State
backend is set to heap memory (checkpoints to filesystem
"file:/home/flink-ch$
2017-09-15 12:48:03,427 INFO
org.apache.flink.runtime.state.heap.HeapKeyedStateBackend     - Initializing
heap keyed state backend with stream factory.
2017-09-15 12:48:03,439 WARN  org.apache.flink.metrics.MetricGroup                        
- Name collision: Group already contains a Metric with the name 'latency'.
Metric wil$
2017-09-15 12:48:03,443 INFO
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - No
restore state for FlinkKafkaConsumer.
2017-09-15 12:48:03,519 INFO
org.apache.flink.runtime.state.heap.HeapKeyedStateBackend     - Initializing
heap keyed state backend with stream factory.
2017-09-15 12:48:03,545 INFO
org.apache.kafka.clients.consumer.ConsumerConfig              -
ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [147.83.29.146:55091]
        ssl.keystore.type = JKS
enable.auto.commit = true
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id =
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 2147483647
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id = groupId
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = latest

2017-09-15 12:48:03,642 INFO
org.apache.kafka.clients.consumer.ConsumerConfig              -
ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [147.83.29.146:55091]
        ssl.keystore.type = JKS
        enable.auto.commit = true
        sasl.mechanism = GSSAPI
        interceptor.classes = null
exclude.internal.topics = true
        ssl.truststore.password = null
        client.id = consumer-1
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 2147483647
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id = groupId
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = latest

2017-09-15 12:48:03,696 INFO  com.datastax.driver.core.GuavaCompatibility                  
- Detected Guava < 19 in the classpath, using legacy compatibility layer
2017-09-15 12:48:03,716 INFO  org.apache.kafka.common.utils.AppInfoParser                  
- Kafka version : 0.10.0.1
2017-09-15 12:48:03,718 INFO  org.apache.kafka.common.utils.AppInfoParser                  
- Kafka commitId : a7a17cdec9eaa6c5
2017-09-15 12:48:03,998 INFO
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09  - Got 10
partitions from these topics: [LCacc]
2017-09-15 12:48:03,999 INFO
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09  - Consumer
is going to read the following topics (with number of partitions): LCa$
2017-09-15 12:48:04,099 INFO
org.apache.kafka.clients.consumer.ConsumerConfig              -
ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [147.83.29.146:55091]
        ssl.keystore.type = JKS
        enable.auto.commit = true
sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id =
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 2147483647
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id = groupId
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = latest

2017-09-15 12:48:04,111 INFO
org.apache.kafka.clients.consumer.ConsumerConfig              -
ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [147.83.29.146:55091]
        ssl.keystore.type = JKS
        enable.auto.commit = true
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
ssl.truststore.password = null
        client.id = consumer-2
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 2147483647
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id = groupId
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = latest

2017-09-15 12:48:04,127 INFO  org.apache.kafka.common.utils.AppInfoParser                  
- Kafka version : 0.10.0.1
2017-09-15 12:48:04,127 INFO  org.apache.kafka.common.utils.AppInfoParser                  
- Kafka commitId : a7a17cdec9eaa6c5
2017-09-15 12:48:04,302 INFO
org.apache.kafka.clients.consumer.internals.AbstractCoordinator  -
Discovered coordinator 147.83.29.146:55091 (id: 2147483647 rack: null) for
group$
2017-09-15 12:48:04,371 INFO  com.datastax.driver.core.ClockFactory                        
- Using native clock to generate timestamps.
2017-09-15 12:48:04,445 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-0 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,476 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-1 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,509 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-2 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,540 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-3 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,565 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-4 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,596 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-5 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,619 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-6 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,655 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-7 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,688 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-8 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,720 INFO
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  -
Partition LCacc-9 has no initial offset; the consumer has position 0, so
the$
2017-09-15 12:48:04,896 INFO  com.datastax.driver.core.NettyUtil                          
- Found Netty's native epoll transport in the classpath, using it
2017-09-15 12:48:05,557 INFO
com.datastax.driver.core.policies.DCAwareRoundRobinPolicy     - Using
data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is
incorr
2017-09-15 12:48:05,559 INFO  com.datastax.driver.core.Cluster                            
- New Cassandra host /147.83.29.146:55092 added
2017-09-15 12:48:05,681 INFO  com.datastax.driver.core.ClockFactory                        
- Using native clock to generate timestamps.
2017-09-15 12:48:05,819 INFO
com.datastax.driver.core.policies.DCAwareRoundRobinPolicy     - Using
data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is
incorrect, please provide the$
2017-09-15 12:48:05,819 INFO  com.datastax.driver.core.Cluster                            
- New Cassandra host /147.83.29.146:55092 added
2017-09-15 12:48:10,630 INFO
org.numenta.nupic.flink.streaming.api.operator.AbstractHTMInferenceOperator
- Created HTM network DayDemoNetwork
2017-09-15 12:48:10,675 WARN  org.numenta.nupic.network.Layer                              
- The number of Input Dimensions (1) != number of Column Dimensions (1)
--OR-- Encoder width (2350) != produ$
2017-09-15 12:48:10,678 INFO  org.numenta.nupic.network.Layer                              
- Input dimension fix successful!
2017-09-15 12:48:10,678 INFO  org.numenta.nupic.network.Layer                              
- Using calculated input dimensions: [2350]
2017-09-15 12:48:10,679 INFO  org.numenta.nupic.network.Layer                              
- Classifying "value" input field with CLAClassifier



In dmesg there is nothing to show, and after the execution starts there are no errors like the ones described.
Anyway, in the time before the crash the job doesn't really make progress and the CPU is at about 100% (verified with the top command).




Re: NoResourceAvailable exception

AndreaKinn
The JobManager log is probably more interesting:

2017-09-15 12:47:45,420 WARN  org.apache.hadoop.util.NativeCodeLoader                      
- Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
2017-09-15 12:47:45,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-
--------------------------------------------------------------------------------
2017-09-15 12:47:45,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  Starting JobManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @
10:23:11 UTC)
2017-09-15 12:47:45,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  Current user: giordano
2017-09-15 12:47:45,651 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation -
1.8/25.131-b11
2017-09-15 12:47:45,652 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  Maximum heap size: 491 MiBytes
2017-09-15 12:47:45,652 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  JAVA_HOME: /usr/lib/jvm/java-8-oracle
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  Hadoop version: 2.7.2
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  JVM Options:
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-     -Xms512m
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-     -Xmx512m
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-    
-Dlog.file=/home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-giordano-2-2-100-1.log
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-    
-Dlog4j.configuration=file:/home/giordano/flink-1.3.2/conf/log4j.properties
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-    
-Dlogback.configurationFile=file:/home/giordano/flink-1.3.2/conf/logback.xml
2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  Program Arguments:
2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-     --configDir
2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-     /home/giordano/flink-1.3.2/conf
2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-     --executionMode
2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-     cluster
2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-  Classpath:
/home/giordano/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar:/home/giordano/flink-1.3.2/lib/flin$
2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
-
--------------------------------------------------------------------------------
2017-09-15 12:47:45,661 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Registered UNIX signal handlers for [TERM, HUP, INT]
2017-09-15 12:47:45,947 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Loading configuration from /home/giordano/flink-1.3.2/conf
2017-09-15 12:47:45,953 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: env.java.home, /usr/lib/jvm/java-8-oracle
2017-09-15 12:47:45,953 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.address, localhost
2017-09-15 12:47:45,953 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.port, 6123
2017-09-15 12:47:45,954 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.heap.mb, 512
2017-09-15 12:47:45,954 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.numberOfTaskSlots, 2
2017-09-15 12:47:45,954 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.memory.preallocate, false
2017-09-15 12:47:45,955 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.web.port, 8081
2017-09-15 12:47:45,956 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend, filesystem
2017-09-15 12:47:45,956 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend.fs.checkpointdir,
file:///home/flink-checkpoints
2017-09-15 12:47:45,970 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Starting JobManager without high-availability
2017-09-15 12:47:45,973 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Starting JobManager on localhost:6123 with execution mode CLUSTER
2017-09-15 12:47:45,993 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: env.java.home, /usr/lib/jvm/java-8-oracle
2017-09-15 12:47:45,995 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.address, localhost
2017-09-15 12:47:45,995 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.port, 6123
2017-09-15 12:47:45,995 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.heap.mb, 512
2017-09-15 12:47:45,995 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.numberOfTaskSlots, 2
2017-09-15 12:47:45,996 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.memory.preallocate, false
2017-09-15 12:47:45,996 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.web.port, 8081
2017-09-15 12:47:45,996 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend, filesystem
2017-09-15 12:47:45,996 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend.fs.checkpointdir,
file:///home/flink-checkpoints
2017-09-15 12:47:46,045 INFO
org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user
set to giordano (auth:SIMPLE)
2017-09-15 12:47:46,209 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Starting JobManager actor system reachable at localhost:6123
2017-09-15 12:47:46,878 INFO  akka.event.slf4j.Slf4jLogger                                
- Slf4jLogger started
2017-09-15 12:47:46,982 INFO  Remoting                                                    
- Starting remoting
2017-09-15 12:47:47,392 INFO  Remoting                                                    
- Remoting started; listening on addresses
:[akka.tcp://flink@localhost:6123]
2017-09-15 12:47:47,423 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Starting JobManager web frontend
2017-09-15 12:47:47,433 INFO
org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined
location of JobManager log file:
/home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-gio$
2017-09-15 12:47:47,434 INFO
org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined
location of JobManager stdout file:
/home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-$
2017-09-15 12:47:47,434 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using
directory /tmp/flink-web-f9816186-7918-4475-8359-c3cafb63559a for the web
interface files
2017-09-15 12:47:47,435 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using
directory /tmp/flink-web-75aba058-8587-4181-bf16-ee2e68d9b70c for web
frontend JAR file uploads
2017-09-15 12:47:47,853 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Web frontend
listening at 0:0:0:0:0:0:0:0:8081
2017-09-15 12:47:47,854 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Starting JobManager actor
2017-09-15 12:47:47,876 INFO  org.apache.flink.runtime.blob.BlobServer                    
- Created BLOB server storage directory
/tmp/blobStore-9ad4e807-069e-4fb7-88b3-410fbdcb5eb0
2017-09-15 12:47:47,879 INFO  org.apache.flink.runtime.blob.BlobServer                    
- Started BLOB server at 0.0.0.0:39682 - max concurrent requests: 50 - max
backlog: 1000
2017-09-15 12:47:47,896 INFO
org.apache.flink.runtime.metrics.MetricRegistry               - No metrics
reporter configured, no metrics will be exposed/reported.
2017-09-15 12:47:47,906 INFO
org.apache.flink.runtime.jobmanager.MemoryArchivist           - Started
memory archivist akka://flink/user/archive
2017-09-15 12:47:47,914 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Starting JobManager at akka.tcp://flink@localhost:6123/user/jobmanager.
2017-09-15 12:47:47,929 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Starting
with JobManager akka.tcp://flink@localhost:6123/user/jobmanager on port 8081
2017-09-15 12:47:47,930 INFO
org.apache.flink.runtime.webmonitor.JobManagerRetriever       - New leader
reachable under
akka.tcp://flink@localhost:6123/user/jobmanager:00000000-0000-0000-0000-0000000$
2017-09-15 12:47:47,970 INFO
org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager
- Trying to associate with JobManager leader
akka.tcp://flink@localhost:6123/user/jobmanag$
2017-09-15 12:47:48,010 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- JobManager akka.tcp://flink@localhost:6123/user/jobmanager was granted
leadership with leader session ID S$
2017-09-15 12:47:48,034 INFO
org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager
- Resource Manager associating with leading JobManager
Actor[akka://flink/user/jobmanager#$
2017-09-15 12:47:51,071 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:51,383 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:51,718 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:52,109 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:52,414 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:53,130 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:54,365 INFO
org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager
- TaskManager cf04d1390ff86aba4d1702ef1a0d2b67 has started.
2017-09-15 12:47:54,368 INFO
org.apache.flink.runtime.instance.InstanceManager             - Registered
TaskManager at giordano-2-2-100-1
(akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager) $
2017-09-15 12:47:54,434 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:55,150 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:58,455 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:47:59,172 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:48:01,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Submitting job df1f60ca168364759c69dbe078544346 (Flink Streaming Job).
2017-09-15 12:48:01,768 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Using restart strategy
FixedDelayRestartStrategy(maxNumberRestartAttempts=1,
delayBetweenRestartAttempts=0$
2017-09-15 12:48:01,782 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job recovers
via failover strategy: full graph restart
2017-09-15 12:48:01,796 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Running initialization on master for job Flink Streaming Job
(df1f60ca168364759c69dbe078544346).
2017-09-15 12:48:01,796 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Successfully ran initialization on master in 0 ms.
2017-09-15 12:48:01,830 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- State backend is set to heap memory (checkpoints to filesystem
"file:/home/flink-checkpoints")
2017-09-15 12:48:01,843 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Scheduling job df1f60ca168364759c69dbe078544346 (Flink Streaming Job).
2017-09-15 12:48:01,844 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Flink
Streaming Job (df1f60ca168364759c69dbe078544346) switched from state CREATED
to RUNNING.
2017-09-15 12:48:01,861 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
Custom Source -> Timestamps/Watermarks (1/1)
(bc0e95e951deb6680cff372a954950d2) switched from CREA$
2017-09-15 12:48:01,882 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Map -> Sink:
Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from CREATED to
SCHEDULED.
2017-09-15 12:48:01,883 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Learn ->
Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1)
(d24f111f9720c8e4df77f67a$
2017-09-15 12:48:01,895 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
Custom Source -> Timestamps/Watermarks (1/1)
(bc0e95e951deb6680cff372a954950d2) switched from SCHE$
2017-09-15 12:48:01,912 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
Source: Custom Source -> Timestamps/Watermarks (1/1) (attempt #0) to
giordano-2-2-100-1
2017-09-15 12:48:01,933 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Map -> Sink:
Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from SCHEDULED to
DEPLOYING.
2017-09-15 12:48:01,933 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
Map -> Sink: Unnamed (1/1) (attempt #0) to giordano-2-2-100-1
2017-09-15 12:48:01,941 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Learn ->
Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1)
(d24f111f9720c8e4df77f67a$
2017-09-15 12:48:01,941 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)
(1/1) (attempt #0) to$
2017-09-15 12:48:03,018 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Learn ->
Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1)
(d24f111f9720c8e4df77f67a$
2017-09-15 12:48:03,038 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
Custom Source -> Timestamps/Watermarks (1/1)
(bc0e95e951deb6680cff372a954950d2) switched from DEPL$
2017-09-15 12:48:03,046 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Map -> Sink:
Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from DEPLOYING to
RUNNING.
2017-09-15 12:48:06,474 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:48:07,189 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:48:22,495 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
$flink@giordano-2-2-100-1:6123/]] arriving at
[akka.tcp://flink@giordano-2-2-100-1:6123] inbound addresses are
[akka.tcp://flink@localhost:6123]
2017-09-15 12:48:52,506 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:48:53,212 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:22,524 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:23,230 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:52,544 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@$
2017-09-15 12:49:59,410 WARN  akka.remote.RemoteWatcher                                    
- Detected unreachable: [akka.tcp://flink@giordano-2-2-100-1:35127]
2017-09-15 12:49:59,445 INFO  org.apache.flink.runtime.jobmanager.JobManager              
- Task manager akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager
terminated.
2017-09-15 12:49:59,446 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
Custom Source -> Timestamps/Watermarks (1/1)
(bc0e95e951deb6680cff372a954950d2) switched from RUNN$
java.lang.Exception: TaskManager was lost/killed:
cf04d1390ff86aba4d1702ef1a0d2b67 @ giordano-2-2-100-1 (dataPort=36806)

It tags the task manager as unreachable (even though it resides on the same node...)
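
A detail from the log: the dropped ActorSelectionMessages are addressed to
akka.tcp://flink@giordano-2-2-100-1:6123 while the inbound address is
akka.tcp://flink@localhost:6123, which matches jobmanager.rpc.address being left
at localhost. A minimal flink-conf.yaml sketch of the change (the host name is
taken from the log above; adjust to the actual JobManager host):

jobmanager.rpc.address: giordano-2-2-100-1
jobmanager.rpc.port: 6123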



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

Aljoscha Krettek
I think it might be that the computation is too CPU-heavy, which makes the TaskManager unresponsive to JobManager messages, so the JobManager thinks that the TaskManager is lost.

@Till, do you have another idea about what could be going on?

> On 15. Sep 2017, at 13:52, AndreaKinn <[hidden email]> wrote:
>
> the job manager log probably is more interesting:
>
> 2017-09-15 12:47:45,420 WARN  org.apache.hadoop.util.NativeCodeLoader                      
> - Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> 2017-09-15 12:47:45,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -
> --------------------------------------------------------------------------------
> 2017-09-15 12:47:45,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  Starting JobManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @
> 10:23:11 UTC)
> 2017-09-15 12:47:45,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  Current user: giordano
> 2017-09-15 12:47:45,651 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation -
> 1.8/25.131-b11
> 2017-09-15 12:47:45,652 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  Maximum heap size: 491 MiBytes
> 2017-09-15 12:47:45,652 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  JAVA_HOME: /usr/lib/jvm/java-8-oracle
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  Hadoop version: 2.7.2
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  JVM Options:
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -     -Xms512m
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -     -Xmx512m
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -    
> -Dlog.file=/home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-giordano-2-2-100-1.log
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -    
> -Dlog4j.configuration=file:/home/giordano/flink-1.3.2/conf/log4j.properties
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -    
> -Dlogback.configurationFile=file:/home/giordano/flink-1.3.2/conf/logback.xml
> 2017-09-15 12:47:45,658 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  Program Arguments:
> 2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -     --configDir
> 2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -     /home/giordano/flink-1.3.2/conf
> 2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -     --executionMode
> 2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -     cluster
> 2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -  Classpath:
> /home/giordano/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar:/home/giordano/flink-1.3.2/lib/flin$
> 2017-09-15 12:47:45,659 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> -
> --------------------------------------------------------------------------------
> 2017-09-15 12:47:45,661 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Registered UNIX signal handlers for [TERM, HUP, INT]
> 2017-09-15 12:47:45,947 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Loading configuration from /home/giordano/flink-1.3.2/conf
> 2017-09-15 12:47:45,953 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: env.java.home, /usr/lib/jvm/java-8-oracle
> 2017-09-15 12:47:45,953 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.address, localhost
> 2017-09-15 12:47:45,953 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.port, 6123
> 2017-09-15 12:47:45,954 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.heap.mb, 512
> 2017-09-15 12:47:45,954 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.numberOfTaskSlots, 2
> 2017-09-15 12:47:45,954 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.memory.preallocate, false
> 2017-09-15 12:47:45,955 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.web.port, 8081
> 2017-09-15 12:47:45,956 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: state.backend, filesystem
> 2017-09-15 12:47:45,956 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: state.backend.fs.checkpointdir,
> file:///home/flink-checkpoints
> 2017-09-15 12:47:45,970 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Starting JobManager without high-availability
> 2017-09-15 12:47:45,973 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Starting JobManager on localhost:6123 with execution mode CLUSTER
> 2017-09-15 12:47:45,993 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: env.java.home, /usr/lib/jvm/java-8-oracle
> 2017-09-15 12:47:45,995 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.address, localhost
> 2017-09-15 12:47:45,995 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.port, 6123
> 2017-09-15 12:47:45,995 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.heap.mb, 512
> 2017-09-15 12:47:45,995 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.numberOfTaskSlots, 2
> 2017-09-15 12:47:45,996 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.memory.preallocate, false
> 2017-09-15 12:47:45,996 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.web.port, 8081
> 2017-09-15 12:47:45,996 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: state.backend, filesystem
> 2017-09-15 12:47:45,996 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: state.backend.fs.checkpointdir,
> file:///home/flink-checkpoints
> 2017-09-15 12:47:46,045 INFO
> org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user
> set to giordano (auth:SIMPLE)
> 2017-09-15 12:47:46,209 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Starting JobManager actor system reachable at localhost:6123
> 2017-09-15 12:47:46,878 INFO  akka.event.slf4j.Slf4jLogger                                
> - Slf4jLogger started
> 2017-09-15 12:47:46,982 INFO  Remoting                                                    
> - Starting remoting
> 2017-09-15 12:47:47,392 INFO  Remoting                                                    
> - Remoting started; listening on addresses
> :[akka.tcp://flink@localhost:6123]
> 2017-09-15 12:47:47,423 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Starting JobManager web frontend
> 2017-09-15 12:47:47,433 INFO
> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined
> location of JobManager log file:
> /home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-gio$
> 2017-09-15 12:47:47,434 INFO
> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined
> location of JobManager stdout file:
> /home/giordano/flink-1.3.2/log/flink-giordano-jobmanager-0-$
> 2017-09-15 12:47:47,434 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using
> directory /tmp/flink-web-f9816186-7918-4475-8359-c3cafb63559a for the web
> interface files
> 2017-09-15 12:47:47,435 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using
> directory /tmp/flink-web-75aba058-8587-4181-bf16-ee2e68d9b70c for web
> frontend JAR file uploads
> 2017-09-15 12:47:47,853 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Web frontend
> listening at 0:0:0:0:0:0:0:0:8081
> 2017-09-15 12:47:47,854 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Starting JobManager actor
> 2017-09-15 12:47:47,876 INFO  org.apache.flink.runtime.blob.BlobServer                    
> - Created BLOB server storage directory
> /tmp/blobStore-9ad4e807-069e-4fb7-88b3-410fbdcb5eb0
> 2017-09-15 12:47:47,879 INFO  org.apache.flink.runtime.blob.BlobServer                    
> - Started BLOB server at 0.0.0.0:39682 - max concurrent requests: 50 - max
> backlog: 1000
> 2017-09-15 12:47:47,896 INFO
> org.apache.flink.runtime.metrics.MetricRegistry               - No metrics
> reporter configured, no metrics will be exposed/reported.
> 2017-09-15 12:47:47,906 INFO
> org.apache.flink.runtime.jobmanager.MemoryArchivist           - Started
> memory archivist akka://flink/user/archive
> 2017-09-15 12:47:47,914 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Starting JobManager at akka.tcp://flink@localhost:6123/user/jobmanager.
> 2017-09-15 12:47:47,929 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Starting
> with JobManager akka.tcp://flink@localhost:6123/user/jobmanager on port 8081
> 2017-09-15 12:47:47,930 INFO
> org.apache.flink.runtime.webmonitor.JobManagerRetriever       - New leader
> reachable under
> akka.tcp://flink@localhost:6123/user/jobmanager:00000000-0000-0000-0000-0000000$
> 2017-09-15 12:47:47,970 INFO
> org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager
> - Trying to associate with JobManager leader
> akka.tcp://flink@localhost:6123/user/jobmanag$
> 2017-09-15 12:47:48,010 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - JobManager akka.tcp://flink@localhost:6123/user/jobmanager was granted
> leadership with leader session ID S$
> 2017-09-15 12:47:48,034 INFO
> org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager
> - Resource Manager associating with leading JobManager
> Actor[akka://flink/user/jobmanager#$
> 2017-09-15 12:47:51,071 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:51,383 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:51,718 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:52,109 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:52,414 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:53,130 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:54,365 INFO
> org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager
> - TaskManager cf04d1390ff86aba4d1702ef1a0d2b67 has started.
> 2017-09-15 12:47:54,368 INFO
> org.apache.flink.runtime.instance.InstanceManager             - Registered
> TaskManager at giordano-2-2-100-1
> (akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager) $
> 2017-09-15 12:47:54,434 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:55,150 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:58,455 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:47:59,172 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:48:01,650 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Submitting job df1f60ca168364759c69dbe078544346 (Flink Streaming Job).
> 2017-09-15 12:48:01,768 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Using restart strategy
> FixedDelayRestartStrategy(maxNumberRestartAttempts=1,
> delayBetweenRestartAttempts=0$
> 2017-09-15 12:48:01,782 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job recovers
> via failover strategy: full graph restart
> 2017-09-15 12:48:01,796 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Running initialization on master for job Flink Streaming Job
> (df1f60ca168364759c69dbe078544346).
> 2017-09-15 12:48:01,796 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Successfully ran initialization on master in 0 ms.
> 2017-09-15 12:48:01,830 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - State backend is set to heap memory (checkpoints to filesystem
> "file:/home/flink-checkpoints")
> 2017-09-15 12:48:01,843 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Scheduling job df1f60ca168364759c69dbe078544346 (Flink Streaming Job).
> 2017-09-15 12:48:01,844 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Flink
> Streaming Job (df1f60ca168364759c69dbe078544346) switched from state CREATED
> to RUNNING.
> 2017-09-15 12:48:01,861 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Timestamps/Watermarks (1/1)
> (bc0e95e951deb6680cff372a954950d2) switched from CREA$
> 2017-09-15 12:48:01,882 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Map -> Sink:
> Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from CREATED to
> SCHEDULED.
> 2017-09-15 12:48:01,883 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Learn ->
> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1)
> (d24f111f9720c8e4df77f67a$
> 2017-09-15 12:48:01,895 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Timestamps/Watermarks (1/1)
> (bc0e95e951deb6680cff372a954950d2) switched from SCHE$
> 2017-09-15 12:48:01,912 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
> Source: Custom Source -> Timestamps/Watermarks (1/1) (attempt #0) to
> giordano-2-2-100-1
> 2017-09-15 12:48:01,933 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Map -> Sink:
> Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from SCHEDULED to
> DEPLOYING.
> 2017-09-15 12:48:01,933 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
> Map -> Sink: Unnamed (1/1) (attempt #0) to giordano-2-2-100-1
> 2017-09-15 12:48:01,941 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Learn ->
> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1)
> (d24f111f9720c8e4df77f67a$
> 2017-09-15 12:48:01,941 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
> Learn -> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink)
> (1/1) (attempt #0) to$
> 2017-09-15 12:48:03,018 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Learn ->
> Select -> Process -> (Sink: Cassandra Sink, Sink: Cassandra Sink) (1/1)
> (d24f111f9720c8e4df77f67a$
> 2017-09-15 12:48:03,038 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Timestamps/Watermarks (1/1)
> (bc0e95e951deb6680cff372a954950d2) switched from DEPL$
> 2017-09-15 12:48:03,046 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Map -> Sink:
> Unnamed (1/1) (7ecc9cea7b0132f9604bce99545b49bd) switched from DEPLOYING to
> RUNNING.
> 2017-09-15 12:48:06,474 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:48:07,189 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:48:22,495 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> $flink@giordano-2-2-100-1:6123/]] arriving at
> [akka.tcp://flink@giordano-2-2-100-1:6123] inbound addresses are
> [akka.tcp://flink@localhost:6123]
> 2017-09-15 12:48:52,506 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:48:53,212 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:49:22,524 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:49:23,230 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:49:52,544 ERROR akka.remote.EndpointWriter                                  
> - dropping message [class akka.actor.ActorSelectionMessage] for non-local
> recipient [Actor[akka.tcp://flink@$
> 2017-09-15 12:49:59,410 WARN  akka.remote.RemoteWatcher                                    
> - Detected unreachable: [akka.tcp://flink@giordano-2-2-100-1:35127]
> 2017-09-15 12:49:59,445 INFO  org.apache.flink.runtime.jobmanager.JobManager              
> - Task manager akka.tcp://flink@giordano-2-2-100-1:35127/user/taskmanager
> terminated.
> 2017-09-15 12:49:59,446 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Timestamps/Watermarks (1/1)
> (bc0e95e951deb6680cff372a954950d2) switched from RUNN$
> java.lang.Exception: TaskManager was lost/killed:
> cf04d1390ff86aba4d1702ef1a0d2b67 @ giordano-2-2-100-1 (dataPort=36806)
>
> It tags the task manager as unreachable (even though it resides on the same node...)
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

AndreaKinn
Moreover, the CPU usage of the other task manager is about 0%.
It seems the problem resides in the job manager, which is not able to distribute the work across the cluster.
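
For illustration (this is not the actual job code): with the default parallelism
of 1 every operator has a single subtask, so the whole pipeline can be scheduled
into one slot on one TaskManager. Raising the job parallelism at least gives the
scheduler subtasks it can place on other nodes:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// illustrative value only; it must not exceed the total number of task slots in the cluster
env.setParallelism(2);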
Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

AndreaKinn
I also tried placing only the job manager on the first node and reconfiguring
the cluster with just two task managers. In this way I immediately get a
NoResourceAvailable error.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

AndreaKinn
Update:

Following other discussions I even tried to reduce memory.fraction to 10%,
without success.
How can I set G1 as the garbage collector?
The key is env.java.opts, but what is the value?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

Jürgen Thomann
Hi,

You can set it to G1GC with the following setting. In my example it is
only for the taskmanager, but env.java.opts should work in the same way.

env.java.opts.taskmanager: -XX:+UseG1GC
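
Presumably the same flag can be set for the JobManager as well; a flink-conf.yaml
sketch (assuming Flink 1.3 on JDK 8, the GC-logging flags are optional extras):

env.java.opts.jobmanager: -XX:+UseG1GC
env.java.opts.taskmanager: -XX:+UseG1GC -XX:+PrintGCDetails -Xloggc:/tmp/flink-taskmanager-gc.log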

Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

AndreaKinn
Thank you, but unfortunately it had no effect.

As I add more load to the computation, the "TaskManager was lost/killed" error
appears on the node in use, without the other nodes being brought in to share the computation.
I also increased

akka.watch.heartbeat.interval
akka.watch.heartbeat.pause
akka.transport.heartbeat.interval
akka.transport.heartbeat.pause

which only delayed the error (considerably).
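
For reference, a flink-conf.yaml sketch of the kind of increase meant here (the
values are illustrative, not the actual settings used):

akka.watch.heartbeat.interval: 30 s
akka.watch.heartbeat.pause: 120 s
akka.transport.heartbeat.interval: 1000 s
akka.transport.heartbeat.pause: 6000 s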




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

Aljoscha Krettek
Btw, what load are you putting on the cluster, i.e. what is your computation? If you don't have load, the cluster and job just keep on running, right?

Best,
Aljoscha

> On 19. Sep 2017, at 12:00, AndreaKinn <[hidden email]> wrote:
>
> Thank you, but unfortunately it had no effect.
>
> As I add more load to the computation, the "TaskManager was lost/killed" error
> appears on the node in use, without the other nodes being brought in to share the computation.
> I also increased
>
> akka.watch.heartbeat.interval
> akka.watch.heartbeat.pause
> akka.transport.heartbeat.interval
> akka.transport.heartbeat.pause
>
> which only delayed the error (considerably).
>
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reply | Threaded
Open this post in threaded view
|

Re: NoResourceAvailable exception

AndreaKinn
The program is composed of:

6 Kafka /sources/, each with a custom timestamp/watermark /extractor/ and a
/map/ function.
Then I use 6 instances of an external library called flink-htm (quite heavy).
Moreover I have 6 /process/ functions and 2 /union/ operations to merge the
result streams.
Finally I have 2 Cassandra /sinks/.

The data arriving in Kafka are 1 kB strings, roughly one every 20 ms.

I'm absolutely sure the flink-htm library is heavy, but I hoped Flink would
distribute the load of the (independent) instances across the cluster...
instead it seems like a single node takes all the load and crashes.

If it can help, I can share my code.
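
For what it's worth, a minimal sketch (hypothetical names, not the actual code)
of how one of the six branches could be pinned into its own slot sharing group,
so that with enough free slots the scheduler cannot co-locate all the heavy
flink-htm instances in the same slot:

// one of the six independent branches; MyExtractor and MyMap stand in for the real classes
DataStream<String> branch1 = kafkaSource1
        .assignTimestampsAndWatermarks(new MyExtractor())
        .map(new MyMap())
        .slotSharingGroup("htm-1");  // downstream operators inherit this group unless overridden

with "htm-2" ... "htm-6" for the other branches.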



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/