...
Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 20, timeout: 30 seconds)
Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 21, timeout: 30 seconds)
Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 22, timeout: 30 seconds)
Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 23, timeout: 30 seconds)
Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 24, timeout: 30 seconds)
...
Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#-275619168] - leader session null
TaskManager ResourceID{resourceId='1132cbdaf2d8204e5e42e321e8592754'} has started.
Registered TaskManager at MY_PRIV_IP (akka://flink/deadLetters) as 7d9568445b4557a74d05a0771a08ad9c. Current number of registered hosts is 1. Current number of alive task slots is 20.
|
Hi,
Search both the job manager and task manager logs for the IP addresses and ports that have timed out. First, make sure that the nodes are visible to each other with a simple ping. Then check that the ports that timed out are open and not blocked by a firewall (for example with telnet). You can search the documentation for the configuration parameters with “port” in their name, but note that many of them are random by default.
Piotrek
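For anyone following the thread: the telnet-style check described above can also be scripted. This is only a minimal sketch using Python's standard `socket` module; the helper names `port_is_open` and `check_endpoints` are made up for illustration, and the hosts and ports to test have to come from your own logs and configuration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to True (reachable) or False (not reachable)."""
    return {(host, port): port_is_open(host, port, timeout) for host, port in endpoints}
```

Run it from the machine that has to open the connection, for instance `check_endpoints([("stage_dbq_1", 45532)])` on the job manager host. A `False` only says the TCP connection could not be established from that machine; it does not say whether a firewall, a bind address or a stopped process is the cause.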
|
Thanks for the response, and sorry for the delay. The ports logged by the JobManager and TaskManager are open!
2018-01-15 13:40:03,455 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@172.16.20.18:6123/user/jobmanager:null.
When I kill the task manager, the job manager logs:
2018-01-15 13:32:41,419 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@stage_dbq_1:45532] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
But it does not decrement the number of available task managers! And when I start my single task manager again, it logs:
2018-01-15 13:32:52,753 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at ??? (akka://flink/deadLetters) as 626846ae27a833cb094eeeb047a6a72c. Current number of registered hosts is 2. Current number of alive task slots is 40.
On Wed, Jan 10, 2018 at 11:36 AM, Piotr Nowojski <[hidden email]> wrote:
-- رضا سامعی / http://samee.blog.ir |
Hi,
Could you post full job manager and task manager logs from startup until the first signs of the problem? Thanks, Piotrek
|
Hi, I have attached the log file. Thanks.
On Mon, Jan 15, 2018 at 3:36 PM, Piotr Nowojski <[hidden email]> wrote:
-- رضا سامعی / http://samee.blog.ir |
Hi,
It seems that you have not opened some of the ports. As I pointed out in my first mail, please go through all of the config options regarding hostnames/ports (not only those that appear in the log files; maybe something is not being logged):
jobmanager.rpc.port
taskmanager.rpc.port
taskmanager.data.port
blob.server.port
Double check that they are accessible from the appropriate machines, preferably using an external tool like telnet or ncat. Your network may be configured to accept some connections only from specific hosts (like localhost). For example, in the case for which you attached the log files, did you check that the job manager host can open a connection to `stage_dbq_1:33633` (the task manager host and its RPC port; the RPC port is random by default)? Also make sure that the configurations on the task manager and job manager are consistent.
Piotrek
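A rough sketch of the consistency check suggested above: it compares the four port options named in this mail across the two configuration files. It assumes the files use plain `key: value` lines with `#` comments, as in `flink-conf.yaml`; the function names are hypothetical, and a key reported as unset simply means Flink falls back to its default, which may be a random port.

```python
# The four options named in the mail above.
PORT_KEYS = (
    "jobmanager.rpc.port",
    "taskmanager.rpc.port",
    "taskmanager.data.port",
    "blob.server.port",
)


def parse_flink_conf(text):
    """Parse simple 'key: value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def compare_port_settings(jm_text, tm_text, keys=PORT_KEYS):
    """Report, per key, whether the job manager and task manager configs agree."""
    jm = parse_flink_conf(jm_text)
    tm = parse_flink_conf(tm_text)
    report = {}
    for key in keys:
        jm_value, tm_value = jm.get(key), tm.get(key)
        if jm_value is None and tm_value is None:
            report[key] = "unset on both"
        elif jm_value == tm_value:
            report[key] = "same: " + jm_value
        else:
            report[key] = "differs: jobmanager=%s, taskmanager=%s" % (jm_value, tm_value)
    return report
```

Read both files with `open(path).read()` and pass the two strings in. A difference is not necessarily an error (some options only matter on one side), but it shows where to look first, and every port that is pinned to a fixed value is one that can be opened in the firewall and tested explicitly.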
|