Hi, I am having an issue setting up a Flink cluster. I have 2 nodes for the Job Manager and 2 nodes for the Task Manager.
My configuration file looks like this:

    jobmanager.rpc.port: 6123
    jobmanager.heap.size: 2048m
    taskmanager.heap.size: 2048m
    taskmanager.numberOfTaskSlots: 64
    parallelism.default: 1
    rest.port: 8081
    high-availability.jobmanager.port: 50010
    high-availability: zookeeper
    high-availability.storageDir: file:///sharedflink/state_dir/ha/
    high-availability.zookeeper.quorum: host1:2181,host2:2181,host3:2181
    high-availability.zookeeper.path.root: /flink
    high-availability.cluster-id: /flick_ns
    state.backend: rocksdb
    state.checkpoints.dir: file:///sharedflink/state_dir/backend
    state.savepoints.dir: file:///sharedflink/state_dir/savepoint
    state.backend.incremental: false
    state.backend.rocksdb.timer-service.factory: rocksdb
    state.backend.local-recovery: false

But when I start the services, I get this error message:

    java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token mismatch: Ignoring message RemoteFencedMessage(b00185a18ea3da17ebe39ac411a84f3a, RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, HardwareDescription, Time))) because the fencing token b00185a18ea3da17ebe39ac411a84f3a did not match the expected fencing token bce1729df0a2ab8a7ea0426ba9994482.
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)

However, when I run the JM and TM on a single box, it works fine. Please help me resolve this issue as soon as possible, as I am running out of options and time.
-Samir Chauhan

There's a reason we support Fair Dealing. YOU. This email and any files transmitted with it or attached to it (the [Email]) may contain confidential, proprietary or legally privileged information and is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient of the Email, you must not, directly or indirectly, copy, use, print, distribute, disclose to any other party or take any action in reliance on any part of the Email. Please notify the system manager or sender of the error and delete all copies of the Email immediately. No statement in the Email should be construed as investment advice being given within or outside Singapore. Prudential Assurance Company Singapore (Pte) Limited (PACS) and each of its related entities shall not be responsible for any losses, claims, penalties, costs or damages arising from or in connection with the use of the Email or the information therein, in whole or in part. You are solely responsible for conducting any virus checks prior to opening, accessing or disseminating the Email. PACS (Company Registration No. 199002477Z) is a company incorporated under the laws of Singapore and has its registered office at 30 Cecil Street, #30-01, Prudential Tower, Singapore 049712. PACS is an indirect wholly owned subsidiary of Prudential plc of the United Kingdom. PACS and Prudential plc are not affiliated in any manner with Prudential Financial, Inc., a company whose principal place of business is in the United States of America.
Hi Samir,

Could you share with us the logs of the two JMs and the log where you saw the FencingTokenException? It looks to me as if the TM had an outdated fencing token (an outdated leader session id) with which it contacted the ResourceManager. This can happen, and the TM should try to reconnect to the RM once it learns about the new leader session id via ZooKeeper. You could, for example, check in ZooKeeper that it contains valid leader information.

Cheers,
Till

On Fri, Oct 5, 2018 at 9:58 AM Samir Tusharbhai Chauhan <[hidden email]> wrote:
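To illustrate the fencing mechanism Till describes, here is a small, hypothetical Python model. It is not Flink's implementation (Flink's RPC layer is Java/Akka), and every class and method name below is invented for illustration. A fenced endpoint accepts only messages carrying its current leader session id; a sender holding a stale id is rejected until it learns the new id (in Flink, via ZooKeeper leader retrieval) and retries.

```python
import uuid


class FencingTokenMismatch(Exception):
    """Raised when a message carries a token other than the endpoint's current one."""


class FencedEndpoint:
    """Hypothetical stand-in for a leader-elected component such as the ResourceManager."""

    def __init__(self):
        self.fencing_token = None  # None until leadership is granted

    def grant_leadership(self):
        # Each leadership term gets a fresh session id, analogous to a leader session id.
        self.fencing_token = uuid.uuid4().hex
        return self.fencing_token

    def handle(self, token, message):
        if self.fencing_token is None:
            raise FencingTokenMismatch("fencing token not set")
        if token != self.fencing_token:
            raise FencingTokenMismatch(
                f"token {token} did not match expected {self.fencing_token}")
        return f"accepted {message}"


class Sender:
    """Hypothetical TaskManager-side client that caches the last known leader token."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.known_token = None

    def on_new_leader(self, token):
        # In Flink this information would arrive through ZooKeeper leader retrieval.
        self.known_token = token

    def send(self, message):
        return self.endpoint.handle(self.known_token, message)


if __name__ == "__main__":
    rm = FencedEndpoint()
    tm = Sender(rm)
    tm.on_new_leader(rm.grant_leadership())
    print(tm.send("registerTaskExecutor"))
    rm.grant_leadership()  # leadership changes, so the TM's cached token is now stale
    try:
        tm.send("registerTaskExecutor")
    except FencingTokenMismatch as err:
        print("rejected:", err)
    tm.on_new_leader(rm.fencing_token)  # TM learns the new leader session id
    print(tm.send("registerTaskExecutor"))
```

The point of the model is that a mismatch by itself is expected to be transient; it only persists if the sender never learns the current leader's id, which is why Till suggests checking what ZooKeeper actually holds.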
Hi Till,

Attached are the logs. My architecture is as follows, with everything running on separate Linux VMs:

    3 ZooKeeper nodes (Confluent Open Source)
    2 Job Managers
    2 Task Managers

May I ask what the value of

    high-availability.zookeeper.path.root: /flink

should be, given that it runs on different servers? Also, /sharedflink is storage shared across the JMs and TMs. Does it also need to be available on the ZooKeeper servers? Are there any special instructions I should take care of?

Samir Chauhan
Attachment: Flink.zip (61K)
Hi Samir,

I think the problem is that you've specified a different cluster id for the TMs than for the JM: /flick_ns vs. /flink_ns.

Cheers,
Till

On Fri, Oct 5, 2018 at 6:29 PM Samir Tusharbhai Chauhan <[hidden email]> wrote:
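The cluster id (together with the ZooKeeper root path) determines the ZooKeeper namespace in which the JMs publish leader information and the TMs look it up, so a one-character difference such as /flick_ns vs. /flink_ns means the two sides do not see the same leader data. A mismatch like this is easy to catch mechanically before starting the services. Below is a minimal sketch that compares the HA-related keys across several flink-conf.yaml files; it assumes the simple one "key: value" per line format shown in this thread and is not a full YAML parser. The function names and the chosen key list are illustrative.

```python
# Keys that should be identical on every JM and TM of one HA cluster (illustrative list).
HA_KEYS = (
    "high-availability",
    "high-availability.storageDir",
    "high-availability.zookeeper.quorum",
    "high-availability.zookeeper.path.root",
    "high-availability.cluster-id",
)


def parse_flink_conf(text):
    """Return {key: value} from flink-conf.yaml-style text (one 'key: value' per line)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        # Split on ': ' so values like 'host1:2181' or 'file:///x' stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def ha_mismatches(confs):
    """Given {name: conf_dict}, return {key: {name: value}} for HA keys that differ."""
    mismatches = {}
    for key in HA_KEYS:
        values = {name: conf.get(key) for name, conf in confs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


def report(paths):
    """Print HA mismatches between the given files; return True if they all agree."""
    confs = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            confs[path] = parse_flink_conf(f.read())
    problems = ha_mismatches(confs)
    for key, values in problems.items():
        print(f"MISMATCH {key}:")
        for name, value in values.items():
            print(f"  {name}: {value}")
    return not problems
```

For example, `report(["jm1/flink-conf.yaml", "tm1/flink-conf.yaml"])` (hypothetical paths) would print the differing `high-availability.cluster-id` values for a setup like the one above.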
Hi Till,

Thanks for identifying the issue. My cluster is up and running now. I have a few queries. Could you please answer them?

    jobmanager.rpc.address
    rest.address
    rest.bind-address
    jobmanager.web.address
Samir Chauhan
Hi Samir,

1. In your setup (not running on top of YARN or Mesos), you need to set jobmanager.rpc.address so that the JM process knows which address to bind to. The other components use ZooKeeper to find out the addresses. The other properties should not be needed.

3. You can take a look at the ZooKeeper leader latch node. Alternatively, you can look at the address to which you are redirected when accessing the web UI.

Cheers,
Till

On Sat, Oct 6, 2018 at 5:57 PM Samir Tusharbhai Chauhan <[hidden email]> wrote:
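Point 1 implies that in a standalone HA setup the per-host configs differ in only one place: each JM sets jobmanager.rpc.address to its own host, while the HA settings stay identical everywhere and the TMs discover the leader through ZooKeeper. The sketch below renders per-host flink-conf.yaml text to make that explicit. The helper, the hostnames jm1/jm2/tm1/tm2 and the cluster id value are hypothetical; any cluster id works as long as it is the same on every node.

```python
# Settings shared by every JM and TM (values taken from the thread, cluster id illustrative).
SHARED_CONF = {
    "high-availability": "zookeeper",
    "high-availability.storageDir": "file:///sharedflink/state_dir/ha/",
    "high-availability.zookeeper.quorum": "host1:2181,host2:2181,host3:2181",
    "high-availability.zookeeper.path.root": "/flink",
    "high-availability.cluster-id": "/flink_ns",  # must be identical on all nodes
    "high-availability.jobmanager.port": "50010",
}


def conf_for_host(host, role, shared=SHARED_CONF):
    """Return flink-conf.yaml text for one host.

    Only JobManagers get jobmanager.rpc.address (their own hostname, i.e. the
    address the JM process binds to); TaskManagers rely on ZooKeeper to find the leader.
    """
    if role not in ("jobmanager", "taskmanager"):
        raise ValueError(f"unknown role: {role}")
    conf = dict(shared)  # copy so the shared settings are never mutated
    if role == "jobmanager":
        conf["jobmanager.rpc.address"] = host
    return "".join(f"{key}: {value}\n" for key, value in conf.items())


if __name__ == "__main__":
    # Hypothetical hostnames for the 2 JMs and 2 TMs described in the thread.
    for host, role in [("jm1", "jobmanager"), ("jm2", "jobmanager"),
                       ("tm1", "taskmanager"), ("tm2", "taskmanager")]:
        print(f"# --- {host} ({role}) ---")
        print(conf_for_host(host, role))
```

Generating the files from one shared dictionary, rather than editing each copy by hand, is one way to avoid the kind of cluster-id typo found earlier in this thread.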
Hi Till,

Can you tell me when I would receive the error message below?

    2018-10-13 03:02:01,337 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagersHandler - Could not retrieve the redirect address.
    java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(8b79d4540b45b3e622748b813d3a464b, LocalRpcInvocation(requestRestAddress(Time))) sent to akka.tcp://flink@127.0.0.1:50010/user/dispatcher because the fencing token is null.

Warm Regards,
Samir Chauhan
This means that the Dispatcher has not yet set its leader session id, which it obtains once it gains leadership. This can also happen if the Dispatcher lost its leadership just after you sent the message. The problem should resolve itself once the new leadership information has been propagated.

Cheers,
Till

On Fri, Oct 12, 2018 at 9:04 PM Samir Tusharbhai Chauhan <[hidden email]> wrote:
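Since this error is transient and clears once leadership has propagated, a client that polls the cluster (for example, a monitoring script) can treat it as retryable rather than fatal. Below is a generic retry-with-exponential-backoff sketch. `LeadershipNotReady`, `call_with_retry` and the delay values are hypothetical; mapping a concrete Flink error response onto `LeadershipNotReady` is left to the caller, and this does not change anything inside Flink itself.

```python
import time


class LeadershipNotReady(Exception):
    """Hypothetical error standing in for 'fencing token not set' or a token mismatch."""


def call_with_retry(fn, attempts=5, initial_delay=0.5, max_delay=5.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff while it raises LeadershipNotReady.

    Re-raises the last LeadershipNotReady if every attempt fails. The `sleep`
    parameter is injectable so the backoff can be observed without real waiting.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except LeadershipNotReady:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
```

Capping the delay keeps a long leader election from pushing the wait time up indefinitely, while the attempt limit ensures a persistent problem (such as a misconfigured cluster id) still surfaces as an error instead of being retried forever.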