org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Samir Chauhan

Hi,

 

I am having an issue setting up a Flink cluster. I have 2 nodes for the Job Manager and 2 nodes for the Task Manager.

 

My configuration file looks like this.

 

jobmanager.rpc.port: 6123
jobmanager.heap.size: 2048m
taskmanager.heap.size: 2048m
taskmanager.numberOfTaskSlots: 64
parallelism.default: 1
rest.port: 8081
high-availability.jobmanager.port: 50010
high-availability: zookeeper
high-availability.storageDir: file:///sharedflink/state_dir/ha/
high-availability.zookeeper.quorum: host1:2181,host2:2181,host3:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /flick_ns

state.backend: rocksdb
state.checkpoints.dir: file:///sharedflink/state_dir/backend
state.savepoints.dir: file:///sharedflink/state_dir/savepoint
state.backend.incremental: false
state.backend.rocksdb.timer-service.factory: rocksdb
state.backend.local-recovery: false

 

But when I start services, I get this error message.

 

java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token mismatch: Ignoring message RemoteFencedMessage(b00185a18ea3da17ebe39ac411a84f3a, RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, HardwareDescription, Time))) because the fencing token b00185a18ea3da17ebe39ac411a84f3a did not match the expected fencing token bce1729df0a2ab8a7ea0426ba9994482.
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)

 

 

However, when I run the JM and TM on a single box, it works fine.

Please help me resolve this issue ASAP, as I am running out of options and time.

 

-Samir Chauhan

 

 


There's a reason we support Fair Dealing. YOU.


This email and any files transmitted with it or attached to it (the [Email]) may contain confidential, proprietary or legally privileged information and is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient of the Email, you must not, directly or indirectly, copy, use, print, distribute, disclose to any other party or take any action in reliance on any part of the Email. Please notify the system manager or sender of the error and delete all copies of the Email immediately.

No statement in the Email should be construed as investment advice being given within or outside Singapore. Prudential Assurance Company Singapore (Pte) Limited (PACS) and each of its related entities shall not be responsible for any losses, claims, penalties, costs or damages arising from or in connection with the use of the Email or the information therein, in whole or in part. You are solely responsible for conducting any virus checks prior to opening, accessing or disseminating the Email.

PACS (Company Registration No. 199002477Z) is a company incorporated under the laws of Singapore and has its registered office at 30 Cecil Street, #30-01, Prudential Tower, Singapore 049712.

PACS is an indirect wholly owned subsidiary of Prudential plc of the United Kingdom. PACS and Prudential plc are not affiliated in any manner with Prudential Financial, Inc., a company whose principal place of business is in the United States of America.

Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Till Rohrmann
Hi Samir,

Could you share the logs of the two JMs and the log where you saw the FencingTokenException with us?

It looks to me as if the TM contacted the ResourceManager with an outdated fencing token (an outdated leader session id). This can happen, and the TM should try to reconnect to the RM once it learns about the new leader session id via ZooKeeper. You could, for example, check that ZooKeeper contains the valid leader information.
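
To illustrate the check behind this exception, here is a minimal Python sketch of fencing. It is not Flink's actual implementation: the names FencedEndpoint, FencedMessage and FencingTokenMismatch are hypothetical, and the two token values are simply taken from the error message in this thread. The receiver only accepts messages that carry the token of the current leader session, so a sender holding a stale token is rejected until it learns the new one.

```python
from dataclasses import dataclass
from typing import Optional


class FencingTokenMismatch(Exception):
    """Raised when a message carries a token other than the expected one."""


@dataclass
class FencedMessage:
    token: str    # leader session id the sender believes is current
    payload: str  # stand-in for the RPC invocation


class FencedEndpoint:
    """Hypothetical receiver that enforces a fencing token."""

    def __init__(self) -> None:
        self.token: Optional[str] = None  # unset until leadership is granted

    def grant_leadership(self, token: str) -> None:
        # A new leader session replaces the expected token.
        self.token = token

    def handle(self, message: FencedMessage) -> str:
        if self.token is None:
            raise FencingTokenMismatch("fencing token not set")
        if message.token != self.token:
            raise FencingTokenMismatch(
                f"token {message.token} did not match expected {self.token}"
            )
        return f"accepted: {message.payload}"


if __name__ == "__main__":
    rm = FencedEndpoint()
    rm.grant_leadership("bce1729df0a2ab8a7ea0426ba9994482")
    stale = FencedMessage("b00185a18ea3da17ebe39ac411a84f3a", "registerTaskExecutor")
    try:
        rm.handle(stale)
    except FencingTokenMismatch as err:
        print("rejected:", err)
```

In this sketch the sender recovers by resending with the current token, which mirrors the reconnect behaviour described above.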

Cheers,
Till

On Fri, Oct 5, 2018 at 9:58 AM Samir Tusharbhai Chauhan <[hidden email]> wrote:


RE: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Samir Chauhan

Hi Till,

 

Attached are the logs. My architecture is like this:

3 ZooKeeper nodes (Confluent Open Source)
2 Job Managers
2 Task Managers

All are running on separate Linux VMs.

May I ask what the value of high-availability.zookeeper.path.root (currently /flink) should be, since ZooKeeper is running on different servers?

Also, /sharedflink is storage shared across the JMs and TMs. Does it need to be available on the ZooKeeper servers as well?

Is there anything else I should take care of?

 

Samir Chauhan

 

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: Friday, October 05, 2018 11:09 PM
To: Samir Tusharbhai Chauhan <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

 


Attachment: Flink.zip (61K)

Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Till Rohrmann
Hi Samir,

I think the problem is that the TMs are configured with a different cluster id than the JM: /flick_ns vs. /flink_ns.
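
A mismatch like this can be caught before starting the cluster by comparing the high-availability.* entries of the configuration files on each node. The following Python sketch does that for two config texts. The helper names are hypothetical, the parser only handles flat "key: value" lines, and which side carried which spelling is an assumption made for the example.

```python
HA_PREFIX = "high-availability"


def parse_flat_conf(text: str) -> dict:
    """Parse flat 'key: value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as
        # 'file:///path' or 'host1:2181' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def ha_mismatches(conf_a: str, conf_b: str) -> dict:
    """Return HA keys whose values differ, as {key: (value_a, value_b)}."""
    a, b = parse_flat_conf(conf_a), parse_flat_conf(conf_b)
    keys = sorted(k for k in set(a) | set(b) if k.startswith(HA_PREFIX))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Example inputs; assigning the misspelled id to the TM side is an assumption.
JM_CONF = """
high-availability: zookeeper
high-availability.zookeeper.quorum: host1:2181,host2:2181,host3:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /flink_ns
"""

TM_CONF = """
high-availability: zookeeper
high-availability.zookeeper.quorum: host1:2181,host2:2181,host3:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /flick_ns
"""

if __name__ == "__main__":
    for key, (jm_value, tm_value) in ha_mismatches(JM_CONF, TM_CONF).items():
        print(f"{key}: JM={jm_value} TM={tm_value}")
```

Only the differing keys are reported, so identical configurations produce no output.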

Cheers,
Till

On Fri, Oct 5, 2018 at 6:29 PM Samir Tusharbhai Chauhan <[hidden email]> wrote:


RE: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Samir Chauhan

Hi Till,

 

Thanks for identifying the issue. My cluster is up and running now.

 

I have a few queries. Could you answer them?

  1. Do I need to set the properties below in my cluster?

     jobmanager.rpc.address
     rest.address
     rest.bind-address
     jobmanager.web.address

  2. Is there anything I should take care of while setting it up?
  3. How do I know which job manager is active?
  4. How do I secure my cluster?

 

Samir Chauhan

 


Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Till Rohrmann
Hi Samir,

1. In your setup (not running on top of YARN or Mesos) you need to set jobmanager.rpc.address so that the JM process knows which address to bind to. The other components use ZooKeeper to find out the addresses. The other properties should not be needed.
3. You can take a look at the ZooKeeper leader latch node. Alternatively, you can take a look at the address to which you are redirected when accessing the web UI.
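
As a rough aid for the ZooKeeper check, this Python sketch composes the znode paths under which the leader latch and the leader information would be expected. It assumes the defaults /leaderlatch and /leader for high-availability.zookeeper.path.latch and high-availability.zookeeper.path.leader, and it uses the corrected cluster id /flink_ns. The function names are hypothetical.

```python
def join_znode(*parts: str) -> str:
    """Join znode path segments, tolerating leading and trailing slashes."""
    segments = [p.strip("/") for p in parts if p.strip("/")]
    return "/" + "/".join(segments)


def ha_znodes(path_root: str, cluster_id: str,
              latch_path: str = "/leaderlatch",
              leader_path: str = "/leader") -> dict:
    """Return the base znodes for one cluster (defaults are assumptions)."""
    base = join_znode(path_root, cluster_id)
    return {
        "cluster": base,
        "latch": join_znode(base, latch_path),
        "leader": join_znode(base, leader_path),
    }


if __name__ == "__main__":
    for name, path in ha_znodes("/flink", "/flink_ns").items():
        print(f"{name}: {path}")
```

The resulting paths can then be inspected with the ZooKeeper CLI (ls and get). Note that a different cluster id yields a different subtree, which is why components with mismatched ids cannot find each other's leader information.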

Cheers,
Till

On Sat, Oct 6, 2018 at 5:57 PM Samir Tusharbhai Chauhan <[hidden email]> wrote:


RE: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Samir Chauhan

Hi Till,

 

Can you tell me when I would receive the error message below?

 

2018-10-13 03:02:01,337 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagersHandler  - Could not retrieve the redirect address.

java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(8b79d4540b45b3e622748b813d3a464b, LocalRpcInvocation(requestRestAddress(Time))) sent to akka.tcp://flink@127.0.0.1:50010/user/dispatcher because the fencing token is null.

 

Warm Regards,

Samir Chauhan

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: Sunday, October 07, 2018 1:24 AM
To: Samir Tusharbhai Chauhan <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

 

Hi Samir,

 

1. In your setup (not running on top of Yarn or Mesos) you need to set the jobmanager.rpc.address such that the JM process knows where to bind to. The other components use ZooKeeper to find out the addresses. The other properties should not be needed.

3. You can take a look at the ZooKeeper leader latch node. Alternatively, you can take a look at the address to which you are redirected when accessing the web UI.
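As a rough illustration of point 1, the sketch below checks a flink-conf.yaml-style text for the one key named there. The helper names (`parse_flink_conf`, `missing_for_standalone_ha`) and the hostname `jm-host-1` are hypothetical and not part of Flink; the only rule encoded is the one stated above, that a standalone ZooKeeper HA setup needs jobmanager.rpc.address.

```python
def parse_flink_conf(text):
    """Parse simple 'key: value' lines; blank lines and '#' comments are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ': ' only, so values such as host1:2181 stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def missing_for_standalone_ha(conf):
    """Return required keys that are unset (hypothetical check, based on point 1)."""
    required = []
    if conf.get("high-availability") == "zookeeper":
        required.append("jobmanager.rpc.address")
    return [key for key in required if not conf.get(key)]


sample = """
jobmanager.rpc.port: 6123
high-availability: zookeeper
high-availability.zookeeper.quorum: host1:2181,host2:2181,host3:2181
"""

conf = parse_flink_conf(sample)
missing_before = missing_for_standalone_ha(conf)
print(missing_before)  # ['jobmanager.rpc.address']

conf["jobmanager.rpc.address"] = "jm-host-1"  # hypothetical hostname
missing_after = missing_for_standalone_ha(conf)
print(missing_after)  # []
```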

 

Cheers,

Till

 

On Sat, Oct 6, 2018 at 5:57 PM Samir Tusharbhai Chauhan <[hidden email]> wrote:

Hi Till,

 

Thanks for identifying the issue. My cluster is up and running now.

 

I have a few queries. Could you answer them?

 

  1. Do I need to set the properties below in my cluster?

jobmanager.rpc.address

rest.address

rest.bind-address

jobmanager.web.address

  2. Is there anything I should take care of while setting it up?
  3. How do I know which job manager is active?
  4. How do I secure my cluster?

 

Samir Chauhan

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: Friday, October 05, 2018 11:09 PM
To: Samir Tusharbhai Chauhan <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

 

Hi Samir,

 

could you share the logs of the two JMs and the log where you saw the FencingTokenException with us? 

 

It looks to me as if the TM had an outdated fencing token (an outdated leader session id) with which it contacted the ResourceManager. This can happen, and the TM should try to reconnect to the RM once it learns about the new leader session id via ZooKeeper. You could, for example, check that ZooKeeper contains the valid leader information.
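The mismatch and the reconnect described above can be pictured with a small toy model. This is only a sketch of the idea, not Flink's implementation: `LeaderRegistry` stands in for the leader information kept in ZooKeeper, and every class and method name here is made up for illustration.

```python
import uuid


class FencingTokenError(Exception):
    """Raised when a message carries a token other than the expected one."""


class LeaderRegistry:
    """Stand-in for the leader information that is stored in ZooKeeper."""

    def __init__(self):
        self.leader_session_id = None

    def publish(self, session_id):
        self.leader_session_id = session_id


class ResourceManager:
    """Toy endpoint that only accepts messages fenced with its current token."""

    def __init__(self, registry):
        self.registry = registry
        self.fencing_token = None
        self.registered = []

    def grant_leadership(self):
        # Assumption: each leadership grant produces a fresh leader session ID.
        self.fencing_token = uuid.uuid4().hex
        self.registry.publish(self.fencing_token)

    def register_task_executor(self, token, resource_id):
        if token != self.fencing_token:
            raise FencingTokenError(
                f"fencing token {token} did not match the expected "
                f"fencing token {self.fencing_token}"
            )
        self.registered.append(resource_id)


class TaskManager:
    def __init__(self, resource_id, registry):
        self.resource_id = resource_id
        self.registry = registry
        self.known_token = registry.leader_session_id  # cached, may go stale

    def register(self, rm, max_attempts=3):
        for attempt in range(1, max_attempts + 1):
            try:
                rm.register_task_executor(self.known_token, self.resource_id)
                return attempt
            except FencingTokenError:
                # Learn the current leader session ID and try again.
                self.known_token = self.registry.leader_session_id
        raise FencingTokenError("gave up after repeated token mismatches")


registry = LeaderRegistry()
rm = ResourceManager(registry)
rm.grant_leadership()
tm = TaskManager("tm-1", registry)  # caches the first session ID
rm.grant_leadership()               # leadership changes, cached token is stale
attempts = tm.register(rm)          # first attempt rejected, second accepted
print(attempts, rm.registered)
```

In this model the first registration attempt fails with a mismatch and the retry succeeds, which is why a persistent mismatch suggests that the TM never sees the up-to-date leader information.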

 

Cheers,

Till

 




Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

Till Rohrmann
This means that the Dispatcher has not yet set its leader session id, which it receives once it gains leadership. This can also happen if the Dispatcher lost its leadership just after you sent the message. The problem should resolve itself once the new leadership information has been propagated.
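A minimal sketch of that life cycle, assuming a toy `Dispatcher` class that is not Flink's: the token is unset before leadership is granted and again after it is revoked, and requests are rejected in both windows. The address `http://jm1:8081` is hypothetical, and the model is simplified in that the caller does not send a token of its own.

```python
class FencingTokenNotSet(Exception):
    """Raised when the endpoint has no leader session ID yet."""


class Dispatcher:
    def __init__(self, rest_address):
        self.rest_address = rest_address
        self.fencing_token = None  # unset until leadership is granted

    def grant_leadership(self, leader_session_id):
        self.fencing_token = leader_session_id

    def revoke_leadership(self):
        self.fencing_token = None

    def request_rest_address(self):
        if self.fencing_token is None:
            raise FencingTokenNotSet("Fencing token not set: ignoring message")
        return self.rest_address


def describe(dispatcher):
    """Return the REST address, or a short note if the request was rejected."""
    try:
        return dispatcher.request_rest_address()
    except FencingTokenNotSet as exc:
        return f"rejected ({exc})"


dispatcher = Dispatcher("http://jm1:8081")  # hypothetical address
before = describe(dispatcher)               # rejected: no leadership yet
dispatcher.grant_leadership("8b79d4540b45b3e622748b813d3a464b")
during = describe(dispatcher)               # answered while leader
dispatcher.revoke_leadership()
after = describe(dispatcher)                # rejected again after revocation
print(before, during, after, sep="\n")
```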

Cheers,
Till
