Issue with Flink not able to properly read the ResourceManager address for a HA setup

Sai Inampudi
Hi, I am trying to create a Flink cluster on YARN by running the following command, but the logs [1] show that it is unable to connect to the ResourceManager:

~/flink-1.5.4/bin/yarn-session.sh -n 5 -tm 2048 -s 4 -d -nm flink_yarn


I found a Stack Overflow post [2] where someone mentioned that this could happen when the Hadoop version packaged with Flink differs from the Hadoop version on the node, leaving Flink unable to read the ResourceManager address for an HA setup. However, I confirmed that the versions match in my case: I downloaded flink-1.5.4-bin-hadoop26-scala_2.11, and running "hadoop version" on the node reports Hadoop 2.6.0-cdh5.14.0. Does anyone have any ideas on what else the issue could be?
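The version comparison described above can be scripted. The following is a minimal sketch, not part of the original post: the helper names hadoop_major_minor and flink_hadoop_tag are hypothetical, and the two input strings are the ones quoted in this message. It only compares the major and minor digits, so it is a rough check; a vendor build such as CDH may still differ from the stock Hadoop release that Flink bundles.

```shell
# Hypothetical helper: extract "major minor" digits (e.g. "26") from the
# first line of `hadoop version` output.
hadoop_major_minor() {
  # $1: a line such as "Hadoop 2.6.0-cdh5.14.0"
  printf '%s\n' "$1" | sed -n 's/^Hadoop \([0-9][0-9]*\)\.\([0-9][0-9]*\)\..*/\1\2/p'
}

# Hypothetical helper: extract the Hadoop tag (e.g. "26") from the name of a
# Flink binary distribution.
flink_hadoop_tag() {
  # $1: a name such as "flink-1.5.4-bin-hadoop26-scala_2.11"
  printf '%s\n' "$1" | sed -n 's/.*-hadoop\([0-9][0-9]*\)-.*/\1/p'
}

# Values taken from the post; on a real node the first argument could come
# from: hadoop version | head -n 1
node=$(hadoop_major_minor "Hadoop 2.6.0-cdh5.14.0")
flink=$(flink_hadoop_tag "flink-1.5.4-bin-hadoop26-scala_2.11")

if [ "$node" = "$flink" ]; then
  echo "major.minor versions match"
else
  echo "mismatch: node=$node flink=$flink"
fi
```

A match here only rules out the gross mismatch discussed in the Stack Overflow post; it says nothing about whether Flink can find the cluster's configuration files.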

Additional info: the cluster I am running this on is Kerberized, so I am not sure whether that plays into the issue. I set up flink-conf.yaml to use the Kerberos ticket cache and ran kinit before trying to stand up the cluster. I verified that the ticket cache was generated by running klist (logs in the gist [1]).
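For reference, the Kerberos checks described above could look roughly like the sketch below. It assumes the setting in question is Flink's security.kerberos.login.use-ticket-cache key and that the configuration lives under ~/flink-1.5.4/conf unless FLINK_CONF_DIR is set; the helper name ticket_cache_enabled is hypothetical.

```shell
# Hypothetical helper: return 0 if the given flink-conf.yaml enables the
# Kerberos ticket cache.
ticket_cache_enabled() {
  grep -Eq '^[[:space:]]*security\.kerberos\.login\.use-ticket-cache:[[:space:]]*true' "$1"
}

# Assumption: the conf directory matches the install path used in the post.
conf="${FLINK_CONF_DIR:-$HOME/flink-1.5.4/conf}/flink-conf.yaml"
if [ -f "$conf" ] && ticket_cache_enabled "$conf"; then
  echo "ticket cache enabled in $conf"
else
  echo "ticket cache not enabled (or $conf not found)"
fi

# klist -s prints nothing and exits non-zero when the credentials cache
# cannot be read or the ticket has expired.
if command -v klist >/dev/null 2>&1 && klist -s; then
  echo "valid Kerberos ticket found"
else
  echo "no valid Kerberos ticket found (or klist is not installed)"
fi
```

If both checks pass, the ticket cache itself is probably not what prevents the client from finding the ResourceManager.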


[1] https://gist.github.com/sai-inampudi/9e1e823096d2685ed2282827432ef311
[2] https://stackoverflow.com/questions/32085990/error-with-kerberos-authentication-when-executing-flink-example-code-on-yarn-clu

Re: Issue with Flink not able to properly read the ResourceManager address for a HA setup

Paul Lam
Hi Sai,

It looks like the Hadoop configuration path is not set correctly. You could set the logging level in log4j-cli.properties to DEBUG to get more information.
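One way to check the configuration path is sketched below. Flink's YARN client looks up the Hadoop configuration through the YARN_CONF_DIR or HADOOP_CONF_DIR environment variables; the fallback path /etc/hadoop/conf and the helper name has_rm_ha_config are assumptions made for illustration, so substitute the directory that actually holds your cluster's yarn-site.xml.

```shell
# Hypothetical helper: return 0 if the directory contains a yarn-site.xml
# that mentions ResourceManager HA properties (yarn.resourcemanager.ha.*).
has_rm_ha_config() {
  [ -f "$1/yarn-site.xml" ] && grep -Fq 'yarn.resourcemanager.ha' "$1/yarn-site.xml"
}

# Assumption: /etc/hadoop/conf is only a placeholder default.
conf_dir="${YARN_CONF_DIR:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}}"

if has_rm_ha_config "$conf_dir"; then
  # Export the path so that a later yarn-session.sh call in this shell sees it.
  export HADOOP_CONF_DIR="$conf_dir"
  echo "using Hadoop config from $conf_dir"
else
  echo "no ResourceManager HA settings found in $conf_dir/yarn-site.xml"
fi
```

If the HA properties are missing from the directory the client sees, the client cannot learn the HA ResourceManager addresses from it, which would be consistent with the connection failure in the logs.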

Best,
Paul Lam

On Dec 20, 2018, at 03:18, Sai Inampudi <[hidden email]> wrote:
