http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Running-on-a-firewalled-Yarn-cluster-tp3330p3373.html
While discussing the issue with my colleagues today, we came up with another approach to resolve it:
d) Upload the job jar to HDFS (or another FS) and trigger the execution of the jar using an HTTP request to the web interface.
We could add some tooling to the /bin/flink client to submit a job like this transparently, so users would not need to bother with the file upload and request sending themselves.
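As a rough illustration of approach (d), the client side could push the jar bytes to the JobManager's web interface with a plain HTTP POST. Note that the endpoint path "/jars/upload" and the content type used here are assumptions for the sketch, not an actual Flink API:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class JarSubmitter {

    /**
     * Uploads a local job jar to the web interface via HTTP POST and
     * returns the HTTP response code. The endpoint URL (including the
     * hypothetical "/jars/upload" path) is supplied by the caller.
     */
    public static int uploadJar(Path jar, URL endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/java-archive");
        try (OutputStream out = conn.getOutputStream()) {
            // Stream the jar file straight into the request body.
            Files.copy(jar, out);
        }
        return conn.getResponseCode();
    }
}
```

The point is only that a single outbound HTTP connection from the client suffices, which is exactly what a firewalled setup permits.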
Also, Sachin started a discussion on the dev@ list to add support for submitting jobs over the web interface, so maybe we can base the fix for FLINK-2960 on that.
I've also looked into the Hadoop MapReduce code and it seems they do the following:
When submitting a job, they upload the job jar file to HDFS, along with a configuration file that contains all the config options of the job. Then they submit all of this together as an application to YARN.
So far, no firewall is involved. They do establish a connection between the JobClient and the ApplicationMaster when the user queries the current job status, but I could not find any special code for getting the status over HTTP.
But I found the following configuration parameter: "yarn.app.mapreduce.am.job.client.port-range", so it seems that they try to allocate the AM port within that range (if specified).
Niels, can you check if this configuration parameter is set in your environment? I assume your firewall allows outside connections from that port range.
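For reference, if that parameter is set in your environment, it would look something like this (the port range here is just an example value) in mapred-site.xml:

```xml
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```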
So we have yet another approach:
f) Allocate the YARN application master (and blob manager) within a user-specified port-range.
This would be fairly easy to implement, because we would just need to iterate through the range until we find an available port.