Hi,
Here at work our security team decided (a long time ago) to only open the ports on the firewalls that are actually needed (I say: good call!). For the YARN cluster this includes things like the proxy used to reach the application master of an application. For everything we've done so far (i.e. MR, Pig, ...) this has worked fine.

Now with Flink I run into problems: when I run either the yarn-session or a job on YARN, the application master gets started and I can see the web interface. The problem is that jobmanager.rpc.address points to one of the worker nodes and jobmanager.rpc.port is essentially a random value, and that random value is not accessible because of the firewall rules. So I cannot reach the jobmanager on the YARN cluster.

How do I tackle this, assuming that opening all ports on the firewall is not an option? Or is this something that should be handled by Flink? (Perhaps the application master could proxy the RPC calls?)

--
Best regards / Met vriendelijke groeten,

Niels Basjes
Hi Niels,
so the problem is that you cannot submit a job to Flink using the "/bin/flink" tool, right? I assume Flink and its TaskManagers start properly and connect to each other (the number of TaskManagers is shown correctly in the web interface).

I see the following solutions for the problem:
a) Add a new page in the job manager web frontend that allows users to upload and execute a jar with a Flink job.
b) Add options for starting the jobmanager and blob manager in the job manager container on fixed ports.
c) Somehow make the Akka RPC requests and blob manager uploads go over HTTP using the YARN proxy.

The reason why we use a free port instead of a fixed port is that this way two job manager containers can run on the same machine. So solution b) would only work if users are not running multiple Flink jobs/sessions on YARN at the same time (or you somehow make sure they are not running on the same machine).

What's your take on the three solutions? Does anybody here know how MR is doing it? Are they running the ApplicationMaster RPC on a fixed port? Do they use HTTP-based calls over the proxy?

Robert

On Mon, Nov 2, 2015 at 4:05 PM, Niels Basjes <[hidden email]> wrote:
My take on those 3 options:
a) Bad idea; people need to be able to automate their jobs and run them from the command line (i.e. bash, cron).
b) Bad idea; same reason you gave. In addition, I do not want to reserve an open 'Flink port' for every user who wants to run a job.
c) From my perspective this sounds like the most viable solution. I don't know how they implemented this in MR, but I know the way they did it actually works on our clusters (with firewalls).

Niels Basjes

On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <[hidden email]> wrote:
Best regards / Met vriendelijke groeten,
Niels Basjes
Hi,
I forgot to answer your other question:

On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <[hidden email]> wrote:
Correct. Flink starts (I see the jobmanager UI) but the actual job is not started.

Niels Basjes
Hi Niels,

Thanks a lot for reporting this issue. I think it is a very common setup in corporate infrastructure to have restrictive firewall settings. For Flink 1.0 (and probably in a minor 0.10.X release) we will have to address this issue to ensure proper integration of Flink. I've created a JIRA to keep track: https://issues.apache.org/jira/browse/FLINK-2960

Best regards,
Max

On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes <[hidden email]> wrote:
Great! I'll watch the issue and give it a test once I see a working patch.

Niels Basjes

On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <[hidden email]> wrote:
Best regards / Met vriendelijke groeten,
Niels Basjes
While discussing the issue with my colleagues today, we came up with another approach to resolve it:
d) Upload the job jar to HDFS (or another FS) and trigger the execution of the jar via an HTTP request to the web interface. We could add some tooling to the /bin/flink client to submit a job like this transparently, so users would not need to bother with the file upload and request sending. Also, Sachin started a discussion on the dev@ list about adding support for submitting jobs over the web interface, so maybe we can base the fix for FLINK-2960 on that.

I've also looked into the Hadoop MapReduce code and it seems they do the following: when submitting a job, they upload the job jar file to HDFS, along with a configuration file that contains all the config options of the job. Then they submit this altogether as an application to YARN. So far, no firewall has been involved. They establish a connection between the JobClient and the ApplicationMaster when the user queries the current job status, but I could not find any special code for getting the status over HTTP. However, I found the following configuration parameter: "yarn.app.mapreduce.am.job.client.port-range", so it seems that they try to allocate the AM port within that range (if specified).

Niels, can you check if this configuration parameter is set in your environment? I assume your firewall allows outside connections from that port range.

So we also have a new approach:

f) Allocate the YARN application master (and blob manager) ports within a user-specified port range. This would be really easy to implement, because we would just need to go through the range until we find an available port.

On Tue, Nov 3, 2015 at 1:06 PM, Niels Basjes <[hidden email]> wrote:
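The "go through the range until we find an available port" idea in f) can be sketched in a few lines. This is a minimal, hypothetical illustration in Python, not Flink's actual implementation: bind the first free port in a configured range, and fall back to the next candidate when a port is taken.

```python
import socket

def bind_in_range(lo, hi):
    """Return a server socket bound to the first free port in [lo, hi]."""
    for port in range(lo, hi + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("0.0.0.0", port))
            return s  # caller now owns the bound socket
        except OSError:
            s.close()  # port already taken, try the next one
    raise RuntimeError("no free port in range %d-%d" % (lo, hi))

# Two "containers" on the same machine get different ports from the same range,
# which is why a range works where a single fixed port (option b) does not:
s1 = bind_in_range(50100, 50200)
s2 = bind_in_range(50100, 50200)
print(s1.getsockname()[1], s2.getsockname()[1])
```

The firewall then only needs the configured range (here 50100-50200) whitelisted, rather than all ephemeral ports.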
Hi,

I checked, and this setting has been set to a limited port range of only 100 port numbers. I tried to find the actual port an AM is running on and couldn't find it (I'm not the admin on that cluster). As you can see, I never connect directly; always via the proxy that runs on the master on a single fixed port.

Niels

On Thu, Nov 5, 2015 at 2:46 PM, Robert Metzger <[hidden email]> wrote:
Best regards / Met vriendelijke groeten,
Niels Basjes
Hi,

Cool, that's good news. The RM proxy is only for the web interface of the AM. I'm pretty sure that the MapReduce AM has at least two ports:
- one for the web interface (accessible through the RM proxy, so behind the firewall)
- one for the AM RPC (and that port is allocated within the configured range, open through the firewall).

You can probably find the RPC port in the log file of the running MapReduce AM (to find that, identify the NodeManager running the AM, access the NM web interface and retrieve the logs of the container running the AM). Maybe the MapReduce client also logs the AM RPC port when querying the status of a running job.

On Thu, Nov 5, 2015 at 2:59 PM, Niels Basjes <[hidden email]> wrote:
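For reference, the container logs can usually be pulled with the standard YARN CLI instead of clicking through the NM web interface. The application id and the log line below are made up for illustration; the grep at the end just shows one way to pull a trailing port number out of such a line.

```shell
# List running applications to find the application id (needs a live cluster):
#   yarn application -list
# Fetch the aggregated container logs, including the AM's, and look for ports:
#   yarn logs -applicationId application_1446732345000_0001 | grep -i port
#
# In a MapReduce AM log the RPC port shows up in a line similar to the
# (made-up) one below; extract the trailing port number:
echo "INFO MRClientService: Instantiated MRClientService at worker03/10.1.2.3:51234" \
  | grep -oE '[0-9]+$'   # prints 51234
```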
That is what I tried. Couldn't find that port though.

On Thu, Nov 5, 2015 at 3:06 PM, Robert Metzger <[hidden email]> wrote:
Best regards / Met vriendelijke groeten,
Niels Basjes
I'm also running into an issue with a non-YARN cluster. When submitting a JAR to Flink, we need an arbitrary port open on all of the hosts, and we don't know which one until the socket attempts to bind; that's a bit of a problem for us.

Are there ways to submit a JAR to Flink that bypass the need for the BlobServer's random port binding? Or to control the port the BlobServer binds to?

Cheers,
Cory

On Thu, Nov 5, 2015 at 8:07 AM, Niels Basjes <[hidden email]> wrote:
Hi Cory!

There is no flag to define the BlobServer port right now, but we should definitely add this: https://issues.apache.org/jira/browse/FLINK-2996

If your setup is such that the firewall problem exists only between the client and the master node (and the workers can reach the master on all ports), then you can try two workarounds:
1) Start the program in the cluster (or on the master node, via ssh).
2) Add the program jar to the lib directory of Flink, and start your program with the RemoteExecutor, without a jar attachment. Then it only needs to communicate with the actor system (RPC) port, which is not random in standalone mode (6123 by default).

Stephan

On Tue, Nov 10, 2015 at 8:46 PM, Cory Monty <[hidden email]> wrote:
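Once FLINK-2996 lands, pinning the BlobServer would presumably become a flink-conf.yaml entry along these lines. The key name blob.server.port is an assumption based on the JIRA; the jobmanager.rpc.port default is the standalone-mode default mentioned above:

```yaml
# flink-conf.yaml (sketch; blob.server.port assumed from FLINK-2996)
jobmanager.rpc.port: 6123   # fixed actor system (RPC) port in standalone mode
blob.server.port: 6124      # pin the BlobServer instead of a random ephemeral port
```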
Thanks, Stephan. I'll give those two workarounds a try!

On Tue, Nov 10, 2015 at 2:18 PM, Stephan Ewen <[hidden email]> wrote:
Hi,

I just wanted to let you know that I didn't forget about this! The BlobManager in 1.0-SNAPSHOT already has a configuration parameter to use a certain range of ports. I'm trying to add the same feature for YARN tomorrow.

Sorry for the delay.

On Tue, Nov 10, 2015 at 9:27 PM, Cory Monty <[hidden email]> wrote:
I've finally fixed the issues identified in this thread: the blob manager and the application master / job manager allocate their ports in a specified range. You can now whitelist a port range in the firewall, and Flink services will only allocate ports in that range: https://github.com/apache/flink/blob/master/docs/setup/yarn_setup.md#running-flink-on-yarn-behind-firewalls

Please let me know if that fixes your issues. Note that the fix is only available in 1.0-SNAPSHOT.

On Wed, Nov 25, 2015 at 6:58 PM, Robert Metzger <[hidden email]> wrote:
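Based on the linked documentation page, the firewall-friendly setup would look roughly like this in flink-conf.yaml: open one range in the firewall and point both services at it. The key names are taken from the 1.0-SNAPSHOT docs; the range itself is an example.

```yaml
# flink-conf.yaml for Flink on YARN behind a firewall (1.0-SNAPSHOT)
# Open 50100-50200 in the firewall, then restrict Flink to that range:
yarn.application-master.port: 50100-50200   # JobManager/AM RPC port range
blob.server.port: 50100-50200               # BlobServer port range
```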