Hi all,

Currently I have a Flink 1.4 cluster running on Kubernetes, with SSL configured based on https://ci.apache.org/

However, as the IPs of the nodes are dynamic (by the nature of Kubernetes), we use only DNS names, which we can control through Kubernetes services. So we add to the Subject Alternative Name (SAN) the flink-jobmanager DNS name and also the DNS name for the task managers, *.flink-taskmanager-svc (each task manager has a DNS name of the form flink-taskmanager-0.flink-

Additionally, we set the jobmanager.rpc.address property on all the nodes, and each task manager sets the taskmanager.host property, all matching the names in the certificate.

This works well when running a job with parallelism set to 1. The SSL validation succeeds, and the JobManager can communicate with the TaskManagers and vice versa. But when we set the parallelism higher than 1, we get SSL validation exceptions like this:

Caused by: java.security.cert.
at sun.security.util.
at sun.security.util.
at sun.security.ssl.
at sun.security.ssl.
at sun.security.ssl.
at sun.security.ssl.
at sun.security.ssl.
... 21 more

From the logs I see that the JobManager is correctly registering the TaskManagers:
org.apache.flink.runtime.

And each TaskManager is correctly registered to use its hostname for communication:
org.apache.flink.runtime.
...
akka.remote.Remoting - Remoting started; listening on addresses :[akka.ssl.tcp://flink@flink-
...
org.apache.flink.runtime.io.
...
org.apache.flink.runtime.

But even so, it seems the TaskManagers use the IP to communicate with each other, and the SSL validation fails. Do you know if it's possible to make the TaskManagers use the hostname instead of the IP to communicate? Or do you have any advice for getting the SSL configuration to work in this environment?

Thanks in advance.
Regards,
Edward
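Why a connection by IP fails while the same peer reached by name succeeds can be seen from how a wildcard SAN entry is matched. The following is a minimal, simplified sketch assuming RFC 6125-style rules (a `*` stands for exactly one left-most label, and IP literals are only checked against `iPAddress` SAN entries, never against DNS ones). It is not the JSSE implementation, and `pod-a.flink-taskmanager-svc` and `10.0.0.5` are hypothetical names.

```java
import java.util.Locale;

public class SanWildcardCheck {

    // Simplified check of one DNS-type SAN entry against the host name the
    // client connected to. A "*" may only replace the whole left-most label.
    // Illustration only; this is not the JSSE HostnameChecker.
    public static boolean matchesDnsSan(String san, String host) {
        // IP literals are never matched against DNS SAN entries; they would
        // need an iPAddress SAN entry in the certificate instead.
        if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}") || host.indexOf(':') >= 0) {
            return false;
        }
        String[] pattern = san.toLowerCase(Locale.ROOT).split("\\.");
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        if (pattern.length != labels.length) {
            return false; // a wildcard never spans more than one label
        }
        for (int i = 0; i < pattern.length; i++) {
            if (i == 0 && pattern[i].equals("*")) {
                if (labels[i].isEmpty()) {
                    return false; // "*" must stand for a non-empty label
                }
            } else if (!pattern[i].equals(labels[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String san = "*.flink-taskmanager-svc";
        System.out.println(matchesDnsSan(san, "pod-a.flink-taskmanager-svc")); // true
        System.out.println(matchesDnsSan(san, "10.0.0.5"));                    // false
    }
}
```

So a certificate carrying only DNS SAN entries can be valid for every pod name and still be rejected whenever the peer is identified by its IP.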
For which there was a PR at some point, but nothing has been done so far. It seems the current code explicitly uses the IP rather than the hostname for the Netty SSL configuration. Without that fix, I really wonder how people can reasonably use SSL on a Kubernetes-based Flink cluster, since every time a pod is (re)started it can theoretically get a different IP. Or am I missing something?
--
Christophe

On Tue, Mar 27, 2018 at 3:24 PM, Edward Alexander Rojas Clavijo <[hidden email]> wrote:
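Christophe's point can be illustrated with plain JSSE: when endpoint identification is enabled on a client `SSLEngine`, the server certificate is checked against the peer host string passed to `SSLContext.createSSLEngine(String, int)`, so passing an IP literal means the certificate must list that IP. This is a generic Java sketch, not Flink's Netty code; `createClientEngine`, the host names, and the port are hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class ClientEngineSketch {

    // Hypothetical helper. With the "HTTPS" endpoint identification
    // algorithm set, the trust manager verifies the server certificate
    // against peerHost, so whatever string is passed here is what must
    // appear in the certificate's SAN.
    public static SSLEngine createClientEngine(String peerHost, int peerPort)
            throws NoSuchAlgorithmException {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine(peerHost, peerPort);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS"); // enable hostname checks
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        int port = 6121; // hypothetical port, not taken from the thread
        SSLEngine byIp = createClientEngine("10.0.0.5", port);
        SSLEngine byName = createClientEngine("pod-a.flink-taskmanager-svc", port);
        // The identity each engine will check during the handshake:
        System.out.println(byIp.getPeerHost());   // 10.0.0.5
        System.out.println(byName.getPeerHost()); // pod-a.flink-taskmanager-svc
    }
}
```

No handshake happens here; the point is only which identity each engine would check once one does.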
Christophe
Hi Edward, could you please file a JIRA issue for this problem? It might simply be that the TaskManager's network stack uses the IP instead of the hostname, as you suggested, but we have to look into this to be sure. The logs of the JobManager as well as the TaskManagers could also be helpful.
Cheers,
Till

On Tue, Mar 27, 2018 at 5:17 PM, Christophe Jolif <[hidden email]> wrote:
Hi Edward, you can use this parameter in flink-conf.yaml to suppress hostname checking of certificates, if it suits your purpose:

security.ssl.verify-hostname: false

But I have not submitted a job with higher parallelism. Since you say you face the issue only when the parallelism is higher, I guess the task managers are not able to communicate among themselves. Make sure you have exposed the task manager services correctly; the logs will surely help.

On Tue, Mar 27, 2018 at 9:18 PM, Till Rohrmann <[hidden email]> wrote:
Hi Till, I just created the JIRA ticket: https://issues.apache.org/jira/browse/FLINK-9103
I attached the JobManager and TaskManager logs; I hope this helps to resolve the issue.
Regards,
Edward

2018-03-27 17:48 GMT+02:00 Till Rohrmann <[hidden email]>:
Edward Alexander Rojas Clavijo
Software Engineer
Hybrid Cloud
IBM France
Hi all,
I did some tests based on the PR Christophe mentioned above, and after changing the NettyClient to use CanonicalHostName instead of HostNameAddress to identify the server, the SSL validation works! I created a PR with this change: https://github.com/apache/flink/pull/5789
Regards,
Edward

2018-03-28 17:22 GMT+02:00 Edward Alexander Rojas Clavijo <[hidden email]>:
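For reference, the two `java.net.InetAddress` methods involved in such a change behave differently: `getHostAddress()` (presumably what "HostNameAddress" refers to) always returns the IP literal, while `getCanonicalHostName()` is a best-effort reverse lookup that returns the fully qualified name, or the IP literal if the lookup fails or is not permitted. This is a minimal sketch, not the code from the PR; `peerName` is a hypothetical helper. It carries one assumption worth stating: the canonical name only matches the certificate if reverse DNS for the pod IP resolves to one of the names in the SAN.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PeerNameForTls {

    // Hypothetical helper: choose the string that would be handed to the
    // TLS layer as the expected server identity.
    public static String peerName(InetAddress address, boolean useCanonicalName) {
        if (useCanonicalName) {
            // Reverse DNS lookup; returns the textual IP if it fails.
            return address.getCanonicalHostName();
        }
        // Always the IP literal, which cannot match a DNS-only SAN.
        return address.getHostAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress loopback = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        System.out.println("IP form:        " + peerName(loopback, false));
        // Result depends on the local resolver configuration:
        System.out.println("canonical form: " + peerName(loopback, true));
    }
}
```

The design trade-off is that the canonical form depends on working reverse DNS, whereas the IP form never needs a lookup but requires IP entries in the certificate.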
Thank you Edward and Christophe! 2018-03-29 17:55 GMT+02:00 Edward Alexander Rojas Clavijo <[hidden email]>:
By the way, Fabian, is there any chance this issue will be looked into, or the PR considered, for 1.5?
-- Christophe On Wed, Apr 4, 2018 at 2:41 PM, Fabian Hueske <[hidden email]> wrote: