Background: We run Flink 1.3.1 alongside a secure MAPR cluster (Flink runs on MAPR client nodes). We start the Flink cluster through Marathon using the flink-jobmanager.sh foreground and flink-taskmanager.sh foreground commands. To make this work, we had to add -Djavax.net.ssl.trustStore="$JAVA_HOME/jre/lib/security/cacerts" in flink-console.sh as an extra JVM arg (otherwise Flink picked up MAPR's ssl_truststore as the default truststore, and we then ran into issues with 3rd-party jars such as aws_sdk). This setup worked fine as it was: we could submit our jars and the pipelines ran without any problem.
Problem: We started experimenting with enabling SSL for all communication in Flink. To generate the CA and keystore, we followed https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/security-ssl.html. I added the following properties to flink-conf.yaml:

    security.ssl.enabled: true
    security.ssl.keystore: /opt/flink/certs/node1.keystore
    security.ssl.keystore-password: <password>
    security.ssl.key-password: <password>
    security.ssl.truststore: /opt/flink/certs/ca.truststore
    security.ssl.truststore-password: <password>
    jobmanager.web.ssl.enabled: true
    taskmanager.data.ssl.enabled: true
    blob.service.ssl.enabled: true
    akka.ssl.enabled: true

We then spun up a cluster and tried submitting the same job that was working before. We get the following error:

    org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
    ClassLoader info: URL ClassLoader:
    Class not resolvable through given classloader.
        at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:229)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:95)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:230)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
        at java.lang.Thread.run(Thread.java:748)

This error disappears when we remove the SSL config properties, i.e. when we run the Flink cluster without SSL enabled. So, did we miss any steps for enabling SSL?

P.S.: We tried removing the extra JVM arg mentioned above, but still get the same error.

Aniket
|
The configuration looks reasonable.
Just to be sure, are the paths accessible by all nodes?
As a first step, could you set the logging level to DEBUG (by modifying the 'conf/log4j.properties' file), resubmit the job (after a cluster restart) and check the Job- and TaskManager logs for any exception?
On 04.10.2017 03:15, Aniket Deshpande wrote:
|
Something that would also help us narrow down the problematic area is to enable SSL for one component at a time and see which one causes the job to fail.
On 04.10.2017 14:11, Chesnay Schepler wrote:
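
For illustration, one round of that one-component-at-a-time test might look like the fragment below. It is only a sketch: it reuses the per-component switches listed in the original post and assumes the keystore and truststore entries stay exactly as posted.

```yaml
# flink-conf.yaml: one trial, SSL enabled for Akka only.
# The security.ssl.keystore* and security.ssl.truststore* entries
# from the original post are assumed to stay unchanged.
security.ssl.enabled: true

akka.ssl.enabled: true              # component under test in this trial
blob.service.ssl.enabled: false
taskmanager.data.ssl.enabled: false
jobmanager.web.ssl.enabled: false
```

Restart the cluster and resubmit the job after each change, then move the single `true` to the next component. The trial in which the job starts failing points at the component to investigate.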
|
By following Chesnay's recommendation we will hopefully uncover an SSL error that is being masked. Another thing to try is to disable hostname verification (it is enabled by default) to see whether the certificate is being rejected.
On Wed, Oct 4, 2017 at 5:15 AM, Chesnay Schepler <[hidden email]> wrote:
|
Hi Chesnay,

Thanks for the reply. Following your suggestion, I found that setting blob.service.ssl.enabled: false solves the issue, and all the pipelines now run as expected. So the issue is kinda narrowed down to blob service SSL now. I also checked the jobmanager logs with blob SSL enabled and I see the following error:

    2017-10-03 23:28:50.459 [BLOB connection for /<jm_ip>:46932] ERROR org.apache.flink.runtime.blob.BlobServerConnection - Error while executing BLOB connection.
    javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
        at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2023)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1125)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
        at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:928)
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:71)
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:119)

So, are there any additional steps that I have to follow to enable SSL for the blob service?
On Wed, Oct 4, 2017 at 4:09 PM, Eron Wright <[hidden email]> wrote:
Yours Sincerely,
Aniket S Deshpande. |
I don't think this is a configuration problem, but a bug in Flink. But we'll have to dig a little deeper to be sure.
Besides the actual SSL problem, what concerns me is that we didn't fail earlier. If a bug in the SSL setup prevents the up- or download of jars then we should fail earlier. Looping in Nico who may have some input.
On 04.10.2017 22:58, Aniket Deshpande wrote:
|
So, following Eron's suggestion, I tried the security.ssl.verify-hostname: false configuration, and that does the trick. I no longer get the classloader error, even with blob.service.ssl enabled. Do you think the hostname verification fails because we are running the Flink jobmanager and taskmanager via Marathon (and hence essentially as a Mesos task)?
On Wed, Oct 4, 2017 at 5:47 PM, Chesnay Schepler <[hidden email]> wrote:
Yours Sincerely,
Aniket S Deshpande. |
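
As a summary of where the thread ends up, the workaround reported above amounts to roughly the following fragment. This is a sketch assembled from the settings quoted in the thread, not a verified recommendation, and the remaining keystore, truststore and per-component entries are assumed to stay as in the original post.

```yaml
# flink-conf.yaml: workaround reported in this thread.
# Keystore/truststore and the other *.ssl.enabled entries are
# assumed unchanged from the original post.
security.ssl.enabled: true
blob.service.ssl.enabled: true

# Skips the check that the certificate matches the peer's hostname.
security.ssl.verify-hostname: false
```

The thread does not establish why hostname verification fails in this setup. Note that with this setting the certificate chain is still validated against the truststore, but the hostname in the certificate is no longer compared with the host being contacted.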