Hi I'm having problems with self-signed certificate trust with Native K8S


Kevin Kwon
Hi, I am using MinIO as an S3 mock backend for Native K8S.

Everything seems to be fine except that it cannot connect to S3, since the self-signed certificate's trust store does not seem to be carried over into the Deployment resources.

Below, in order, is how I add the certificate to the trust store using keytool and how I run my app with the built image:

FROM registry.local/mde/my-flink-app:0.0.1
COPY s3/certs/public.crt $FLINK_HOME/s3-e2e-public.crt
RUN keytool \
    -noprompt \
    -alias s3-e2e-public \
    -importcert \
    -trustcacerts \
    -keystore $JAVA_HOME/lib/security/cacerts \
    -storepass changeit \
    -file $FLINK_HOME/s3-e2e-public.crt

$FLINK_HOME/bin/flink run-application \
    -t kubernetes-application \
    -Denv.java.opts="-Dkafka.brokers=kafka-external:9092 -Dkafka.schema-registry.url=kafka-schemaregistry:8081" \
    -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem% %jvmopts% %logging% %class% %args%" \
    -Dkubernetes.cluster-id=${K8S_CLUSTERID} \
    -Dkubernetes.container.image=${DOCKER_REPO}/${ORGANISATION}/${APP_NAME}:${APP_VERSION} \
    -Dkubernetes.namespace=${K8S_NAMESPACE} \
    -Dkubernetes.rest-service.exposed.type=${K8S_RESTSERVICE_EXPOSED_TYPE} \
    -Dkubernetes.taskmanager.cpu=${K8S_TASKMANAGER_CPU} \
    -Dresourcemanager.taskmanager-timeout=3600000 \
    -Dtaskmanager.memory.process.size=${TASKMANAGER_MEMORY_PROCESS_SIZE} \
    -Dtaskmanager.numberOfTaskSlots=${TASKMANAGER_NUMBEROFTASKSLOTS} \
    -Ds3.endpoint=s3:443 \
    -Ds3.access-key=${S3_ACCESSKEY} \
    -Ds3.secret-key=${S3_SECRETKEY} \
    -Ds3.path.style.access=true \
    -Dstate.backend=filesystem \
    -Dstate.checkpoints.dir=s3://${ORGANISATION}/${APP_NAME}/checkpoint \
    -Dstate.savepoints.dir=s3://${ORGANISATION}/${APP_NAME}/savepoint \
    local://${FLINK_HOME}/usrlib/${APP_NAME}-assembly-${APP_VERSION}.jar

However, I get the following error, and I don't see my trusted certificate in keytool when I log in to the pod (it seems the trust store is not carried over):
Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create checkpoint storage at checkpoint coordinator side.
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:305) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:224) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.enableCheckpointing(ExecutionGraph.java:483) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:338) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.scheduler.SchedulerBase.createExecutionGraph(SchedulerBase.java:269) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:242) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:229) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:272) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:140) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:388) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_265]
        ... 6 more
Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on mde: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unrecognized SSL message, plaintext connection?: Unable to execute HTTP request: Unrecognized SSL message, plaintext connection?
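
One way to confirm whether the import actually reached the JVM's trust store inside the running pod is to locate the cacerts file and look up the alias. This is a minimal sketch, assuming a POSIX shell and keytool are available in the container. `find_cacerts` and `has_alias` are hypothetical helper names. The sketch checks both `lib/security/cacerts` and `jre/lib/security/cacerts` under the Java home, because a Java 8 JDK keeps the file under `jre/` while a JRE keeps it directly under `lib/`:

```shell
# Hypothetical helper: print the path of the first cacerts file found
# under the given Java home, or return non-zero if none exists.
find_cacerts() {
    java_home="$1"
    for candidate in \
        "$java_home/lib/security/cacerts" \
        "$java_home/jre/lib/security/cacerts"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: succeed only if the alias is present in the store.
# The store password matches the one used in the Dockerfile above.
has_alias() {
    keytool -list -keystore "$1" -storepass changeit -alias "$2" >/dev/null 2>&1
}
```

Inside the pod, something like `store=$(find_cacerts "$JAVA_HOME") && has_alias "$store" s3-e2e-public && echo present` would then show whether the alias from the Dockerfile is in the store the JVM reads.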


Re: Hi I'm having problems with self-signed certificate trust with Native K8S

Kevin Kwon
I think what we need in the Native Kubernetes config is the ability to mount custom ConfigMaps, Secrets, and Volumes.

I see that in the upcoming release, Secrets can be mounted.

https://github.com/apache/flink/pull/14005 <- Also, could the maintainers look into this PR so we can mount other custom K8S resources?
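
For readers on a release that supports mounting Secrets, here is a rough sketch of mounting a trust store that way. It assumes the `kubernetes.secrets` option in its `name:/mount/path` form and the standard JVM properties `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`. The Secret name `s3-truststore`, the mount path `/opt/truststore`, and the file name `truststore.jks` are hypothetical, and `truststore_flags` is an illustrative helper, not part of Flink:

```shell
# Hypothetical names: adjust to your cluster.
# The Secret could be created beforehand with, for example:
#   kubectl create secret generic s3-truststore --from-file=truststore.jks
SECRET_NAME="s3-truststore"
MOUNT_PATH="/opt/truststore"
STORE_FILE="truststore.jks"

# Print the extra flink CLI flags that mount the Secret and point the JVM at it.
truststore_flags() {
    printf '%s\n' "-Dkubernetes.secrets=${SECRET_NAME}:${MOUNT_PATH}"
    printf '%s\n' "-Denv.java.opts=-Djavax.net.ssl.trustStore=${MOUNT_PATH}/${STORE_FILE} -Djavax.net.ssl.trustStorePassword=changeit"
}
```

The `env.java.opts` value printed here would have to be merged with the Kafka properties already passed in the command above. Setting `javax.net.ssl.trustStore` also replaces the default cacerts, so the mounted store needs to contain every CA the job relies on.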

(In reply to the original post by Kevin Kwon, Fri, Nov 20, 2020 at 9:23 PM.)


Re: Hi I'm having problems with self-signed certificate trust with Native K8S

Till Rohrmann
Thanks for reaching out to the Flink community, Kevin. Yes, with Flink 1.12.0 it should be possible to mount secrets with your K8s deployment. From the posted stack trace it is not possible to see exactly what is going wrong. Could you post the complete logs? I am also pulling in Yang Wang, who is most knowledgeable about Flink's K8s integration.

Cheers,
Till
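
While the complete logs are pending, the last "Caused by" line may deserve a second look. "Unrecognized SSL message, plaintext connection?" is what the JVM typically reports when the peer answers with something other than TLS, which would point at the endpoint's scheme or port rather than at certificate trust. A hedged sketch for checking this from inside the pod, assuming the `openssl` CLI is installed there; `speaks_tls` and `describe_endpoint` are hypothetical helper names:

```shell
# Hypothetical helper: succeed if a TLS handshake with host:port completes.
# By default `openssl s_client` continues past certificate verification
# errors, so a self-signed certificate does not by itself make this fail.
speaks_tls() {
    openssl s_client -connect "$1:$2" </dev/null >/dev/null 2>&1
}

# Hypothetical helper: print a one-line verdict for the endpoint.
describe_endpoint() {
    if speaks_tls "$1" "$2"; then
        echo "$1:$2 completes a TLS handshake"
    else
        echo "$1:$2 does not complete a TLS handshake (plain HTTP, wrong port, or unreachable)"
    fi
}
```

With the endpoint from the original command, the call would be `describe_endpoint s3 443`. If the handshake does not complete, the MinIO service may be serving plain HTTP on that port.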

(In reply to Kevin Kwon's message of Sun, Nov 22, 2020 at 12:49 PM.)


Re: Hi I'm having problems with self-signed certificate trust with Native K8S

Yang Wang
Hi Kevin,

Let me try to understand your problem. You have added the trusted keystore to the Flink app image (my-flink-app:0.0.1) and it could not be loaded, right? Even when you log in to the pod, you cannot find the keystore. That is strange.

I know it is not very convenient to bundle the keystore in the image. Mounting it from a ConfigMap could make things easier. The reason we still do not have this (making ConfigMap/PersistentVolume mountable via a Flink config option) is that the pod template [1] may be a more common way to get this done.


Best,
Yang
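
For Flink releases that ship the pod template feature mentioned above, mounting a ConfigMap could look roughly like the sketch below. It assumes the `kubernetes.pod-template-file` option and that the main container in the template is named `flink-main-container`. The ConfigMap name `s3-certs`, the mount path `/opt/flink/certs`, and the helper `write_pod_template` are hypothetical:

```shell
# Write a pod template that mounts a (hypothetical) ConfigMap named s3-certs
# into the Flink main container. The first argument is the output file.
write_pod_template() {
    cat > "$1" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pod-template
spec:
  containers:
    - name: flink-main-container
      volumeMounts:
        - name: s3-certs
          mountPath: /opt/flink/certs
          readOnly: true
  volumes:
    - name: s3-certs
      configMap:
        name: s3-certs
EOF
}
```

After `write_pod_template pod-template.yaml`, the file would be passed to the `flink run-application` command as `-Dkubernetes.pod-template-file=pod-template.yaml`. A mounted certificate still has to end up in a trust store the JVM reads, for example by mounting a prepared trust store file instead of the raw certificate.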


(In reply to Till Rohrmann's message of Mon, Nov 23, 2020 at 11:01 PM.)


Re: Hi I'm having problems with self-signed certificate trust with Native K8S

Till Rohrmann
Hi Kevin,

I expect the 1.12.0 release to happen within the next 3 weeks.

Cheers,
Till

(In reply to Yang Wang's message of Tue, Nov 24, 2020 at 4:23 AM.)