(DEPRECATED) Apache Flink User Mailing List archive.

Native K8S not creating TMs

Hi

We are using Flink 1.10.1 with native K8s. The service appears to be
created, and I can submit a job and see it in the Web UI, but the TM pods
are never created, so the jobs never start.

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Could not allocate the required slot within slot request timeout. Please
make sure that the cluster has enough resources.

Is there somewhere I could see the pod creation logs?

thanks

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Yangze Guo

Re: Native K8S not creating TMs

Hi, Kevin,

Regarding logs, you could follow this guide [1].

BTW, you could execute "kubectl get pod" to get the current pods. If
there is something like "flink-taskmanager-1-1", you could execute
"kubectl describe pod flink-taskmanager-1-1" to see the status of it.

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
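
The pod check described above can be sketched as a small shell function. This is only an illustration: the function name `describe_taskmanager_pods` and the match on the substring "taskmanager" are assumptions, and the namespace defaults to `default`.

```shell
# List pods in a namespace and describe every pod whose name contains
# "taskmanager". The function name and the substring match are
# illustrative assumptions, not part of Flink or kubectl.
describe_taskmanager_pods() {
  namespace="${1:-default}"
  # "-o name" prints one "pod/<name>" entry per line.
  kubectl get pods --namespace "$namespace" -o name |
    grep taskmanager |
    while read -r pod; do
      kubectl describe --namespace "$namespace" "$pod"
    done
}
```

Calling `describe_taskmanager_pods default` prints nothing if no TaskManager pod exists, which by itself suggests the pod creation request never succeeded.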

Best,
Yangze Guo

Yangze Guo

Re: Native K8S not creating TMs

Amend: for release 1.10.1, please refer to this guide [1].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#log-files

Best,
Yangze Guo

Yang Wang

Re: Native K8S not creating TMs

I second Yangze's suggestion. You need to get the jobmanager log first; then
it will be easier to find the root cause. I know that it is not convenient for
users to access the log via kubectl, and we already have a ticket for this [1].

Usually, the reason that the Flink resourcemanager cannot allocate taskmanagers
from K8s is that the service account is not configured correctly. You could
check the RBAC configuration here [2].

[1]. https://issues.apache.org/jira/browse/FLINK-15792

[2]. https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#rbac
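
One quick RBAC check, sketched under assumptions: `kubectl auth can-i` together with the `--as` impersonation flag asks the API server whether a given service account may create pods, and prints `yes` or `no`. The function name `can_create_pods` is hypothetical, the `default` service account in the `default` namespace is assumed, and the user running the command needs permission to impersonate service accounts.

```shell
# Ask the API server whether a service account may create pods.
# can_create_pods is a hypothetical helper name; both arguments
# default to "default".
can_create_pods() {
  namespace="${1:-default}"
  service_account="${2:-default}"
  kubectl auth can-i create pods \
    --namespace "$namespace" \
    --as "system:serviceaccount:${namespace}:${service_account}"
}
```

If `can_create_pods default default` prints `no`, the role binding is the first thing to revisit. A `yes` points away from RBAC and towards connectivity between the JobManager and the API server.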

Best,

Yang

Re: Native K8S not creating TMs

Thanks!

I do not see any pods of the form `flink-taskmanager-1-1`, so I tried the
exec suggestion.
The logs are attached below. Is there a quick RBAC check I could perform? I
followed the command on the docs page linked (kubectl create
clusterrolebinding flink-role-binding-default --clusterrole=edit
--serviceaccount=default:default).

2020-06-04 15:34:04,711 INFO org.apache.flink.kubernetes.KubernetesResourceManager - Requesting new TaskManager pod with <1728,1.0>. Number pending requests 1.
2020-06-04 15:34:04,712 INFO org.apache.flink.kubernetes.KubernetesResourceManager - TaskManager flink-cluster-e07a6f7a-8bd1-4306-89f1-a1ff7ea17bf6-taskmanager-1-5994 will be started with TaskExecutorProcessSpec {cpuCores=1.0, frameworkHeapSize=128.000mb (134217728 bytes), frameworkOffHeapSize=128.000mb (134217728 bytes), taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemorySize=512.000mb (536870920 bytes), jvmMetaspaceSize=256.000mb (268435456 bytes), jvmOverheadSize=192.000mb (201326592 bytes)}.
2020-06-04 15:34:14,713 ERROR org.apache.flink.kubernetes.KubernetesResourceManager - Could not start TaskManager in pod flink-cluster-e07a6f7a-8bd1-4306-89f1-a1ff7ea17bf6-taskmanager-1-5994.
java.util.concurrent.CompletionException: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed.
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:331)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:324)
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$createTaskManagerPod$0(Fabric8FlinkKubeClient.java:184)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
	... 3 more
Caused by: java.net.SocketTimeoutException: timeout
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:656)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:664)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.java:153)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:131)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.apache.flink.kubernetes.shadded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
	at org.apache.flink.kubernetes.shadded.okhttp3.RealCall.execute(RealCall.java:92)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:798)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:328)
	... 6 more

Yang Wang

Re: Native K8S not creating TMs

If you have created the role binding "flink-role-binding-default" successfully,
then it should not be an RBAC issue.

It seems that the kubernetes-client in the JobManager pod could not contact the
K8s apiserver due to an okhttp issue with Java 8u252. Could you add the following
config option to disable HTTP/2? You can find more information here [1].

kubernetes-session.sh ... -Dcontainerized.master.env.HTTP2_DISABLE=true

[1]. https://github.com/fabric8io/kubernetes-client/issues/2212
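
To see whether the JobManager image is actually running the affected Java build, the Java version can be read from inside the pod. A minimal sketch, assuming `java` is on the PATH in the container; the function name `jobmanager_java_version` is hypothetical and the pod name has to be taken from `kubectl get pods`.

```shell
# Print the Java version used inside a running pod.
# jobmanager_java_version is a hypothetical helper name.
jobmanager_java_version() {
  pod="$1"
  namespace="${2:-default}"
  # "java -version" writes to stderr, so merge it into stdout.
  kubectl exec --namespace "$namespace" "$pod" -- java -version 2>&1
}
```

A version string containing `1.8.0_252` would be consistent with the okhttp issue linked above.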

Best,

Yang

Re: Native K8S not creating TMs

Thanks Yang for the suggestion. I have tried it and I'm still getting the
same exception. Is it possible it's due to the null pod name? Operation:
[create] for kind: [Pod] with name: [null] in namespace: [default]
failed.

Best,
kevin

Yang Wang

Re: Native K8S not creating TMs

Hi Kevin,

It may be because of the K8s character length limit (no more than 63) [1], so the
pod name cannot be too long. I notice that you are using the cluster-id that the
client generates automatically. That may cause problems; could you set a
meaningful cluster-id for your Flink session? For example,

kubernetes-session.sh ... -Dkubernetes.cluster-id=my-flink-k8s-session

This behavior has been improved in Flink 1.11, which checks the length on the
client side before submission.

If it still does not work, could you share your full command and jobmanager logs?
That will help a lot in finding the root cause.

[1]. https://stackoverflow.com/questions/50412837/kubernetes-label-name-63-character-limit
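
The length concern can be checked by hand before starting a session. The sketch below assumes the 63-character limit applies to the full pod name, which this thread does not confirm, and reuses the `-taskmanager-1-5994` suffix from the log earlier in the thread purely as an example. The function name `check_pod_name_length` is hypothetical.

```shell
# Limit cited in this thread; whether it applies to the whole pod
# name is an assumption.
MAX_NAME_LENGTH=63

# Print the length of "<cluster-id><suffix>" and return non-zero
# when it exceeds the limit.
check_pod_name_length() {
  cluster_id="$1"
  # Suffix copied from the log in this thread; real suffixes vary.
  suffix="-taskmanager-1-5994"
  name="${cluster_id}${suffix}"
  echo "${#name}"
  [ "${#name}" -le "$MAX_NAME_LENGTH" ]
}
```

With the generated id from the log, `check_pod_name_length flink-cluster-e07a6f7a-8bd1-4306-89f1-a1ff7ea17bf6` prints 69 and fails, while `check_pod_name_length my-flink-k8s-session` prints 39 and succeeds.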

Best,

Yang

Yang Wang

Re: Native K8S not creating TMs

Hi Kevin,

Sorry for not noticing your last response.

Could you share your full DEBUG level jobmanager logs? I will try to figure out
whether it is an issue with Flink or K8s, because I could not reproduce your
situation with my local K8s cluster.

Best,

Yang

Yang Wang

Re: Native K8S not creating TMs

Thanks for sharing the DEBUG level log.

I carefully checked the logs and found that the kubernetes-client discovered the
api server address and token successfully. However, it could not contact the
api server (10.100.0.1:443). Could you check whether your api server is configured
to allow access from within the cluster?

I think you could start any pod, exec into it, and run the following commands.

KUBE_TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token)    
wget -vO- --ca-certificate /var/run/secrets/kubernetes.io/serviceaccount/ca.crt  --header "Authorization: Bearer $KUBE_TOKEN" https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api
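
If the image has curl rather than wget, the same in-cluster check can be sketched as follows. This is an assumption-laden variant, not a command from the thread: the function name `check_apiserver` and the `SA_DIR` override are hypothetical, and the `--max-time` limit is added so that a hanging connection shows up as a timeout instead of blocking indefinitely.

```shell
# Directory where Kubernetes mounts the service account credentials
# inside a pod. SA_DIR can be overridden for testing.
SA_DIR="${SA_DIR:-/var/run/secrets/kubernetes.io/serviceaccount}"

# Call the API server's /api endpoint with the pod's own token.
# The first argument is the timeout in seconds (default 10).
check_apiserver() {
  token="$(cat "$SA_DIR/token")"
  url="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_PORT_443_TCP_PORT}/api"
  curl --silent --show-error --max-time "${1:-10}" \
    --cacert "$SA_DIR/ca.crt" \
    --header "Authorization: Bearer ${token}" \
    "$url"
}
```

A JSON response means the pod can reach the API server. A timeout here would match the SocketTimeoutException in the jobmanager log and point at cluster networking rather than Flink.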

BTW, what's your Kubernetes version? Also, I am not sure whether increasing the
timeouts would help, but you could try:

-Dcontainerized.master.env.KUBERNETES_REQUEST_TIMEOUT=60000 -Dcontainerized.master.env.KUBERNETES_CONNECTION_TIMEOUT=60000

Best,

Yang
