Native K8S not creating TMs

Native K8S not creating TMs

kb
Hi

We are using Flink 1.10.1 with native Kubernetes. The service appears to be
created, and I can submit a job and see it in the Web UI, but TaskManager (TM)
pods are never created, so the jobs never start.

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Could not allocate the required slot within slot request timeout. Please
make sure that the cluster has enough resources.

Is there somewhere I could see the pod creation logs?

thanks



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Native K8S not creating TMs

Yangze Guo
Hi, Kevin,

Regarding logs, you could follow this guide [1].

By the way, you could run "kubectl get pod" to list the current pods. If
there is one named something like "flink-taskmanager-1-1", you could run
"kubectl describe pod flink-taskmanager-1-1" to see its status.

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
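
The two kubectl steps above can be scripted as below. This is a minimal sketch: the helper names are hypothetical, and it assumes TaskManager pod names contain "-taskmanager-", which holds for "flink-taskmanager-1-1" and for the pod name that appears in the JobManager log later in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Flink or kubectl).
# Assumption: TaskManager pod names contain "-taskmanager-".

# Reads pod names, one per line, on stdin and prints only TaskManager pods.
# grep exits 1 when nothing matches; "|| true" treats that as success.
filter_tm_pods() {
  grep -e '-taskmanager-' || true
}

# Lists the pods in a namespace (default: "default") and describes each TM pod.
describe_tm_pods() {
  ns=${1:-default}
  kubectl get pods -n "$ns" --no-headers -o custom-columns=NAME:.metadata.name \
    | filter_tm_pods \
    | while read -r pod; do
        kubectl describe pod -n "$ns" "$pod"
      done
}
```

For example, "describe_tm_pods default". An empty result means no TaskManager pod was created at all, and the JobManager log is the next place to look.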

Best,
Yangze Guo

On Thu, Jun 4, 2020 at 2:28 AM kb <[hidden email]> wrote:


Re: Native K8S not creating TMs

Yangze Guo
Correction: for release 1.10.1, please refer to this guide instead [1].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#log-files

Best,
Yangze Guo

On Thu, Jun 4, 2020 at 9:52 AM Yangze Guo <[hidden email]> wrote:


Re: Native K8S not creating TMs

Yang Wang
I second Yangze's suggestion. You need to get the JobManager log first; then
it will be easier to find the root cause. I know that it is not convenient for users
to access the log via kubectl, and we already have a ticket for this [1].
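
One way to fetch the JobManager log non-interactively is sketched below. It rests on two assumptions that this thread does not confirm: that the JobManager pod is labelled app=<cluster-id> and component=jobmanager, and that Flink writes its log files to /opt/flink/log inside the pod. Check both (for example with "kubectl get pod --show-labels") before relying on it; the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Assumptions (verify on your cluster): the JobManager
# pod is labelled app=<cluster-id>,component=jobmanager, and Flink writes
# its log files to /opt/flink/log inside the pod.

# Prints the label selector for a cluster's JobManager pod.
jm_selector() {
  printf 'app=%s,component=jobmanager\n' "$1"
}

# Prints the JobManager log files of cluster-id $1 (namespace: $2 or "default").
dump_jm_logs() {
  ns=${2:-default}
  jm_pod=$(kubectl get pods -n "$ns" -l "$(jm_selector "$1")" \
    -o jsonpath='{.items[0].metadata.name}')
  # Single quotes: the glob is expanded by the shell inside the pod.
  kubectl exec -n "$ns" "$jm_pod" -- sh -c 'cat /opt/flink/log/*.log'
}
```

For example: dump_jm_logs my-flink-k8s-session > jobmanager.log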

Usually, the reason that the Flink ResourceManager cannot allocate TaskManagers
from K8s is that the service account is not configured correctly. You could check
the RBAC configuration here [2].
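
A quick RBAC check is to ask the API server, via impersonation, whether the JobManager's service account may create pods, which is the call that fails in the JobManager log later in this thread. This is a sketch assuming the JobManager runs as the "default" service account in the "default" namespace (the account bound later in this thread); the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Assumption: the JobManager uses service account
# $2 (default: "default") in namespace $1 (default: "default").

# Prints the user name Kubernetes assigns to a service account.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "${1:-default}" "${2:-default}"
}

# Asks the API server whether that account may create pods;
# "kubectl auth can-i" answers "yes" or "no".
check_pod_create() {
  kubectl auth can-i create pods -n "${1:-default}" --as "$(sa_user "$1" "$2")"
}
```

For example, "check_pod_create default default" should print "yes" once a binding to a role such as "edit" is in place, and "no" otherwise.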




Best,
Yang

On Thu, Jun 4, 2020 at 10:01 AM Yangze Guo <[hidden email]> wrote:

Re: Native K8S not creating TMs

kb
Thanks!

I do not see any pods named like `flink-taskmanager-1-1`, so I tried the
exec suggestion. The logs are attached below. Is there a quick RBAC check I
could perform? I followed the command on the linked docs page:

kubectl create clusterrolebinding flink-role-binding-default --clusterrole=edit --serviceaccount=default:default

2020-06-04 15:34:04,711 INFO  org.apache.flink.kubernetes.KubernetesResourceManager         - Requesting new TaskManager pod with <1728,1.0>. Number pending requests 1.
2020-06-04 15:34:04,712 INFO  org.apache.flink.kubernetes.KubernetesResourceManager         - TaskManager flink-cluster-e07a6f7a-8bd1-4306-89f1-a1ff7ea17bf6-taskmanager-1-5994 will be started with TaskExecutorProcessSpec {cpuCores=1.0, frameworkHeapSize=128.000mb (134217728 bytes), frameworkOffHeapSize=128.000mb (134217728 bytes), taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemorySize=512.000mb (536870920 bytes), jvmMetaspaceSize=256.000mb (268435456 bytes), jvmOverheadSize=192.000mb (201326592 bytes)}.
2020-06-04 15:34:14,713 ERROR org.apache.flink.kubernetes.KubernetesResourceManager         - Could not start TaskManager in pod flink-cluster-e07a6f7a-8bd1-4306-89f1-a1ff7ea17bf6-taskmanager-1-5994.
java.util.concurrent.CompletionException: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:331)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:324)
        at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$createTaskManagerPod$0(Fabric8FlinkKubeClient.java:184)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
        ... 3 more
Caused by: java.net.SocketTimeoutException: timeout
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:656)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:664)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.java:153)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:131)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at org.apache.flink.kubernetes.shadded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at org.apache.flink.kubernetes.shadded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
        at org.apache.flink.kubernetes.shadded.okhttp3.RealCall.execute(RealCall.java:92)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:798)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:328)
        ... 6 more
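
In the first two log lines, the <1728,1.0> in the pod request appears to be memory in MB and CPU cores: 1728 is exactly the sum of the memory components listed in the TaskExecutorProcessSpec, and 1.0 matches cpuCores=1.0. A one-line shell check of the sum:

```shell
#!/bin/sh
# Memory components copied from the TaskExecutorProcessSpec log line, in MB:
# framework heap + framework off-heap + task heap + task off-heap
# + network + managed + JVM metaspace + JVM overhead.
total_mb=$((128 + 128 + 384 + 0 + 128 + 512 + 256 + 192))
echo "$total_mb"   # prints 1728
```

The request itself therefore looks ordinary; the failure further down is a socket timeout on the pod-create call, not a rejection of these resources.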




Re: Native K8S not creating TMs

Yang Wang
If you have created the role binding "flink-role-binding-default" successfully,
then it should not be an RBAC issue.

It seems that the kubernetes-client in the JobManager pod could not contact the
K8s API server due to an okhttp issue with Java 8u252. Could you add the following
config option to disable HTTP/2? You can find more information here [1].

kubernetes-session.sh ... -Dcontainerized.master.env.HTTP2_DISABLE=true
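
To see whether the JobManager image is on the JDK mentioned above, you can check its Java version inside the pod. The hypothetical helper below parses "java -version" output and flags Java 8 update 252 or later. Treating every later 8u update as affected is an assumption of this sketch, based on 8u252 being the update that backported TLS ALPN to OpenJDK 8.

```shell
#!/bin/sh
# Hypothetical helper. Reads "java -version" output on stdin and exits 0 if it
# reports Java 8 update 252 or later (assumed affected), 1 otherwise.
is_jdk8u252_or_later() {
  # Extract the update number from a version string such as "1.8.0_252".
  update=$(sed -n 's/.*version "1\.8\.0_\([0-9][0-9]*\).*/\1/p' | head -n 1)
  [ -n "$update" ] && [ "$update" -ge 252 ]
}
```

For example: kubectl exec <jobmanager-pod> -- java -version 2>&1 | is_jdk8u252_or_later && echo affected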




Best,
Yang

On Thu, Jun 4, 2020 at 11:40 PM kb <[hidden email]> wrote:

Re: Native K8S not creating TMs

kb
Thanks, Yang, for the suggestion. I have tried it and I'm still getting the
same exception. Is it possible that it's due to the null pod name? Operation:
[create]  for kind: [Pod]  with name: [null]  in namespace: [default]
failed.

Best,
kevin




Re: Native K8S not creating TMs

Yang Wang
Hi Kevin,

It may be because of the K8s name-length limit (no more than 63 characters) [1], so the pod
name cannot be too long. I notice that you are using the cluster-id generated automatically
by the client, which may cause this problem. Could you set a meaningful cluster-id for your Flink
session? For example,

kubernetes-session.sh ... -Dkubernetes.cluster-id=my-flink-k8s-session
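
A sketch of the kind of client-side length check described here for Flink 1.11 (this is not Flink's code). It assumes, as in the JobManager log above, that TaskManager pods are named <cluster-id>-taskmanager-<attempt>-<index>, and checks a name against the 63-character RFC 1123 label rule (lowercase letters, digits and "-", starting and ending with a letter or digit). The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; not Flink code.

# Exits 0 if $1 is a valid RFC 1123 label: at most 63 characters of [a-z0-9-],
# beginning and ending with an alphanumeric character.
is_dns1123_label() {
  [ "${#1}" -le 63 ] &&
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}

# Builds a TaskManager pod name from a cluster-id, following the pattern seen
# in the JobManager log: <cluster-id>-taskmanager-<attempt>-<index>.
tm_pod_name() {
  printf '%s-taskmanager-%s-%s\n' "$1" "${2:-1}" "${3:-1}"
}
```

With the auto-generated id from the log, the TaskManager pod name is 69 characters long and fails this check; with -Dkubernetes.cluster-id=my-flink-k8s-session it is 36 characters and passes.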

This has been improved in Flink 1.11, which checks the length on the client side before submission.

If it still does not work, could you share your full command and the JobManager logs? That will help a lot
in finding the root cause.




Best,
Yang

On Sat, Jun 6, 2020 at 1:00 AM kb <[hidden email]> wrote:

Re: Native K8S not creating TMs

Yang Wang
Hi Kevin,

Sorry for not noticing your last response.
Could you share your full DEBUG-level JobManager logs? I will try to figure out
whether it is an issue with Flink or with K8s, because I could not reproduce your situation
with my local K8s cluster.


Best,
Yang

On Mon, Jun 8, 2020 at 11:02 AM Yang Wang <[hidden email]> wrote:

Re: Native K8S not creating TMs

Yang Wang
Thanks for sharing the DEBUG-level log.

I checked the logs carefully and found that the kubernetes-client discovered the
API server address and token successfully. However, it could not reach the
API server (10.100.0.1:443). Could you check whether your API server is configured
to allow access from within the cluster?

I think you could start any pod, exec into it, and run the following commands:
KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
wget -vO- --ca-certificate /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --header "Authorization: Bearer $KUBE_TOKEN" https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api
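
Because the stack trace fails inside okhttp's HTTP/2 stream handling (Http2Stream.takeHeaders), it may also help to send the same request over HTTP/1.1 and over HTTP/2 and compare. This is a sketch assuming the pod image has curl (built with HTTP/2 support for the second call) and the standard service-account mount; the helper names are hypothetical. The 10-second cap mirrors the roughly 10 s between the pod request (15:34:04) and the error (15:34:14) in the JobManager log.

```shell
#!/bin/sh
# Hypothetical helpers; run inside any pod in the cluster.
# Assumptions: curl is installed, and the pod has the usual service-account
# mount plus the KUBERNETES_SERVICE_HOST/PORT variables Kubernetes injects.
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount

# Prints the in-cluster URL of the API server's /api endpoint.
api_url() {
  printf 'https://%s:%s/api\n' "$KUBERNETES_SERVICE_HOST" "$KUBERNETES_SERVICE_PORT"
}

# Calls /api with a 10 s limit; $1 is a curl protocol flag such as --http1.1.
probe_api() {
  curl -sS --max-time 10 "$1" \
    --cacert "$SA_DIR/ca.crt" \
    -H "Authorization: Bearer $(cat "$SA_DIR/token")" \
    "$(api_url)"
}
```

For example, run "probe_api --http1.1" and then "probe_api --http2". If only the HTTP/2 call times out, that points toward the HTTP/2 path; if both time out, network access to the API server is the more likely problem.
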
By the way, what is your Kubernetes version? Also, I am not sure whether increasing
the timeouts could help, but you could try:

-Dcontainerized.master.env.KUBERNETES_REQUEST_TIMEOUT=60000 -Dcontainerized.master.env.KUBERNETES_CONNECTION_TIMEOUT=60000
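
If you try several of the options from this thread at once, a small wrapper keeps them in one place. This is a sketch, not a known fix: it assumes kubernetes-session.sh is on the PATH, uses the example cluster-id from earlier in the thread, and forwards any other arguments you already pass unchanged. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers collecting the -D options suggested in this thread.

# Prints one option per line.
session_opts() {
  printf '%s\n' \
    -Dkubernetes.cluster-id=my-flink-k8s-session \
    -Dcontainerized.master.env.HTTP2_DISABLE=true \
    -Dcontainerized.master.env.KUBERNETES_REQUEST_TIMEOUT=60000 \
    -Dcontainerized.master.env.KUBERNETES_CONNECTION_TIMEOUT=60000
}

# Starts the session with those options plus your own arguments ("$@").
start_session() {
  # The options contain no whitespace or glob characters, so the unquoted
  # command substitution splits into exactly one argument per option.
  kubernetes-session.sh $(session_opts) "$@"
}
```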


Best,
Yang


On Tue, Jun 16, 2020 at 12:00 PM Yang Wang <[hidden email]> wrote: