Using ClusterIP with KubernetesHAServicesFactory

Kevin Kwon (Fri, Jan 15, 2021 at 7:55 PM)
Hi team, I have some concerns about using ClusterIP with a Kubernetes native deployment that uses KubernetesHAServicesFactory for high availability.

It seems that KubernetesHAServicesFactory goes through the Service of the Flink K8s native cluster to check the JobManager's availability. However, my company has a policy that Services shouldn't expose NodePorts except in exceptional cases. How do I make KubernetesHAServicesFactory reach the cluster through ClusterIP?

I get the following error when running with ClusterIP:

java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
        at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:122)
        at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:151)
        at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:114)
        at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:198)
        at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
        at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:198)
Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
        ... 6 more
Caused by: java.net.UnknownHostException: scrat-session-rest.scrat: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:204)
        at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:116)
        ... 5 more


Re: Using ClusterIP with KubernetesHAServicesFactory

Kevin Kwon (Mon, Jan 18, 2021 at 2:52 AM)
OK, it seems that this check is run by the K8s CLI, which in my case runs in a CI/CD cluster.

If this check has to happen, I'd like to override this value with the ingress address.

Is there a way I can override the REST address that the K8s CLI uses?



Re: Using ClusterIP with KubernetesHAServicesFactory

Yang Wang (Mon, Jan 18, 2021 at 9:56 AM)
1. Why do you get the UnknownHostException when the service exposed type is ClusterIP?
The root cause is that a ClusterIP Service is meant to be accessed only from inside the K8s cluster, so
you will get the UnknownHostException when the client runs outside the K8s cluster. We already have
a ticket here[1] and will try to improve the behavior.

2. Is there a way I can override the rest address that the K8S CLI taps on?
Currently, you cannot override the REST endpoint manually, since the Flink client will always override
the REST endpoint based on the service exposed type.
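
To illustrate point 1: the stack trace in the original post shows the client failing in InetAddress.getByName, called from HighAvailabilityServicesUtils.getWebMonitorAddress. Below is a minimal Java sketch of that kind of lookup, useful for checking from the CI/CD machine whether the Service hostname resolves at all. The class name RestHostCheck and the method canResolve are hypothetical and this is not Flink's own code; the default hostname is the one from the trace.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of Flink): checks whether a hostname resolves
// from the machine this runs on, using the same JDK call seen in the stack trace.
final class RestHostCheck {
    private RestHostCheck() {}

    // Returns true if the JDK resolver can resolve the host, false otherwise.
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default is the Service hostname from the reported UnknownHostException.
        String host = args.length > 0 ? args[0] : "scrat-session-rest.scrat";
        System.out.println(host + " resolvable: " + canResolve(host));
    }
}
```

If this reports that the Service hostname is not resolvable on the machine running the CLI, that would be consistent with the explanation above: the name is only known to DNS inside the K8s cluster.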

IIUC, your CI/CD cluster is not built and running on the K8s cluster, which is why you have this issue.
Once FLINK-20944 is resolved, the UnknownHostException will disappear. However, you will still have
a limitation: you cannot use "flink cancel/list/savepoint" to interact with the Flink cluster, because
the network is not reachable. But you can do it via the REST API if you have configured the ingress.
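
As a rough sketch of the REST route, the Java snippet below builds the endpoint URIs that, as far as I understand the Flink REST API, correspond to "flink list", "flink cancel" and "flink savepoint". The class FlinkRestUris, the ingress host flink.example.com and the job ID are hypothetical placeholders, and the paths should be checked against the REST API docs of the Flink version in use. It only builds URIs and does not send any request.

```java
import java.net.URI;

// Hypothetical helper (not part of Flink): builds Flink REST API URIs
// against an ingress base URL that is assumed to route to the JobManager REST port.
final class FlinkRestUris {
    private FlinkRestUris() {}

    // Rough equivalent of "flink list": GET <base>/jobs
    static URI listJobs(String base) {
        return URI.create(stripTrailingSlash(base) + "/jobs");
    }

    // Rough equivalent of "flink cancel": PATCH <base>/jobs/<jobId>?mode=cancel
    static URI cancelJob(String base, String jobId) {
        return URI.create(stripTrailingSlash(base) + "/jobs/" + jobId + "?mode=cancel");
    }

    // Rough equivalent of "flink savepoint": POST <base>/jobs/<jobId>/savepoints
    // (the request body, e.g. the target directory, is not shown here)
    static URI triggerSavepoint(String base, String jobId) {
        return URI.create(stripTrailingSlash(base) + "/jobs/" + jobId + "/savepoints");
    }

    private static String stripTrailingSlash(String base) {
        return base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
    }

    public static void main(String[] args) {
        String base = "https://flink.example.com"; // assumed ingress address
        String jobId = "a1b2c3";                   // placeholder job ID
        System.out.println("list:      GET   " + listJobs(base));
        System.out.println("cancel:    PATCH " + cancelJob(base, jobId));
        System.out.println("savepoint: POST  " + triggerSavepoint(base, jobId));
    }
}
```

These URIs could then be called from the CI/CD pipeline with any HTTP client, provided the ingress is reachable from there.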




Best,
Yang



Re: Using ClusterIP with KubernetesHAServicesFactory

Kevin Kwon
Thanks Yang, I'll track the ticket as well from now on.
