Hi Team,
We have a session cluster running on Kubernetes where multiple stateless jobs are running fine. We observed that once we submit a stateful job (state size per checkpoint is 1 GB) to the same session cluster, the other jobs are impacted: the stateful job starts to use more memory and CPU and eventually gets the pod terminated. To mitigate this and provide better resource isolation, we have created multiple session clusters, launching a high-throughput (stateful) job in one cluster and grouping the low-throughput jobs in another. This seems to work fine, but managing it will become painful once we start creating more session clusters for high-throughput jobs (10-plus jobs), as we will not have a single Flink endpoint to submit jobs to (as we have in YARN, where we submit directly to the ResourceManager).

Can you please give us some input on how we should handle this better in Kubernetes?

Regards,
Vinay Patil
Hi Vinay Patil,

You are right, Flink does not provide any isolation between different jobs in the same Flink session cluster. You could use a Flink job cluster or an application cluster (from 1.11) to get better isolation, since a dedicated Flink cluster is started for each job. Please refer to the standalone Kubernetes job cluster [1] or native Kubernetes application mode [2] for more information. If you want a tool for managing multiple jobs, a flink-k8s-operator may be a good choice [3][4]. I am also working on a Java-implemented flink-native-k8s-operator [5]; please check it out if you are interested.

Best,
Yang
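For illustration, submitting a job in native Kubernetes application mode (Flink 1.11+) looks roughly like the sketch below; the cluster id, container image, and job jar path are placeholders that would need to match your own setup:

```bash
# Start a dedicated application-mode cluster for one job on native Kubernetes.
# Cluster id, image name, and jar path are placeholders, not values from this thread.
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-stateful-job-cluster \
    -Dkubernetes.container.image=my-registry/my-flink-job:latest \
    local:///opt/flink/usrlib/my-flink-job.jar
```

Each job then runs in its own JobManager and TaskManager pods, so a memory-hungry stateful job cannot take down pods that serve unrelated jobs.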
Hi Yang,

Thank you for your reply. Yes, we have evaluated job-specific clusters (we used to deploy the same way on YARN); the main issue is monitoring multiple jobs, since we won't have a single endpoint the way YARN does. We will evaluate the Kubernetes operator you suggested.

Thanks and Regards,
Vinay Patil
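Until an operator is in place, one lightweight way to get a combined view is to poll each cluster's REST API, assuming every cluster's REST service is reachable at a known address. This is only a sketch; the endpoint addresses below are placeholders for however your clusters are exposed:

```bash
# Aggregate job status across several Flink clusters via their REST APIs.
# The endpoints below are placeholders; each cluster exposes its own REST service.
ENDPOINTS="http://flink-stateful-rest:8081 http://flink-stateless-rest:8081"

for ep in $ENDPOINTS; do
  echo "== $ep =="
  # GET /jobs/overview returns the id, name, and state of every job in that cluster.
  curl -s "$ep/jobs/overview"
  echo
done
```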