Configuring ephemeral storage limits when using Native Kubernetes

Emilien Kenler
Hello,

I'm trying to run Flink on Kubernetes, and I recently switched from lyft/flinkk8soperator to the Flink Native Kubernetes deployment mode.

I have a long-running job that I want to deploy in application mode. A few hours after deploying it, I noticed that the deployment had disappeared.
After a quick look at the logs, it seems that the job manager was no longer able to talk to the task managers, because they had been evicted by Kubernetes for using more ephemeral storage than allowed.

We have limit ranges set per namespace with low default values, and each application deployed on Kubernetes needs to set values appropriate to its usage.
I couldn't find a way to configure those via Flink configuration.
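
For illustration, a namespace-level LimitRange of the kind described above might look like the sketch below. The object name, namespace, and sizes are hypothetical and chosen only to show the shape of the configuration. A container that sets no ephemeral storage values of its own inherits these defaults, and the kubelet can evict a pod whose container exceeds its limit.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-defaults   # hypothetical name
  namespace: flink-jobs              # hypothetical namespace
spec:
  limits:
  - type: Container
    # Applied as the request when a container does not set one
    defaultRequest:
      ephemeral-storage: "256Mi"     # hypothetical value
    # Applied as the limit when a container does not set one
    default:
      ephemeral-storage: "512Mi"     # hypothetical value
```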

Is there a way to set ephemeral storage requests and limits?
Are external resources supposed to help here?
If there is currently no way to do it, should it be added to the scope of FLINK-20324?

Thanks,
Emilien

Re: Configuring ephemeral storage limits when using Native Kubernetes

Yang Wang
Hi Emilien,

Thanks for trying the native Flink integration.

Unfortunately, the native integration cannot set an ephemeral storage limit yet. I think it could
be supported via pod templates[1]. I am still working on that ticket and already have a draft PR[2].
I believe it could be supported in release 1.13 and backported to 1.12 if necessary.

Once FLINK-15656 is merged, you could use a pod template like the following to set the ephemeral storage limit.

apiVersion: v1
kind: Pod
metadata:
  name: pod-template
spec:
  initContainers:
  - name: artifacts-fetcher
    image: reg.docker.alibaba-inc.com/k8s-yiqi/artifact-fetcher:latest
    imagePullPolicy: Always
    # Use wget or other tools to get user jars from remote storage
    command: ['wget', 'http://path/of/your.jar', '-O', '/flink-artifact/myjob.jar']
    volumeMounts:
    - mountPath: /flink-artifact
      name: flink-artifact
  containers:
    # Do not change the main container name
  - name: flink-job-manager
    volumeMounts:
    - mountPath: /opt/flink/usrlib
      name: flink-artifact
    - mountPath: /opt/flink/log
      name: flink-logs
  volumes:
  - name: flink-artifact
    emptyDir:
      sizeLimit: "1Gi"
  - name: flink-logs
    emptyDir:
      sizeLimit: "1Gi"
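
The template above caps the size of the individual emptyDir volumes. The requests and limits that a namespace LimitRange fills in by default are set per container instead, so a pod template could also carry them explicitly. The following is a minimal sketch under stated assumptions: the sizes are hypothetical, the container name is copied from the template above, and whether Flink keeps these fields untouched depends on how pod template support is finally implemented.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-template
spec:
  containers:
    # Same main container name as in the template above
  - name: flink-job-manager
    resources:
      requests:
        ephemeral-storage: "2Gi"   # hypothetical value
      limits:
        ephemeral-storage: "4Gi"   # hypothetical value
```

As far as I know, Flink releases that include pod template support read the file through the kubernetes.pod-template-file configuration option. The pod template documentation of your Flink version should state which fields Flink overwrites and what the main container must be called.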



Best,
Yang

In reply to the message Emilien Kenler <[hidden email]> wrote on Fri, Jan 29, 2021 at 8:14 AM.

Re: Configuring ephemeral storage limits when using Native Kubernetes

Emilien Kenler
Hello,

I think this would solve our problem.
We are also looking at supporting affinity rules, and pod templates would cover that too.
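
As a rough illustration of the affinity case, a pod template could carry a standard Kubernetes node affinity rule like the one below. This is only a sketch: the label key and value are hypothetical, the container name is taken from the template quoted in this thread, and it assumes the pod template feature passes the affinity section through unchanged.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-template
spec:
  affinity:
    nodeAffinity:
      # Only schedule onto nodes that carry the given label
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: example.com/node-pool   # hypothetical label key
            operator: In
            values:
            - flink                      # hypothetical label value
  containers:
    # Same main container name as in the template quoted in this thread
  - name: flink-job-manager
```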

I'm going to try to find some time this week to try your patch.

Thanks

In reply to the message Yang Wang <[hidden email]> sent on Friday, January 29, 2021 at 6:20 PM.
 

Re: Configuring ephemeral storage limits when using Native Kubernetes

Yang Wang
Thanks for testing the pod template. I really hope to get more feedback from your use case.

Best,
Yang

In reply to the message Emilien Kenler <[hidden email]> wrote on Mon, Feb 1, 2021 at 9:45 AM.