Hi Team,
We have a session cluster running on Kubernetes where multiple stateless jobs are running fine. We observed that once we submit a stateful job (state size per checkpoint is 1 GB) to the same session cluster, the other jobs are impacted: the stateful job starts to use more memory and CPU and eventually gets the pod terminated. To mitigate this and provide better resource isolation, we have created multiple session clusters, launching a high-throughput (stateful) job in one cluster and grouping the low-throughput jobs in another. This seems to work fine, but managing it will become painful once we start creating more session clusters for high-throughput jobs (10-plus jobs), as we will not have a single Flink endpoint to submit jobs to (as we have in YARN, where we submit directly to the ResourceManager).

Can you please give us some input on how we should handle this better in Kubernetes?

Regards,
Vinay Patil
Hi Vinay Patil,

You are right, Flink does not provide any isolation between different jobs in the same Flink session cluster. You could use a Flink job cluster or an application cluster (from 1.11) to get better isolation, since a dedicated Flink cluster is started for each job. Please refer to the standalone Kubernetes job cluster [1] or native Kubernetes application mode [2] for more information. If you want a tool for managing multiple jobs, a flink-k8s-operator may be a good choice [3][4]. I am also working on a Java-implemented flink-native-k8s-operator [5]; please check it out if you are interested.

Best,
Yang
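For illustration, submitting a job in native Kubernetes application mode (Flink 1.11+) looks roughly like the sketch below; the cluster id, container image, and job jar path are placeholders that would need to match your own setup:

```bash
# Start a dedicated application-mode cluster for one job on native Kubernetes.
# Cluster id, image name, and jar path are placeholders, not values from this thread.
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-stateful-job-cluster \
    -Dkubernetes.container.image=my-registry/my-flink-job:latest \
    local:///opt/flink/usrlib/my-flink-job.jar
```

Each job then runs in its own JobManager and TaskManager pods, so a memory-hungry stateful job cannot take down pods that serve unrelated jobs.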
Hi Yang,

Thank you for your reply. Yes, we have evaluated job-specific clusters (we used to deploy the same way on YARN); the main issue is monitoring multiple jobs, since we won't have a single endpoint the way YARN does. We will evaluate the Kubernetes operator you suggested.

Thanks and Regards,
Vinay Patil
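Until an operator is in place, one lightweight way to get a combined view is to poll each cluster's REST API, assuming every cluster's REST service is reachable at a known address. This is only a sketch; the endpoint addresses below are placeholders for however your clusters are exposed:

```bash
# Aggregate job status across several Flink clusters via their REST APIs.
# The endpoints below are placeholders; each cluster exposes its own REST service.
ENDPOINTS="http://flink-stateful-rest:8081 http://flink-stateless-rest:8081"

for ep in $ENDPOINTS; do
  echo "== $ep =="
  # GET /jobs/overview returns the id, name, and state of every job in that cluster.
  curl -s "$ep/jobs/overview"
  echo
done
```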