Session mode on Kubernetes and # of TMs

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Session mode on Kubernetes and # of TMs

Youngwoo Kim (김영우)
Hi,

I have deployed a cluster with session mode on kubernetes and I can see one deployment, services and one JM. I'm trying to run a SQL query through sql client. for instance, 'INSERT INTO ... SELECT ...;'

When I run the query in cli, the Flink session is spinning up a TM for the query and then the query is running in a job. 

Now, I'm curious. How does Flink calculate the number of TMs for the query? and also, Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs.

Thanks,
Youngwoo
Reply | Threaded
Open this post in threaded view
|

Re: Session mode on Kubernetes and # of TMs

Yangze Guo
Hi, Youngwoo

In K8S session, the number of TMs depends on how many slots your job
needs and the number of slots per task managers (config key:
taskmanager.numberOfTaskSlots). In this case,

# of TM  = Ceil(total slots need / taskmanager.numberOfTaskSlots)

How many your job's topology and parallelism. For streaming SQL, the
whole job graph will locate in one slot by default. So, the number of
slots would be equal to the parallelism you set.

> Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs.

I might not fully understand your problem. Do you mean starting TMs
before submitting the job? If that is the case,
- You can try the standalone k8s mode. [1]
- Warmup the session by submitting some puppet jobs yourselves and
submit your job before those TMs idle timeout.
- In FLINK-15959, we will introduce the min number of slots of the
cluster. With this feature, you can configure how many TMs needed
before submitting the jobs.

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/

Best,
Yangze Guo

On Tue, May 11, 2021 at 12:24 PM Youngwoo Kim (김영우) <[hidden email]> wrote:

>
> Hi,
>
> I have deployed a cluster with session mode on kubernetes and I can see one deployment, services and one JM. I'm trying to run a SQL query through sql client. for instance, 'INSERT INTO ... SELECT ...;'
>
> When I run the query in cli, the Flink session is spinning up a TM for the query and then the query is running in a job.
>
> Now, I'm curious. How does Flink calculate the number of TMs for the query? and also, Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs.
>
> Thanks,
> Youngwoo
Reply | Threaded
Open this post in threaded view
|

Re: Session mode on Kubernetes and # of TMs

Youngwoo Kim (김영우)
Hi Yangze,

Thank you. That's what I'm looking for.

And FLINK-15959 is similar to my given requirements. I hope to run queries 'interactively' but if there are insufficient slots, time for spinning up new TMs is needed.

Thanks,
Youngwoo

On Tue, May 11, 2021 at 3:36 PM Yangze Guo <[hidden email]> wrote:
Hi, Youngwoo

In K8S session, the number of TMs depends on how many slots your job
needs and the number of slots per task managers (config key:
taskmanager.numberOfTaskSlots). In this case,

# of TM  = Ceil(total slots need / taskmanager.numberOfTaskSlots)

How many your job's topology and parallelism. For streaming SQL, the
whole job graph will locate in one slot by default. So, the number of
slots would be equal to the parallelism you set.

> Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs.

I might not fully understand your problem. Do you mean starting TMs
before submitting the job? If that is the case,
- You can try the standalone k8s mode. [1]
- Warmup the session by submitting some puppet jobs yourselves and
submit your job before those TMs idle timeout.
- In FLINK-15959, we will introduce the min number of slots of the
cluster. With this feature, you can configure how many TMs needed
before submitting the jobs.

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/

Best,
Yangze Guo

On Tue, May 11, 2021 at 12:24 PM Youngwoo Kim (김영우) <[hidden email]> wrote:
>
> Hi,
>
> I have deployed a cluster with session mode on kubernetes and I can see one deployment, services and one JM. I'm trying to run a SQL query through sql client. for instance, 'INSERT INTO ... SELECT ...;'
>
> When I run the query in cli, the Flink session is spinning up a TM for the query and then the query is running in a job.
>
> Now, I'm curious. How does Flink calculate the number of TMs for the query? and also, Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs.
>
> Thanks,
> Youngwoo