Hi,
I have deployed a cluster with session mode on kubernetes and I can see one deployment, services and one JM. I'm trying to run a SQL query through sql client. for instance, 'INSERT INTO ... SELECT ...;' When I run the query in cli, the Flink session is spinning up a TM for the query and then the query is running in a job. Now, I'm curious. How does Flink calculate the number of TMs for the query? and also, Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs. Thanks, Youngwoo |
Hi, Youngwoo
In K8S session, the number of TMs depends on how many slots your job needs and the number of slots per task managers (config key: taskmanager.numberOfTaskSlots). In this case, # of TM = Ceil(total slots need / taskmanager.numberOfTaskSlots) How many your job's topology and parallelism. For streaming SQL, the whole job graph will locate in one slot by default. So, the number of slots would be equal to the parallelism you set. > Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs. I might not fully understand your problem. Do you mean starting TMs before submitting the job? If that is the case, - You can try the standalone k8s mode. [1] - Warmup the session by submitting some puppet jobs yourselves and submit your job before those TMs idle timeout. - In FLINK-15959, we will introduce the min number of slots of the cluster. With this feature, you can configure how many TMs needed before submitting the jobs. [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/ Best, Yangze Guo On Tue, May 11, 2021 at 12:24 PM Youngwoo Kim (김영우) <[hidden email]> wrote: > > Hi, > > I have deployed a cluster with session mode on kubernetes and I can see one deployment, services and one JM. I'm trying to run a SQL query through sql client. for instance, 'INSERT INTO ... SELECT ...;' > > When I run the query in cli, the Flink session is spinning up a TM for the query and then the query is running in a job. > > Now, I'm curious. How does Flink calculate the number of TMs for the query? and also, Is it possible to run pre-spawned TMs for session mode? I'm looking for a way to scale the computing resources. i.e., # of TM for the jobs. > > Thanks, > Youngwoo |
Hi Yangze, Thank you. That's what I'm looking for. And FLINK-15959 is similar to my given requirements. I hope to run queries 'interactively' but if there are insufficient slots, time for spinning up new TMs is needed. Thanks, Youngwoo On Tue, May 11, 2021 at 3:36 PM Yangze Guo <[hidden email]> wrote: Hi, Youngwoo |
Free forum by Nabble | Edit this page |