Number of task slots in the cluster and dynamic number of jobs


Adebski
I've been reading the Flink docs recently and I am confused about task slots with respect to the number of jobs in the cluster.

As far as I understand, by default each task slot can hold multiple subtasks from a single job: "By default, Flink allows subtasks to share slots even if they are subtasks of different tasks, so long as they are from the same job." So if the number of task slots in my cluster equals the max parallelism of a single job, everything will work fine. But what about a situation where I cannot determine upfront the number of jobs that will potentially be executed in the Flink cluster?
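
For reference, here is the minimal kind of job I have in mind (Java DataStream API); if I read the docs correctly, thanks to slot sharing this whole pipeline should need only 3 slots (its max parallelism), not one slot per subtask:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SlotSharingSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(3); // default parallelism for all operators

            env.fromElements(1, 2, 3, 4, 5)             // non-parallel source (parallelism 1)
               .map(x -> x * 2).returns(Integer.class)  // runs with parallelism 3
               .print();                                // runs with parallelism 3

            env.execute("slot-sharing-sketch");
        }
    }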

From my (very simple) experiments it seems that if a local Flink cluster instance has a single task slot, I can only execute a single job at a time. In my potential use case I will have many dynamically created jobs (their definitions will be supplied by an external system) that will ingest data from other external systems, but most of them won't be computationally expensive.
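
Concretely, what I tried locally (assuming a standalone cluster with taskmanager.numberOfTaskSlots: 1 in conf/flink-conf.yaml, using one of the bundled streaming examples since it keeps running and holds its slot):

    ./bin/start-cluster.sh
    ./bin/flink run -d examples/streaming/TopSpeedWindowing.jar   # takes the only slot
    ./bin/flink run -d examples/streaming/TopSpeedWindowing.jar   # cannot acquire a slot

The second submission just sits there waiting for resources, which is what made me wonder about sizing the cluster.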

What would be an idiomatic way of handling something like that? The only two solutions that come to mind are:

1. Create a cluster with a very high number of task slots and hope it is sufficient (seems kind of hacky).
2. Create a single master job that would somehow route events based on some key to different parts of the graph (that one also seems hacky to me; see the sketch after this list). Also, if I understand correctly, each time someone added a job I would have to stop the current one and re-upload the updated job definition to the cluster.
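
To make option 2 concrete, here is a rough sketch of the kind of "master" routing job I mean. Event, jobKey, and the per-key logic are made-up placeholders for whatever the external system would supply:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RouterJobSketch {

        // made-up event type; jobKey says which "logical job" an event belongs to
        public static class Event {
            public String jobKey;
            public String payload;
            public Event() {}
            public Event(String jobKey, String payload) {
                this.jobKey = jobKey;
                this.payload = payload;
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // stand-in for a source reading from the external systems
            DataStream<Event> events = env.fromElements(
                    new Event("jobA", "payload-1"),
                    new Event("jobB", "payload-2"));

            events
                .keyBy(new KeySelector<Event, String>() {
                    @Override
                    public String getKey(Event e) {
                        return e.jobKey;
                    }
                })
                // stand-in for the per-key processing each "logical job" would do
                .map(new MapFunction<Event, String>() {
                    @Override
                    public String map(Event e) {
                        return e.jobKey + " -> " + e.payload;
                    }
                })
                .print();

            env.execute("router-job-sketch");
        }
    }

The obvious downside is that adding or changing one of these logical jobs means redeploying this whole graph, which is exactly what worries me.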