Hello all!
I have a setup composed of several streaming pipelines. These have different deployment lifecycles: I want to be able to modify and redeploy the topology of one while the other is still up. I am thus putting them in different jobs. The problem is I have a Co-Location constraint between one subtask of each pipeline; I'd like them to run on the same TaskSlots, much like if they were sharing a TaskSlot; or at least have them on the same JVM. A semi-official feature "DataStream.getTransformation().setCoLocationGroupKey(stringKey)" [1] exists for this, but seem to be tied to the Sub-Tasks actually being able to be co-located on the same Task Slot. The documentation mentions [2] that it might be impossible to do ("Flink allows subtasks to share slots even if they are subtasks of different tasks, so long as they are from the same job"). The streaming pipelines are numerous (about 10), and I can't afford to increase the number of TaskSlots per TaskManager. I also would like to avoid putting all the pipelines in the same job, restarting it every time a single one changes. I'd like to have mailing list's informed opinion about this, if there are workarounds, or if I could reconsider my problem under another angle. Cheers Ben |
Hi Ben, You can not share slots across jobs. Flink adopts a two-level slot scheduling mechanism. Slots are firstly allocated to each job, then the JobMaster decides which tasks should be executed in which slots, i.e. slot sharing. I think what you are looking for is Pipelined Region Restart Strategy [1], which restarts only the tasks connected by pipelined edges instead of the whole job graph. On Mon, Feb 24, 2020 at 11:28 PM Benoît Paris <[hidden email]> wrote:
|
Hi Xintong Thank you for your answer. This seems promising, I'll look into it. Do you believe the code of the operators of the restarted Region can be changed between restarts? Best Benoît On Tue, Feb 25, 2020 at 2:30 AM Xintong Song <[hidden email]> wrote:
|
Do you believe the code of the operators of the restarted Region can be changed between restarts? I'm not an expert on the restart strategies, but AFAIK the answer is probably not. Sorry I overlooked that you need to modify the job. Thank you~ Xintong Song On Tue, Feb 25, 2020 at 6:00 PM Benoît Paris <[hidden email]> wrote:
|
Hi Ben, I think at the moment, it is not possible because of current scheduling design which Xintong has already mentioned. The jobs are completely isolated and there is no synchronisation between their deployment. Alignment of tasks by e.g. key groups in general is difficult as it is up to the scheduler to decide where to deploy each job subtask. The restart strategy is only for the failure scenario. Any jobs changes require full job restart at the moment. I pull in Gary and Zhu to add more details if I miss something here. Best, Andrey On Tue, Feb 25, 2020 at 1:38 PM Xintong Song <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |