Multiple jobs in the same Flink project


Oğuzhan Mangır
We have a Flink project with multiple jobs, which means we can submit multiple jobs from the same jar. But I think there is a limitation here. Let's assume the following:

I create a Flink project with 3 jobs, build a single jar, and upload it to the Flink cluster (all of these steps run in a CI/CD pipeline, and the jar name is assigned automatically, for example my-jar-v1, my-jar-v2, ...). Then I submit the 3 jobs using the same jar.

Later, I change job2, build a new jar with the new version (e.g. my-jar-v2), and redeploy job2 with the new jar. But in this case, when I look at the submit page in the UI, I can't tell which job was submitted from which jar.

my-jar-v1 => job1, job2, job3 deployed
my-jar-v2 => job2 (redeployed) => in this case, I know job2 was deployed with this jar, but others will not know it because the UI does not show this information

Also, if any problem occurs in job2 when I deploy it using my-jar-v2, I can fall back to the previous jar (my-jar-v1). But if there are a lot of jars, finding the right one can be very difficult.

Is there any best practice for that?
Re: Multiple jobs in the same Flink project

Arvid Heise-4
Hi Oğuzhan,

I think you know the answer already: it's easiest to have 1 jar per application. And in most cases, it's easiest to also have 1 repo per application. You can use the same template for all 3 and all future applications without any special cases.
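With one jar per application, the jar name alone can say which job it belongs to. Here is a minimal Python sketch of how a CI/CD pipeline might build and parse such names. The `<app>-v<version>.jar` scheme and the helpers `jar_name` and `parse_jar_name` are assumptions for illustration only; they are not part of Flink.

```python
import re
from typing import NamedTuple


class JarArtifact(NamedTuple):
    app: str      # application (job) name, e.g. "job2"
    version: int  # build number assigned by the pipeline


# Hypothetical naming scheme: <app>-v<version>.jar
_JAR_NAME = re.compile(r"^(?P<app>[a-z0-9][a-z0-9-]*?)-v(?P<version>\d+)\.jar$")


def jar_name(app: str, version: int) -> str:
    """Build the artifact name for one application and one version."""
    return f"{app}-v{version}.jar"


def parse_jar_name(name: str) -> JarArtifact:
    """Recover application and version from an artifact name."""
    match = _JAR_NAME.match(name)
    if match is None:
        raise ValueError(f"not an <app>-v<version>.jar name: {name!r}")
    return JarArtifact(match["app"], int(match["version"]))


print(jar_name("job2", 2))            # job2-v2.jar
print(parse_jar_name("job2-v2.jar"))  # JarArtifact(app='job2', version=2)
```

Anyone looking at the list of uploaded jars can then see which application each jar belongs to, even though the UI does not show which job was submitted from which jar.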

My rule of thumb is the following: if the life cycles of applications are tightly coupled, they can reside in the same repository. So if an update or restart of app1 also means that app2 needs to be updated or restarted, then use the same CI/CD process.

If (like in your case) the life cycles are independent, treat them as separate entities. You can keep shared code in a fourth repo or include one repo in the others.
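Treating each job as a separate entity also simplifies the rollback question: each application gets its own short version history. Below is a rough Python sketch of a record a pipeline could keep. `DeploymentLog` and its methods are hypothetical names, not a Flink API, and the jar names assume one jar per application.

```python
from typing import Dict, List, Optional


class DeploymentLog:
    """Per-application history of deployed jars, oldest first."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def record(self, app: str, jar: str) -> None:
        """Note that `app` was (re)deployed from `jar`."""
        self._history.setdefault(app, []).append(jar)

    def current(self, app: str) -> Optional[str]:
        """Jar the application was last deployed from, if any."""
        jars = self._history.get(app, [])
        return jars[-1] if jars else None

    def rollback_target(self, app: str) -> Optional[str]:
        """Jar deployed just before the current one, if any."""
        jars = self._history.get(app, [])
        return jars[-2] if len(jars) >= 2 else None


# The scenario from the question, with one jar per application.
log = DeploymentLog()
for app in ("job1", "job2", "job3"):
    log.record(app, f"{app}-v1.jar")
log.record("job2", "job2-v2.jar")  # only job2 is redeployed

print(log.current("job2"))          # job2-v2.jar
print(log.rollback_target("job2"))  # job2-v1.jar
print(log.rollback_target("job1"))  # None
```

In practice the same information could live in the pipeline's own deployment records or in git tags; the point is only that the history is kept per application.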

I would not optimize for the number of repos but for the simplicity of each repo. Ultimately, I like to have all repos set up exactly the same way, using the same Gradle plugins or build templates (since I don't enjoy doing DevOps work over and over again). If you use GitLab (and, I guess, similar tools), it's very easy to manage a large number of repos.

(Reply to Oğuzhan Mangır's message above, sent Thu, Apr 22, 2021 at 7:42 PM.)