Flink jobs organization and maintainability

Sweta Kalakuntla
Hi,

I will need to implement many similar jobs. I am looking for any guidance or examples you may have on organizing them in a Git repository without having one repo per job.

Thanks,
SK

Re: Flink jobs organization and maintainability

Arvid Heise-4
If you have many similar jobs, they should be in the same repo (especially if they have the same development cycle).

First, how different are the jobs?
A) If they are very similar, go with just one job and configure it differently for each application. You can then run different deployments of the same jar with different parameters/config. If you manage deployments as code, keep all deployment files in a dedicated deploy directory at the repository root.
B) If they are somewhat similar, go with one Maven/Gradle project that has several modules. Shared code should go into a common module, and each module should have its own deploy directory.

Note that I'd recommend the Table API for implementing the jobs, as it lets you stay with the simpler Option A much longer. You can easily make it configurable to: a) join from multiple sources, b) group by a varying number of fields, c) have different aggregation functions, d) use different transformation...
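
To make Option A a bit more concrete, here is a minimal sketch (not from the original post) of a single job whose query is derived from per-deployment configuration. The class `ConfigurableQuery`, the holder `JobConfig`, its fields, and the alias `agg_value` are all hypothetical names chosen for illustration. The sketch only builds the SQL string; in a real job that string would presumably be handed to the Table API, for example via `TableEnvironment.sqlQuery`.

```java
import java.util.List;

/** Hypothetical sketch: one job whose query depends only on per-deployment config. */
public class ConfigurableQuery {

    /** Illustrative config holder; every field name here is an assumption. */
    public static final class JobConfig {
        public final String sourceTable;
        public final List<String> groupByFields;
        public final String aggFunction;
        public final String aggField;

        public JobConfig(String sourceTable, List<String> groupByFields,
                         String aggFunction, String aggField) {
            this.sourceTable = sourceTable;
            this.groupByFields = groupByFields;
            this.aggFunction = aggFunction;
            this.aggField = aggField;
        }
    }

    /** Builds a GROUP BY query with a varying number of keys and a configurable aggregation. */
    public static String buildQuery(JobConfig c) {
        if (c.groupByFields.isEmpty()) {
            throw new IllegalArgumentException("At least one group-by field is required");
        }
        String keys = String.join(", ", c.groupByFields);
        return "SELECT " + keys + ", " + c.aggFunction + "(" + c.aggField + ") AS agg_value"
                + " FROM " + c.sourceTable
                + " GROUP BY " + keys;
    }

    public static void main(String[] args) {
        // Each deployment of the same jar would supply its own values here,
        // e.g. parsed from program arguments or a config file.
        JobConfig config = new JobConfig("orders", List.of("country", "product"), "SUM", "amount");
        System.out.println(buildQuery(config));
    }
}
```

With this shape, adding another "similar job" means adding another config entry rather than another code path. Values that end up in a SQL string should come from trusted deployment config only.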

On Tue, Feb 23, 2021 at 10:56 PM Sweta Kalakuntla <[hidden email]> wrote:
Hi,

I will need to implement many similar jobs. I am looking for any guidance or examples you may have on organizing them in a Git repository without having one repo per job.

Thanks,
SK

Re: Flink jobs organization and maintainability

yidan zhao
I use a YAML config file to describe my jobs, and run 'start xxxJobName' to start a job; the start command is implemented with shell scripts.
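
A config-driven launcher along these lines could be sketched as follows. This is an illustration only, not the poster's actual scripts: it assumes the job descriptions have already been parsed from the config file into a map, and the names `JobLauncher`, `JobSpec`, and the example job entry are hypothetical. The generated command uses the standard `flink run` CLI with `-d` (detached) and `-c` (entry class).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: look up a job by name and build its `flink run` command. */
public class JobLauncher {

    /** Illustrative job description; the fields are assumptions, not a fixed schema. */
    public static final class JobSpec {
        public final String jar;
        public final String mainClass;
        public final List<String> args;

        public JobSpec(String jar, String mainClass, List<String> args) {
            this.jar = jar;
            this.mainClass = mainClass;
            this.args = args;
        }
    }

    /** Returns the command line for the named job, or fails if the job is not registered. */
    public static List<String> buildCommand(Map<String, JobSpec> registry, String jobName) {
        JobSpec spec = registry.get(jobName);
        if (spec == null) {
            throw new IllegalArgumentException("Unknown job: " + jobName);
        }
        List<String> cmd = new ArrayList<>(
                List.of("flink", "run", "-d", "-c", spec.mainClass, spec.jar));
        cmd.addAll(spec.args); // job-specific program arguments come last
        return cmd;
    }

    public static void main(String[] args) {
        // Stand-in for the parsed config file; the entry below is made up.
        Map<String, JobSpec> registry = new LinkedHashMap<>();
        registry.put("ordersJob", new JobSpec(
                "jobs/orders.jar", "com.example.OrdersJob", List.of("--env", "prod")));

        // Print the command instead of executing it, so the sketch has no side effects.
        System.out.println(String.join(" ", buildCommand(registry, "ordersJob")));
    }
}
```

To actually start the job, the returned list could be passed to `new ProcessBuilder(cmd)`.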

On Wed, Feb 24, 2021 at 10:09 PM Arvid Heise <[hidden email]> wrote:
If you have many similar jobs, they should be in the same repo (especially if they have the same development cycle).

First, how different are the jobs?
A) If they are very similar, go with just one job and configure it differently for each application. You can then run different deployments of the same jar with different parameters/config. If you manage deployments as code, keep all deployment files in a dedicated deploy directory at the repository root.
B) If they are somewhat similar, go with one Maven/Gradle project that has several modules. Shared code should go into a common module, and each module should have its own deploy directory.

Note that I'd recommend the Table API for implementing the jobs, as it lets you stay with the simpler Option A much longer. You can easily make it configurable to: a) join from multiple sources, b) group by a varying number of fields, c) have different aggregation functions, d) use different transformation...
