The jobs are related to each other in the sense that we have a configurable pipeline with optional steps that can be enabled/disabled (and thus we build a single big jar).

Because of this, our application REST service also works as a job scheduler and uses the job server as a proxy towards Flink: when one step ends (this is what is signalled back to the application REST service after Flink's env.execute() returns), our application tells the job server to execute the next job of the pipeline on the cluster.

Of course this is a "dirty" solution (we should really use a workflow scheduler like Airflow, Luigi, or similar), but we wanted to keep things as simple as possible for the moment. In the future, if our customers ever want to improve this part, we will probably integrate our application with one of the dedicated job schedulers listed before. I don't know whether any of them already integrates with Flink: when we started coding our frontend application (2 years ago), none of them did.

Best,
Flavio

On Tue, Jul 23, 2019 at 10:40 AM Jeff Zhang <[hidden email]> wrote:

Thanks Flavio, I get most of your points except one:
- Get the list of jobs contained in a jar (ideally this is true for every engine, not just Spark or Flink)
Just curious how you submit jobs via the REST API: if there are multiple jobs in one jar, do you need to submit the jar once and then submit the jobs multiple times? And is there any relationship between the jobs in the same jar?

On Tue, Jul 23, 2019 at 4:01 PM Flavio Pompermaier <[hidden email]> wrote:

Hi Jeff, the point about the manifest is really to have a way to list multiple main classes in the jar (without the need to inspect every Java class, or forcing a 1-to-1 relationship between jar and job like it is now). My requirements were driven by the UI we're using in our framework:
- Get the list of jobs contained in a jar (ideally this is true for every engine, not just Spark or Flink)
- Get the list of required/optional parameters for each job
- Besides whether it is optional, each parameter should include a help description, a type (to validate the input), a default value, and a set of choices (when there's a limited number of options available)
- Obviously the job server should be able to submit/run/cancel/monitor a job and upload/delete the uploaded jars
- The job server should not depend on any target-platform dependency (Spark or Flink) beyond the REST client: at the moment the Flink REST client requires a lot of core libs (precisely because it needs to submit the job graph/plan)
- In our vision, the Flink client should be something like Apache Livy (https://livy.apache.org/)
- One of the biggest limitations we face when running a Flink job through the REST API is that the job can't do anything after env.execute(), while we need to call an external service to signal that the job has ended, plus some other details (see the sketch right after this list)
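To make that last point concrete, this is the kind of thing each pipeline step would ideally be able to do (just a sketch: notifyPipelineStepEnded stands for a hypothetical callback to our application REST service):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // ... the actual step logic ...
    JobExecutionResult result = env.execute("pipeline-step");

    // Never reached when the jar is submitted through the Flink REST API,
    // which is exactly the limitation described above:
    notifyPipelineStepEnded(result.getJobID(), result.getNetRuntime());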
Best,
Flavio

On Tue, Jul 23, 2019 at 3:44 AM Jeff Zhang <[hidden email]> wrote:

Hi Flavio,

Based on the discussion in the tickets you mentioned above, the program-class attribute was a mistake and the community intends to replace it with main-class. Deprecating the Program interface is part of the work on the new Flink client API.

IIUC, your requirements are not so complicated. We can implement them in the new Flink client API. How about listing your requirements, and let's discuss how we can support them there. BTW, I guess most of your requirements are based on your Flink job server, so it would be helpful if you could provide more info about it. Thanks.

On Mon, Jul 22, 2019 at 8:59 PM Flavio Pompermaier <[hidden email]> wrote:

Hi Tison,

We use a modified version of the Program interface to enable a web UI to properly detect and run the Flink jobs contained in a jar, together with their parameters. As stated in [1], we detect multiple main classes per jar by handling an extra comma-separated Manifest entry (i.e. 'Main-classes'); a sketch of such an entry is shown below.
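For illustration, the manifest of such a jar would contain an entry like this (the class names are made up):

    Manifest-Version: 1.0
    Main-classes: com.acme.jobs.IngestJob,com.acme.jobs.DedupJob,com.acme.jobs.ExportJob

and the job server can read it back with the standard java.util.jar API (again, just a sketch):

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.jar.JarFile;

    public class JobJarInspector {
        // Lists the job entry points declared in the jar's 'Main-classes' attribute.
        public static List<String> listJobClasses(String jarPath) throws IOException {
            try (JarFile jar = new JarFile(jarPath)) {
                String attr = jar.getManifest().getMainAttributes().getValue("Main-classes");
                return attr == null ? Collections.emptyList() : Arrays.asList(attr.split(","));
            }
        }
    }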
As mentioned in the discussion on the dev ML, our revised Program interface looks like this:

    public interface FlinkJob {
        String getDescription();
        List<FlinkJobParameter> getParameters();
        boolean isStreamingOrBatch();
    }

    public class FlinkJobParameter {
        private String paramName;
        private String paramType = "string";
        private String paramDesc;
        private String paramDefaultValue;
        private Set<String> choices;
        private boolean mandatory;
    }
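For example, a job exposed to our UI would implement the interface roughly like this (a hypothetical job class; it assumes plain setters on FlinkJobParameter, which are omitted above):

    import java.util.Collections;
    import java.util.List;

    public class DeduplicateJob implements FlinkJob {

        @Override
        public String getDescription() {
            return "Removes duplicate records from the input dataset";
        }

        @Override
        public List<FlinkJobParameter> getParameters() {
            // Describe the single (hypothetical) parameter of this job
            FlinkJobParameter input = new FlinkJobParameter();
            input.setParamName("inputPath");
            input.setParamType("string");
            input.setParamDesc("Path of the dataset to deduplicate");
            input.setMandatory(true);
            return Collections.singletonList(input);
        }

        @Override
        public boolean isStreamingOrBatch() {
            return false; // a batch job
        }
    }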
I've also opened some JIRA issues related to this topic:

Best,
Flavio

On Mon, Jul 22, 2019 at 1:46 PM Zili Chen <[hidden email]> wrote:

Hi guys,

We want to have an accurate idea of how many people are implementing Flink jobs based on the interface Program, and how they actually implement it.

The reason I ask for this survey is the thread [1], where we noticed this code path is stale and less useful than it should be. As an interface marked @PublicEvolving, it was originally aimed at serving as a user-facing interface. Thus, before deprecating or dropping it, we'd like to see whether there are users implementing their jobs based on this interface (org.apache.flink.api.common.Program) and, if there are any, we are curious about how it is used.

If few or no Flink users rely on this interface, we would propose deprecating or dropping it.

I really appreciate your time and your insight.
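For context, a minimal implementation of the interface under survey would look roughly like this (a sketch using the legacy DataSet API; the job logic is a placeholder):

    import org.apache.flink.api.common.Plan;
    import org.apache.flink.api.common.Program;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.DiscardingOutputFormat;

    public class MyLegacyJob implements Program {

        @Override
        public Plan getPlan(String... args) {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            env.fromElements(1, 2, 3)
               .map(new MapFunction<Integer, Integer>() {
                   @Override
                   public Integer map(Integer value) {
                       return value * 2;
                   }
               })
               .output(new DiscardingOutputFormat<Integer>());
            // createProgramPlan() extracts the Plan that env.execute() would
            // otherwise submit, so the framework can obtain it without running the job
            return env.createProgramPlan();
        }
    }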
--
Best Regards
Jeff Zhang