Package multiple jobs in a single jar


Package multiple jobs in a single jar

Flavio Pompermaier
Hi all,
is there any way to package multiple jobs in a single jar and then choose at runtime which one to execute (like Hadoop's ProgramDriver does)?

Best,
Flavio


Re: Package multiple jobs in a single jar

Fabian Hueske-2
You can easily have multiple Flink programs in a single JAR file.
A program is defined using an ExecutionEnvironment and is executed when you call ExecutionEnvironment.execute().
Where and how you do that does not matter.

You can for example implement a main function such as:

public static void main(String... args) throws Exception {

  if (today == Monday) {  // pseudocode: pick which program to run
    ExecutionEnvironment env = ...
    // define Monday program
    env.execute();
  }
  else {
    ExecutionEnvironment env = ...
    // define other program
    env.execute();
  }
}



Re: Package multiple jobs in a single jar

Flavio Pompermaier
Hi Fabian,
thanks for the response.
So my main methods should be converted into methods returning the ExecutionEnvironment.
However, I think it would be very nice to have a syntax like that of Hadoop's ProgramDriver to define the jobs to invoke from a single root class.
Do you think it could be useful?
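For reference, a ProgramDriver-style dispatcher can be sketched in plain Java: a root class maps job names to entry points and dispatches on the first command-line argument. The job names and bodies below are placeholders, not part of any Flink API; a real job would build and execute a Flink plan inside the Runnable.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a ProgramDriver-like root class: it maps job names to
// runnables and dispatches on the first command-line argument.
public class JobDriver {

    private final Map<String, Runnable> jobs = new HashMap<>();

    public void addJob(String name, Runnable job) {
        jobs.put(name, job);
    }

    /** Runs the named job, or throws if the name is unknown. */
    public void run(String name) {
        Runnable job = jobs.get(name);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job: " + name
                    + " (available: " + jobs.keySet() + ")");
        }
        job.run();
    }

    public static void main(String[] args) {
        JobDriver driver = new JobDriver();
        // placeholder jobs; real ones would define and execute a Flink program
        driver.addJob("wordcount", () -> System.out.println("running wordcount"));
        driver.addJob("pagerank", () -> System.out.println("running pagerank"));
        driver.run(args[0]);
    }
}
```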




Re: Package multiple jobs in a single jar

Fabian Hueske-2
I didn't say that main should return the ExecutionEnvironment.
You can define and execute as many programs in a main function as you like.
The program can be defined somewhere else, e.g., in a function that receives an ExecutionEnvironment and attaches a program, such as:

public void buildMyProgram(ExecutionEnvironment env) {
  DataSet<String> lines = env.readTextFile(...);
  // do something
  lines.writeAsText(...);
}

That method could be invoked from main():

public static void main(String[] args) throws Exception {
  ExecutionEnvironment env = ...

  if (...) {
    buildMyProgram(env);
  }
  else {
    buildSomeOtherProg(env);
  }

  env.execute();

  // run some more programs
}



Re: Package multiple jobs in a single jar

Flavio Pompermaier
OK, got it.
And is there a reference pom.xml for shading my application into one fat jar? Which Flink dependencies can I exclude?
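As a rough sketch (not an official reference), the usual approach is to mark the core Flink dependencies as "provided", since the cluster already has them on its classpath, and let the shade plugin bundle only your own code and third-party libraries. The fragment below is illustrative; artifact IDs and plugin configuration may differ from the actual quickstart pom.

```xml
<!-- Illustrative fragment only. Flink core dependencies are provided by the
     cluster at runtime, so they can be excluded from the fat jar: -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>

<!-- The shade plugin then bundles the remaining (compile-scope) dependencies: -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```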



Re: Package multiple jobs in a single jar

rmetzger0
Hi Flavio,

the pom from our quickstart is a good reference: https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml



Re: Package multiple jobs in a single jar

Malte Schwarzer
Hi Flavio,

you can also put each job in a single class and use the -c parameter to execute the jobs separately:

./bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
./bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar

Cheers
Malte



Re: Package multiple jobs in a single jar

Flavio Pompermaier
Thank you all for the support!
It would be a really nice feature if the web client could show the list of Flink jobs within my jar.
It should be sufficient to mark them with a special annotation and inspect the classes within the jar.
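The annotation idea can be sketched in plain Java. The @FlinkJob annotation and job classes below are hypothetical, not part of Flink; a real implementation would first enumerate the candidate classes from the uploaded jar's entries (e.g. via a URLClassLoader) before checking them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class JobDiscovery {

    // Hypothetical marker annotation for Flink job entry classes.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface FlinkJob {
        String name();
    }

    @FlinkJob(name = "WordCount")
    public static class WordCountJob { /* would build a Flink program */ }

    public static class NotAJob { }

    /**
     * Returns the display names of all candidate classes carrying the
     * marker annotation; unannotated classes are skipped.
     */
    public static List<String> discoverJobs(Class<?>... candidates) {
        List<String> names = new ArrayList<>();
        for (Class<?> c : candidates) {
            FlinkJob marker = c.getAnnotation(FlinkJob.class);
            if (marker != null) {
                names.add(marker.name());
            }
        }
        return names;
    }
}
```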



Re: Package multiple jobs in a single jar

Matthias J. Sax
Hi,

I like the idea that Flink's WebClient can show different plans for
different jobs within a single jar file.

I prepared a prototype for this feature. You can find it here:
https://github.com/mjsax/flink/tree/multipleJobsWebUI

To test the feature, you need to prepare a jar file that contains the
code of multiple programs, and to specify each entry class as
comma-separated values on the "program-class" line of the manifest file.
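For example, the jar's MANIFEST.MF might look like this (the class names are placeholders; the comma-separated "program-class" convention is the one the prototype describes):

```
Manifest-Version: 1.0
program-class: com.example.jobs.JobA,com.example.jobs.JobB
```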

Feedback is welcome. :)


-Matthias





Re: Package multiple jobs in a single jar

Flavio Pompermaier
Nice feature, Matthias!
My suggestion is to define a specific Flink interface that also provides a description of a job and standardizes parameter passing.
Then, somewhere (e.g., in the manifest) you could specify the list of packages (or directly the classes) to inspect with reflection in order to extract the list of available Flink jobs.
Something like:

public interface FlinkJob {

  /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld". */
  String getDisplayName();

  /** A description of the job, e.g. "This program does this and that etc.". */
  String getDescription();

  /** Parameter descriptions, e.g. <0, Integer, "an integer representing my first param">,
      <1, String, "a string representing my second param">. */
  List<Tuple3<Integer, TypeInfo, String>> getParamDescriptions();

  /** Set up the Flink job in the passed ExecutionEnvironment. */
  ExecutionEnvironment config(ExecutionEnvironment env);
}

What do you think?
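Implementing such an interface might look like the following sketch. To keep it self-contained, Env stands in for Flink's ExecutionEnvironment and the job class is hypothetical; a root class or the web client could then list all registered jobs by name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlinkJobSketch {

    // Stand-in for Flink's ExecutionEnvironment, to keep the sketch self-contained.
    public static class Env { }

    // The proposed job interface, reduced to its core methods.
    public interface FlinkJob {
        String getDisplayName();
        String getDescription();
        Env config(Env env);
    }

    // A hypothetical job implementing the interface.
    public static class HelloWorldJob implements FlinkJob {
        public String getDisplayName() { return "My Flink HelloWorld"; }
        public String getDescription() { return "This program does this and that."; }
        public Env config(Env env) {
            // a real job would attach its sources, transformations and sinks here
            return env;
        }
    }

    /** Builds a name -> description listing, as a job submission UI might show it. */
    public static Map<String, String> describe(FlinkJob... jobs) {
        Map<String, String> listing = new LinkedHashMap<>();
        for (FlinkJob job : jobs) {
            listing.put(job.getDisplayName(), job.getDescription());
        }
        return listing;
    }
}
```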


