Proper way of adding external jars

Gyula Fóra
Hi,

I have been trying to use the -C flag to add external jars with user code and I have observed some strange behaviour.

What I am trying to do is the following:
I have two jars: JarWithMain.jar contains the main class, and UserJar.jar contains classes that the main method will eventually execute and that also depend on classes from JarWithMain.

Running this works:
flink run .... -C UserJar.jar -c MainMethod JarWithMain.jar args...

Running this leads to NoClassDefFoundError errors during StreamTask initialization, where it reads the functions from the config:
flink run .... -C JarWithMain.jar -c MainMethod UserJar.jar args...

Did I miss something?

Cheers,
Gyula

Re: Proper way of adding external jars

Scott Kidder
Hi Gyula,

I've typically added external library dependencies to my own application JAR as shaded dependencies. This ensures that all dependencies are included with my application when it is distributed to the Flink Job Manager and Task Manager instances.

Another approach is to place these external JARs in the 'lib' sub-directory of your Flink installation. Keep in mind that the external JARs must be installed on every Flink node where your application is expected to run. This works well for dependencies that are large or that are used by multiple Flink applications in your cluster, since it avoids duplicating them.

Best,

--Scott Kidder

In reply to Gyula Fóra's message of Mon, Nov 14, 2016, 7:59 AM.

Re: Proper way of adding external jars

Gyula Fóra
Hi Scott, 

Thanks, I am familiar with the approaches you suggested. Unfortunately, packaging everything together is not really an option in our case. We specifically want to avoid it, because many people will set up their own builds and they will inevitably fail to include everything necessary with the correct setup.

Including the jars in the lib folder would mean copying and removing jars every time I deploy a new job if I want to avoid dependency clashes, and it doesn't seem like a clean solution for something so simple.

I thought the -C option would do what I am looking for in an elegant way, which is why I am trying to understand what happened.

Cheers,
Gyula

In reply to Scott Kidder's message of Mon, Nov 14, 2016, 18:49.

Re: Proper way of adding external jars

Till Rohrmann

Hi Gyula,

Did I understand correctly that JarWithMain depends on UserJar because the former will execute classes from the latter, and that UserJar depends on JarWithMain because it contains classes that depend on classes from JarWithMain? This sounds like a cyclic dependency to me.

The command line help for -C says the following: “Adds a URL to each user code classloader on all nodes in the cluster. The paths must specify a protocol (e.g. file://) and be accessible on all nodes (e.g. by means of a NFS share). You can use this option multiple times for specifying more than one URL. The protocol must be supported by the {@link java.net.URLClassLoader}.”

Thus, I assume that you’ve placed the jar whose classpath you specify via -C somewhere where it is accessible from all nodes including the one running the CliFrontend, right? Otherwise running such a job cannot work because -C won’t distribute the jars in the cluster.
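To make the protocol requirement from the quoted help text concrete, here is a minimal Java sketch that checks whether a classpath entry parses as an absolute URL. It is not Flink's own code: the class name ClasspathUrlCheck, the method hasProtocol, and the /shared/jars paths are hypothetical and only serve as an illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class ClasspathUrlCheck {

    /** Returns true if the entry parses as a URL with an explicit, known protocol. */
    public static boolean hasProtocol(String entry) {
        try {
            // toURL() rejects relative URIs (no protocol) and unknown protocols.
            URI.create(entry).toURL();
            return true;
        } catch (IllegalArgumentException | MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical paths on a mount shared by all nodes.
        System.out.println(hasProtocol("file:///shared/jars/UserJar.jar")); // prints true
        System.out.println(hasProtocol("UserJar.jar"));                     // prints false
    }
}
```

A bare file name such as UserJar.jar fails this check, whereas the same jar referenced with a file:// prefix passes.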

I tried to reproduce your issue by setting up a multi-module project in which my main module depends on the user module via dynamic class loading and the user module depends on the main module via a class dependency. Specifying a classpath that is accessible from all nodes allows me to run the job both via flink/run -C main -c Job util and via flink/run -C util -c Job main, where Job resides in main.

Thus, I would need some more information about the actual setup to be able to reproduce your problem.

Cheers,
Till


In reply to Gyula Fóra's message of Tue, Nov 15, 2016, 10:06 AM.

Re: Proper way of adding external jars

Gyula Fóra
Hi Till,

Sorry, I understand that this was confusing. 
The JarWithMain contains a class called Launcher with main method that does something like:

Class.forName("classFromUserJar").newInstance().launch()

So it doesn't have any "static" dependency on the UserJar. JarWithMain has a lot of API classes that UserJar depends on.
And you are right, I put the jars in a place where they are accessible from every node. (We have a distributed fs mounted under the same path on each machine.)
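The reflective launch described above can be sketched in plain Java as follows. This is only an illustration under assumptions, not the actual Launcher: the Launchable interface, the UserJob class, and the launchByName method are hypothetical stand-ins for the API classes in JarWithMain and the classes in UserJar. The sketch passes the class loader explicitly, because the single-argument Class.forName resolves names through the loader that defined the calling class.

```java
public class LauncherSketch {

    /** Hypothetical API type; in the thread it would live in JarWithMain. */
    public interface Launchable {
        String launch();
    }

    /** Hypothetical stand-in for a class that would live in UserJar. */
    public static class UserJob implements Launchable {
        @Override
        public String launch() {
            return "launched";
        }
    }

    /** Loads a class by name through the given loader, instantiates it and calls launch(). */
    public static String launchByName(String className, ClassLoader loader) {
        try {
            Class<?> clazz = Class.forName(className, true, loader);
            // The cast is needed because newInstance() returns Object.
            Launchable job = (Launchable) clazz.getDeclaredConstructor().newInstance();
            return job.launch();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not launch " + className, e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = LauncherSketch.class.getClassLoader();
        System.out.println(launchByName(UserJob.class.getName(), loader)); // prints launched
    }
}
```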

I hope I'm clearer now, but let me know if you need some more info (or you can drop me a line on skype)

Gyula



In reply to Till Rohrmann's message of Tue, Nov 15, 2016, 14:55.

Re: Proper way of adding external jars

Till Rohrmann
Which version of Flink are you using? I tested exactly this scenario with the latest 1.2-SNAPSHOT on a local Flink cluster, and it worked both ways (regardless of which jar was specified via -C and which was provided as the user code jar).

Could you share the stack trace of the error? Then I could check whether the right class loader is used for loading the classes (which should be the case).
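One generic way to see which class loader defined a class, independent of Flink, is to walk the loader's parent chain. The following is a minimal sketch of that idea; the class name LoaderChain and the method describe are hypothetical, and the loader class names it prints depend on the JVM and on how the program is started.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {

    /** Returns the class name of the defining loader, followed by those of its parents. */
    public static List<String> describe(Class<?> clazz) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = clazz.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // A null loader denotes the JVM's bootstrap class loader.
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));      // prints [bootstrap]
        System.out.println(describe(LoaderChain.class)); // JVM-dependent loader names
    }
}
```

Calling describe on a user function class from inside a job would show whether that class was defined by a user code class loader or by one of its parents.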

Cheers,
Till

In reply to Gyula Fóra's message of Tue, Nov 15, 2016, 3:37 PM.