Avoiding Dynamic Classloading

Avoiding Dynamic Classloading

Chan, Regina

Hi,

 

I read that I should avoid dynamic classloading and therefore copy the job’s JAR into the /lib directory (see the documentation excerpt quoted below).

 

1.     How can I confirm that the JAR was copied over? I only see the following in the log:

 

2017-11-20 15:36:52,724 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/local/data/scratch/tmp/p2epdlsu/chregi/flink-1.2.0/lib to hdfs://d173636/user/delp_prod/.flink/application_1511197407590_58493/lib

2017-11-20 15:37:04,644 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/local/data/scratch/tmp/p2epdlsu/chregi/flink-1.2.0/lib/flink-dist_2.10-1.2.0.jar to hdfs://d173636/user/delp_prod/.flink/application_1511197407590_58493/flink-dist_2.10-1.2.0.jar

2017-11-20 15:37:06,634 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/p2epdlsu/datalake-cdc-prod/etc/flink/conf/flink-conf.yaml to hdfs://d173636/user/delp_prod/.flink/application_1511197407590_58493/flink-conf.yaml

 

2.     I also saw the ticket https://issues.apache.org/jira/browse/FLINK-4913 and was wondering whether it is orthogonal to dynamic classloading and to having to put my JAR in the lib directory, or whether this should already be handled by default.

3.     I start Flink from a globally mounted, shared copy that I don’t have write access to, so I can’t easily put JARs in its lib folder. For the same reason I shouldn’t modify the global copy of bin/config.sh. Is there a way to configure where Flink picks up the lib folder from?

 

Thanks!

 

 

 

Avoiding Dynamic Classloading

All components (JobManager, TaskManager, Client, ApplicationMaster, …) log their classpath setting on startup. It can be found as part of the environment information at the beginning of the log.
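To complement the log check, here is a minimal sketch of how one might verify from inside the JVM whether a JAR is on the classpath and which classloader defined a given class. It uses only standard JDK calls; the class name `ClasspathCheck` and the JAR name fragment `my-job` are hypothetical and not part of Flink.

```java
import java.io.File;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ClasspathCheck {

    /** Returns the classpath entries whose file name contains the given fragment. */
    static List<String> matchingEntries(String classpath, String jarNameFragment) {
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty() && new File(entry).getName().contains(jarNameFragment)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    /** Describes which classloader defined the class and where it was loaded from. */
    static String describeOrigin(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader(); // null means the bootstrap loader
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        return clazz.getName()
                + " loaded by " + (loader == null ? "bootstrap" : loader.getClass().getName())
                + " from " + (source == null ? "unknown" : String.valueOf(source.getLocation()));
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        // "my-job" is a hypothetical fragment of the job JAR's file name.
        System.out.println(matchingEntries(classpath, "my-job"));
        System.out.println(describeOrigin(ClasspathCheck.class));
    }
}
```

Calling `describeOrigin` on one of the job's own classes (for example from within an operator's setup code) and logging the result should show whether the class came from the application classloader or from a user-code classloader.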

When running a setup where the Flink JobManager and TaskManagers are exclusive to one particular job, one can put JAR files directly into the /lib folder to make sure they are part of the classpath and not loaded dynamically.

It usually works to put the job’s JAR file into the /lib directory. The JAR will be part of both the classpath (the AppClassLoader) and the dynamic class loader (FlinkUserCodeClassLoader). Because the AppClassLoader is the parent of the FlinkUserCodeClassLoader (and Java loads parent-first), this should result in classes being loaded only once.
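The parent-first behaviour described above can be illustrated with plain JDK classes. This is only a sketch: a `URLClassLoader` stands in for the dynamic user-code classloader, and `ParentFirstDemo` is a hypothetical name. The child loader is given the same location the class was originally loaded from, mimicking a JAR that is visible to both loaders.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.security.CodeSource;

final class ParentFirstDemo {

    /**
     * Loads the class through a child loader that delegates to the given parent
     * and returns the loader that actually defined the class.
     */
    static ClassLoader definingLoader(ClassLoader parent, URL[] childUrls, String className)
            throws Exception {
        try (URLClassLoader child = new URLClassLoader(childUrls, parent)) {
            return child.loadClass(className).getClassLoader();
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ParentFirstDemo.class.getClassLoader();
        CodeSource source = ParentFirstDemo.class.getProtectionDomain().getCodeSource();
        // Give the child the same location as the parent, if it is known.
        URL[] childUrls = (source == null || source.getLocation() == null)
                ? new URL[0]
                : new URL[] {source.getLocation()};
        ClassLoader defining = definingLoader(parent, childUrls, ParentFirstDemo.class.getName());
        System.out.println("defined by parent: " + (defining == parent));
    }
}
```

Because the default `loadClass` implementation asks the parent before searching its own URLs, the class is resolved by the parent even though the child could also find it, which is why the class ends up loaded only once.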

For setups where the job’s JAR file cannot be put into the /lib folder (for example because the setup is a session that is used by multiple jobs), it may still be possible to put common libraries into the /lib folder, and avoid dynamic class loading for those.

 

 

Regina Chan

Goldman Sachs Enterprise Platforms, Data Architecture

30 Hudson Street, 37th floor | Jersey City, NY 07302 | (212) 902-5697

 


Re: Avoiding Dynamic Classloading

Aljoscha Krettek
Hi,

Yes, if I remember correctly, this was changed in 1.2 to always include the user-jar in the system classloader on YARN. With Flink 1.4 we are changing the user-code classloader to load classes from the user-jar first (child-first classloading) by default, so many of the comments on avoiding dynamic classloading will no longer be relevant.
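For readers unfamiliar with the term, child-first classloading can be sketched roughly as follows. This is an illustrative, simplified loader written against the plain JDK API, not Flink's actual implementation; the class name `ChildFirstClassLoader` and the rule of always delegating `java.` classes to the parent are assumptions made for this sketch.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core classes must always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child-first: look in this loader's own URLs before asking the parent.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in the user JARs, so fall back to normal parent delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With such a loader, a library version bundled in the user JAR takes precedence over a different version on the system classpath, while anything the user JAR does not contain is still resolved through the parent.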

Side note: you should only worry about these issues if you're seeing problems that hint at them, such as ClassNotFoundException or errors about missing methods.

Best,
Aljoscha

On 20. Nov 2017, at 22:04, Chan, Regina <[hidden email]> wrote:

Hi,
 
I read that I should avoid dynamic classloading and therefore copy the job’s JAR into the /lib directory (see the documentation excerpt quoted below).
 
1.     How can I confirm that the JAR was copied over? I only see the following in the log:
 
2017-11-20 15:36:52,724 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/local/data/scratch/tmp/p2epdlsu/chregi/flink-1.2.0/lib to hdfs://d173636/user/delp_prod/.flink/application_1511197407590_58493/lib
2017-11-20 15:37:04,644 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/local/data/scratch/tmp/p2epdlsu/chregi/flink-1.2.0/lib/flink-dist_2.10-1.2.0.jar to hdfs://d173636/user/delp_prod/.flink/application_1511197407590_58493/flink-dist_2.10-1.2.0.jar
2017-11-20 15:37:06,634 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/p2epdlsu/datalake-cdc-prod/etc/flink/conf/flink-conf.yaml to hdfs://d173636/user/delp_prod/.flink/application_1511197407590_58493/flink-conf.yaml
 
2.     I also saw the ticket https://issues.apache.org/jira/browse/FLINK-4913 and was wondering whether it is orthogonal to dynamic classloading and to having to put my JAR in the lib directory, or whether this should already be handled by default.

3.     I start Flink from a globally mounted, shared copy that I don’t have write access to, so I can’t easily put JARs in its lib folder. For the same reason I shouldn’t modify the global copy of bin/config.sh. Is there a way to configure where Flink picks up the lib folder from?
 
Thanks!

 

 
 

Avoiding Dynamic Classloading

All components (JobManager, TaskManager, Client, ApplicationMaster, …) log their classpath setting on startup. It can be found as part of the environment information at the beginning of the log.

When running a setup where the Flink JobManager and TaskManagers are exclusive to one particular job, one can put JAR files directly into the /lib folder to make sure they are part of the classpath and not loaded dynamically.

It usually works to put the job’s JAR file into the /lib directory. The JAR will be part of both the classpath (the AppClassLoader) and the dynamic class loader (FlinkUserCodeClassLoader). Because the AppClassLoader is the parent of the FlinkUserCodeClassLoader (and Java loads parent-first), this should result in classes being loaded only once.

For setups where the job’s JAR file cannot be put into the /lib folder (for example because the setup is a session that is used by multiple jobs), it may still be possible to put common libraries into the /lib folder, and avoid dynamic class loading for those.

 
 
Regina Chan
Goldman Sachs  Enterprise Platforms, Data Architecture
30 Hudson Street, 37th floor | Jersey City, NY 07302 | (212) 902-5697