Classloading issues after changing to 1.4

eSKa
Hello,
I still have a problem after upgrading from Flink 1.3.1 to 1.4.2.
Our scenario looks like this:
We have a container running on top of YARN. The machine that starts it has
Flink installed and also loads some classpath libraries (e.g. Hadoop) into
the container.
There is a separate REST service that receives requests to run an export job.
It uses YarnClusterClient and submits a PackagedProgram. The jars with the
actual Flink jobs are located in the lib/ directory of the service. On the
machine where the Spring service is deployed we don't have Flink installed.
For version 1.3 we also had some libraries loaded into the container so that
they wouldn't have to be loaded dynamically every time. If I understand
correctly, with the child-first strategy this should no longer be needed, right?

Now our actual problems, which are linked to class loading, have started to
come up. After restarting the REST service, the first job trigger works fine,
but subsequent ones complain about the versions of the loaded classes.
In the end, the problematic code turned out to be in Hadoop's SequenceFile.Writer:


Error: java.io.IOException: wrong key class: com.internal.Meta is not class com.internal.Meta
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1305)


We found out that PackagedProgram creates a new class loader every time it is instantiated, so the second run compares a class loaded by the class loader from the first run with the same class loaded by the class loader from the second run.
We therefore overrode the PackagedProgram behaviour so that a static map holds one class loader
per jarFileName:


    private static final Map<String, ClassLoader> classLoaders = Maps.newHashMap();
    ...
    (constructor) {
        ...
        classLoaders.computeIfAbsent(jarFile.getName(),
            s -> getUserClassLoaderParentFirst(getAllLibraries(), classPaths, getClass().getClassLoader()));

        userCodeClassLoader = classLoaders.get(jarFile.getName());
        this.mainClass = loadMainClass(entryPointClassName, userCodeClassLoader);
    }
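
To make the "com.internal.Meta is not class com.internal.Meta" symptom concrete, here is a minimal, self-contained sketch of how one class name can yield two distinct Class objects when it is defined by two different class loaders. Everything in it is hypothetical and for illustration only: `Meta` stands in for the real user class, and `IsolatingLoader` is a toy loader, not Flink or Hadoop code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DuplicateClassDemo {

    /** Hypothetical stand-in for a user class such as com.internal.Meta. */
    public static class Meta {}

    /** Toy loader that defines Meta itself instead of delegating to its parent. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Meta.class.getName())) {
                return super.loadClass(name, resolve); // everything else: normal delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled class file through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Meta through a brand-new loader, as each newly created program would. */
    public static Class<?> loadInFreshLoader() {
        try {
            return new IsolatingLoader(DuplicateClassDemo.class.getClassLoader())
                    .loadClass(Meta.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> firstRun = loadInFreshLoader();
        Class<?> secondRun = loadInFreshLoader();
        System.out.println(firstRun.getName().equals(secondRun.getName())); // true: same name
        System.out.println(firstRun == secondRun);                          // false: different classes
    }
}
```

Any code that keeps a Class reference from the first run and compares it by identity with the class seen in the second run will fail in this way, even though both print the same name.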


I don't know if that is a good direction, but it seems to solve the issue for
now. We are just not sure about the stability of this solution. We are still
testing it in our internal environment, but for now I'm afraid to move it to production.
Can you suggest anything else we could try to deal with the class loading?
Is there any special way of clearing out the classes loaded by the first run, so that there are no leftovers in later runs?
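
As a rough sketch of the caching idea with an explicit way to drop an entry, the following uses a plain `URLClassLoader` as a stand-in for Flink's user-code class loader. The class name `ClassLoaderCache` and both method names are hypothetical. Two general JVM facts apply: a class can only be unloaded once its defining loader is no longer reachable, and `close()` on a `URLClassLoader` releases the jar handles but does not by itself unload classes that are still referenced.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class ClassLoaderCache {

    // ConcurrentHashMap because a REST service may submit jobs from several threads.
    private static final ConcurrentMap<String, URLClassLoader> CACHE = new ConcurrentHashMap<>();

    private ClassLoaderCache() {}

    /** Returns the cached loader for the key, creating it on first use. */
    public static ClassLoader forJar(String jarKey, List<URL> urls, ClassLoader parent) {
        return CACHE.computeIfAbsent(jarKey,
                k -> new URLClassLoader(urls.toArray(new URL[0]), parent));
    }

    /** Drops the cached loader so the next call to forJar builds a fresh one. */
    public static boolean evict(String jarKey) {
        URLClassLoader old = CACHE.remove(jarKey);
        if (old == null) {
            return false;
        }
        try {
            old.close(); // releases open jar files; classes go away only when unreferenced
        } catch (IOException e) {
            // Best effort: the entry is already removed from the cache.
        }
        return true;
    }
}
```

If the key comes from a `java.io.File`, note that `getName()` returns only the last path component, so two jars with the same file name in different directories would share one loader. An absolute path may be a safer key.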


Also, PackagedProgram is still using the parent-first strategy. In JobWithJars
you have this method:


    public static ClassLoader buildUserCodeClassLoader(List<URL> jars, List<URL> classpaths, ClassLoader parent) {
        URL[] urls = new URL[jars.size() + classpaths.size()];
        for (int i = 0; i < jars.size(); i++) {
            urls[i] = jars.get(i);
        }
        for (int i = 0; i < classpaths.size(); i++) {
            urls[i + jars.size()] = classpaths.get(i);
        }
        return FlinkUserCodeClassLoaders.parentFirst(urls, parent);
    }


Is it correct that this still uses parent-first?

In some of the threads I found on the mailing list, you suggest setting the
container to parent-first to solve the issue. We would like to find a proper
solution that works on the supported child-first path rather than rely on a workaround.
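
For readers comparing the two strategies, here is a generic sketch of what "child-first" resolution means. It is not Flink's `FlinkUserCodeClassLoaders` implementation. The class name `ChildFirstSketch` and the prefix list are assumptions made for illustration. The loader looks in its own URLs first and falls back to the parent, except for names matching a parent-first prefix such as `java.`.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ChildFirstSketch extends URLClassLoader {

    private final String[] parentFirstPrefixes;

    public ChildFirstSketch(URL[] urls, ClassLoader parent, String... parentFirstPrefixes) {
        super(urls, parent);
        this.parentFirstPrefixes = parentFirstPrefixes;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name)) {
                    c = super.loadClass(name, false);     // usual parent-first delegation
                } else {
                    try {
                        c = findClass(name);              // look in our own URLs first
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, false); // then fall back to the parent
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private boolean isParentFirst(String name) {
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A parent-first loader is simply the default `URLClassLoader` behaviour: the parent is asked first, and the loader's own URLs are searched only if the parent cannot find the class.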





--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Classloading issues after changing to 1.4

Ken Krugler
When we transitioned from 1.3 to 1.4, we ran into some class loader issues.

Though we weren’t using any sophisticated class loader helicopter stunts :)

Specifically, here is what we did:

1. Re-worked our pom.xml to set up shading to better mirror what the 1.4 example pom was doing.

2. Enabled child-first classloading

3. Ensured “hadoop classpath” command returned nothing (or failed), to avoid loading Hadoop jars before our jars (even with #2 above)
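
When checking whether a class comes from the cluster's Hadoop jars or from the job jar, a small diagnostic like the one below can help. It is only a sketch: `ClassOrigin` is a hypothetical helper name, and it relies solely on standard JDK calls (`Class.getClassLoader()` and `ProtectionDomain.getCodeSource()`).

```java
import java.security.CodeSource;

public final class ClassOrigin {

    private ClassOrigin() {}

    /** Describes which location and which class loader a class was loaded from. */
    public static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String from = (source == null || source.getLocation() == null)
                ? "unknown location"
                : source.getLocation().toString();
        String by = (loader == null) ? "bootstrap loader" : loader.toString();
        return type.getName() + " <- " + from + " via " + by;
    }

    /** Prints the origin of every class name passed on the command line. */
    public static void main(String[] args) throws ClassNotFoundException {
        for (String name : args) {
            System.out.println(describe(Class.forName(name)));
        }
    }
}
```

Logging `ClassOrigin.describe(...)` for a suspect class from inside the job shows whether it was picked up from the user jar or from a jar on the container classpath.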

— Ken




> On Apr 13, 2018, at 2:28 AM, eSKa <[hidden email]> wrote:

--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378