Hi everyone,
we are testing a long-running streaming application, which shares a YARN session with a batch job (containing JDBC(In|Out)putFormat) that is triggered periodically. Unfortunately, the session dies after a few runs of the batch job. In fact, each run of the batch job kills one task manager with an OutOfMemoryError in PermGen space:

--
2016-04-14 16:53:55,212 INFO org.apache.flink.runtime.taskmanager.Task - DataSink (org.apache.flink.api.java.io.jdbc.JDBCOutputFormat@787c33b) (1/3) switched to FAILED with exception.
java.lang.OutOfMemoryError: PermGen space
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at oracle.jdbc.driver.OraclePreparedStatement.<clinit>(OraclePreparedStatement.java:102)
    at oracle.jdbc.driver.T4CDriverExtension.allocatePreparedStatement(T4CDriverExtension.java:67)
    at oracle.jdbc.driver.PhysicalConnection.prepareStatement(PhysicalConnection.java:3523)
    at oracle.jdbc.driver.PhysicalConnection.prepareStatement(PhysicalConnection.java:3409)
    at org.apache.flink.api.java.io.jdbc.JDBCOutputFormat.open(JDBCOutputFormat.java:79)
    at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:186)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
    at java.lang.Thread.run(Thread.java:744)
2016-04-14 16:53:55,489 ERROR org.apache.flink.runtime.taskmanager.Task - FATAL - exception in task exception handler
java.lang.OutOfMemoryError: PermGen space
    at sun.misc.Unsafe.defineClass(Native Method)
    at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
    at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
    at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
    at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
    at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1133)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:300)
    at org.apache.flink.runtime.util.SerializedThrowable.<init>(SerializedThrowable.java:83)
    at org.apache.flink.runtime.taskmanager.TaskExecutionState.<init>(TaskExecutionState.java:108)
    at org.apache.flink.runtime.taskmanager.TaskExecutionState.<init>(TaskExecutionState.java:78)
    at org.apache.flink.runtime.taskmanager.Task.notifyObservers(Task.java:865)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:616)
    at java.lang.Thread.run(Thread.java:744)
2016-04-14 16:53:55,489 ERROR org.apache.flink.runtime.taskmanager.Task - FATAL - exception in task exception handler
java.lang.OutOfMemoryError: PermGen space
    at sun.misc.Unsafe.defineClass(Native Method)
    at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
    at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
    at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
    at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
    at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1133)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:300)
    at org.apache.flink.runtime.util.SerializedThrowable.<init>(SerializedThrowable.java:83)
    at org.apache.flink.runtime.taskmanager.TaskExecutionState.<init>(TaskExecutionState.java:108)
    at org.apache.flink.runtime.taskmanager.TaskExecutionState.<init>(TaskExecutionState.java:78)
    at org.apache.flink.runtime.taskmanager.Task.notifyObservers(Task.java:865)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:616)
    at java.lang.Thread.run(Thread.java:744)
--

This problem seems to be reproducible. In the first run it happens towards the end of the job, in a JDBCOutputFormat. From then on, an analogous exception is thrown in the JDBCInputFormat, an earlier operator. We suspect a memory leak caused by the classloader. Any ideas?

Best regards,
Max

--
Maximilian Bode * Software Consultant * [hidden email]
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Registered office: Unterföhring * Munich District Court (Amtsgericht München) * HRB 135082
|
Not a solution for your problem, but an alternative: I wrote my own sink function that handles all SQL activity (insert/update/select) and uses a third-party library for connection pooling. The code has been running stably in production without any issues.
On Fri, Apr 15, 2016 at 1:41 PM, Maximilian Bode <[hidden email]> wrote:
|
Hi!
One thing you could try is to create a heap dump of the JVM when it crashes and have a look at all the classes it has loaded. For these long-running sessions (which share JVMs across jobs) it is important that classes are properly unloaded. If anything keeps holding references to the classes (the system, the user code, or a library like the JDBC connector lib), then unloading cannot happen. Inspecting the dump would be one way to check that.
Greetings,
Stephan
On Fri, Apr 15, 2016 at 10:21 AM, Balaji Rajagopalan <[hidden email]> wrote:
|
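A rough way to follow up on Stephan's suggestion before digging into a full heap dump is to watch the JVM's class loading counters: if the loaded-class count grows with every batch run and the unloaded count stays flat, classes are probably being pinned. The sketch below uses only the standard `java.lang.management` API; the class name `ClassLoadingSnapshot` is made up for illustration, and where to call it (for example from a periodic logging thread in the task manager JVM) is an assumption, not something described in this thread. For the dump itself, the standard HotSpot options `-XX:+HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath=<dir>` can be used; how they are passed to the task managers depends on the setup.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Flink): reports class loading counters
// of the JVM it runs in.
public class ClassLoadingSnapshot {

    /** Returns a one-line summary of the current class loading counters. */
    public static String summary() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                bean.getLoadedClassCount(),       // classes currently loaded
                bean.getTotalLoadedClassCount(),  // classes loaded since JVM start
                bean.getUnloadedClassCount());    // classes unloaded since JVM start
    }

    public static void main(String[] args) {
        // Log this before and after each batch run and compare the numbers.
        System.out.println(summary());
    }
}
```

If `unloaded` never increases across job runs while `loaded` keeps climbing, that is consistent with a classloader leak, though only a heap dump shows who is holding the references.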
Hi guys,
The problem seems to be caused by a known bug in the Oracle JDBC driver (see here: http://jrfom.com/2015archive/2014/01/08/fu-ojdbc, might be NSFW). We found a workaround that seems to solve our particular problem for the time being: we simply uploaded the Oracle JDBC driver jar to Flink's lib directory. Since doing that, everything looks to be running fine.

Registering a hook to unload the driver (as is done in the article linked above) seems like a nicer alternative, but would probably require non-trivial modifications to the JDBC[In|Out]putFormat.

Maybe the PermGen problem in conjunction with the Oracle JDBC driver is something that could be added to the FAQ, so that future generations lose less sleep over it ;-)

Cheers,
Michael

On 15.04.2016 10:11, Maximilian Bode wrote:

--
Michael Pisula * [hidden email] * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
|
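The "hook to unload the driver" that Michael mentions can be sketched with plain JDBC calls. `DriverManager` keeps a reference to every registered driver, which in turn keeps the driver's classloader reachable; deregistering the drivers that belong to a given classloader removes that one reference. The following is a minimal sketch under assumptions: `DriverCleanup` and `DummyDriver` are hypothetical names, the dummy driver merely stands in for the Oracle driver so the example runs without a database, and where such a call would go in Flink (the thread only says the formats would need non-trivial changes) is left open. It also does not address any other references the Oracle driver may hold.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical helper, not part of Flink or the Oracle driver.
public class DriverCleanup {

    /**
     * Deregisters every JDBC driver whose class was loaded by the given
     * classloader and returns how many drivers were removed.
     */
    public static int deregisterDriversOf(ClassLoader loader) throws SQLException {
        int removed = 0;
        // Copy the enumeration first so that deregistering does not disturb iteration.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() == loader) {
                DriverManager.deregisterDriver(driver);
                removed++;
            }
        }
        return removed;
    }

    /** Stand-in driver for illustration only; it accepts no URLs. */
    public static class DummyDriver implements Driver {
        public Connection connect(String url, Properties info) { return null; }
        public boolean acceptsURL(String url) { return false; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException("no logging in dummy driver");
        }
    }

    public static void main(String[] args) throws SQLException {
        DriverManager.registerDriver(new DummyDriver());
        int removed = deregisterDriversOf(DummyDriver.class.getClassLoader());
        System.out.println("deregistered " + removed + " driver(s)");
    }
}
```

One caveat with this approach: if several tasks in the same JVM share the driver, deregistering it while another task is still opening connections would break that task, so the call would have to happen once per job rather than once per task. Putting the jar into the lib directory, as described above, avoids the question by loading the driver only once, outside the per-job classloader.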