JVM crash - SIGSEGV in ZIP_GetEntry

JVM crash - SIGSEGV in ZIP_GetEntry

Dawid Wysakowicz
Hi,

Recently we have been observing regular TaskManager JVM crashes about a minute after the start of our Flink job. We run Flink 1.3.2 on YARN (2.6.2.0-205). Java version:

# JRE version: Java(TM) SE Runtime Environment (8.0_112-b15) (build 1.8.0_112-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode linux-amd64 compressed oops)

Any help with this problem would be appreciated. If you need any more info, I will be happy to provide it.
The JVM crashes with SIGSEGV. Please see the top of the stack trace below:

Stack: [0x00007f301a1d9000,0x00007f301a2da000],  sp=0x00007f301a2d6090,  free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x8dfada]  Monitor::jvm_raw_lock()+0xa
V  [libjvm.so+0x70fe17]  JVM_RawMonitorEnter+0x27
C  [libzip.so+0x120f1]  ZIP_GetEntry2+0x61
C  [libzip.so+0x3ec0]  Java_java_util_zip_ZipFile_getEntry+0xf0
J 136  java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes) @ 0x00007f303c314c0e [0x00007f303c314b40+0xce]
J 1579 C2 java.util.jar.JarFile.getJarEntry(Ljava/lang/String;)Ljava/util/jar/JarEntry; (9 bytes) @ 0x00007f303c735db8 [0x00007f303c735a40+0x378]
J 2321 C2 java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @ 0x00007f303ca5965c [0x00007f303ca59080+0x5dc]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x690c66]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x729f2c]  JVM_DoPrivileged+0x27c
J 308  java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (0 bytes) @ 0x00007f303c38dd15 [0x00007f303c38dc40+0xd5]
J 2991 C2 java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class; (47 bytes) @ 0x00007f303c30f430 [0x00007f303c30f3a0+0x90]
J 4911 C2 java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class; (122 bytes) @ 0x00007f303cd178f8 [0x00007f303cd16600+0x12f8]
j  com.esotericsoftware.reflectasm.AccessClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+48
J 2321 C2 java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @ 0x00007f303ca5965c [0x00007f303ca59080+0x5dc]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x690c66]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x729f2c]  JVM_DoPrivileged+0x27c
J 308  java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (0 bytes) @ 0x00007f303c38dd15 [0x00007f303c38dc40+0xd5]
J 2991 C2 java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class; (47 bytes) @ 0x00007f303c30f430 [0x00007f303c30f3a0+0x90]
J 4911 C2 java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class; (122 bytes) @ 0x00007f303cd178f8 [0x00007f303cd16600+0x12f8]
j  com.esotericsoftware.reflectasm.AccessClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+48
J 2318 C2 java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; (7 bytes) @ 0x00007f303c96db80 [0x00007f303c96d9c0+0x1c0]
j  com.esotericsoftware.reflectasm.ConstructorAccess.get(Ljava/lang/Class;)Lcom/esotericsoftware/reflectasm/ConstructorAccess;+109
j  com.twitter.chill.Instantiators$.reflectAsm(Ljava/lang/Class;)Lscala/util/Either;+1
j  com.twitter.chill.KryoBase$$anonfun$newInstantiator$2.apply(Ljava/lang/Class;)Lscala/util/Either;+4
j  com.twitter.chill.KryoBase$$anonfun$newInstantiator$2.apply(Ljava/lang/Object;)Ljava/lang/Object;+5
j  com.twitter.chill.Instantiators$$anonfun$newOrElse$1.apply(Lscala/Function1;)Lscala/Option;+5
j  com.twitter.chill.Instantiators$$anonfun$newOrElse$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5
j  scala.collection.Iterator$$anon$11.next()Ljava/lang/Object;+13
j  scala.collection.Iterator$class.find(Lscala/collection/Iterator;Lscala/Function1;)Lscala/Option;+21
j  scala.collection.AbstractIterator.find(Lscala/Function1;)Lscala/Option;+2
j  com.twitter.chill.Instantiators$.newOrElse(Ljava/lang/Class;Lscala/collection/TraversableOnce;Lscala/Function0;)Lorg/objenesis/instantiator/ObjectInstantiator;+25
j  com.twitter.chill.KryoBase.newInstantiator(Ljava/lang/Class;)Lorg/objenesis/instantiator/ObjectInstantiator;+54
J 11193 C2 com.esotericsoftware.kryo.serializers.FieldSerializer.copy(Lcom/esotericsoftware/kryo/Kryo;Ljava/lang/Object;)Ljava/lang/Object; (91 bytes) @ 0x00007f303db5656c [0x00007f303db56160+0x40c]
J 6855 C2 com.esotericsoftware.kryo.Kryo.copy(Ljava/lang/Object;)Ljava/lang/Object; (211 bytes) @ 0x00007f303d456494 [0x00007f303d4560c0+0x3d4]
j  com.esotericsoftware.kryo.serializers.UnsafeCacheFields$UnsafeObjectField.copy(Ljava/lang/Object;Ljava/lang/Object;)V+34
J 11193 C2 com.esotericsoftware.kryo.serializers.FieldSerializer.copy(Lcom/esotericsoftware/kryo/Kryo;Ljava/lang/Object;)Ljava/lang/Object; (91 bytes) @ 0x00007f303db566c8 [0x00007f303db56160+0x568]
J 6855 C2 com.esotericsoftware.kryo.Kryo.copy(Ljava/lang/Object;)Ljava/lang/Object; (211 bytes) @ 0x00007f303d456494 [0x00007f303d4560c0+0x3d4]
j  org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(Ljava/lang/Object;)Ljava/lang/Object;+15
j  org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(Lorg/apache/flink/api/java/tuple/Tuple;)Lorg/apache/flink/api/java/tuple/Tuple;+26
j  org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(Ljava/lang/Object;)Ljava/lang/Object;+5
j  org.apache.flink.runtime.state.ArrayListSerializer.copy(Ljava/util/ArrayList;)Ljava/util/ArrayList;+51
j  org.apache.flink.runtime.state.DefaultOperatorStateBackend$PartitionableListState.<init>(Lorg/apache/flink/runtime/state/DefaultOperatorStateBackend$PartitionableListState;)V+13
j  org.apache.flink.runtime.state.DefaultOperatorStateBackend$PartitionableListState.deepCopy()Lorg/apache/flink/runtime/state/DefaultOperatorStateBackend$PartitionableListState;+5
j  org.apache.flink.runtime.state.DefaultOperatorStateBackend.snapshot(JJLorg/apache/flink/runtime/state/CheckpointStreamFactory;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;)Ljava/util/concurrent/RunnableFuture;+115
j  org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(JJLorg/apache/flink/runtime/checkpoint/CheckpointOptions;)Lorg/apache/flink/streaming/api/operators/OperatorSnapshotResult;+111
j  org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(Lorg/apache/flink/streaming/api/operators/StreamOperator;)V+58
j  org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing()V+35
j  org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(Lorg/apache/flink/runtime/checkpoint/CheckpointMetaData;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;Lorg/apache/flink/runtime/checkpoint/CheckpointMetrics;)V+15
j  org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(Lorg/apache/flink/runtime/checkpoint/CheckpointMetaData;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;Lorg/apache/flink/runtime/checkpoint/CheckpointMetrics;)Z+74
j  org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(Lorg/apache/flink/runtime/checkpoint/CheckpointMetaData;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;Lorg/apache/flink/runtime/checkpoint/CheckpointMetrics;)V+4
j  org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(Lorg/apache/flink/runtime/io/network/api/CheckpointBarrier;)V+73
j  org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(Lorg/apache/flink/runtime/io/network/api/CheckpointBarrier;I)V+193
J 11295 C2 org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput()Z (602 bytes) @ 0x00007f303de7beb4 [0x00007f303de79c00+0x22b4]
J 10764% C2 org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run()V (23 bytes) @ 0x00007f303cb3812c [0x00007f303cb38080+0xac]
j  org.apache.flink.streaming.runtime.tasks.StreamTask.invoke()V+221
j  org.apache.flink.runtime.taskmanager.Task.run()V+813
j  java.lang.Thread.run()V+11
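
For anyone skimming a trace like the one above: the legend line (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) can be applied mechanically to see where the Java frames end and the native ones begin. Below is a minimal sketch of that idea. The class name HsErrFrames and its methods are hypothetical helpers made up for illustration; they are not part of the JDK or Flink, and the parsing only assumes the "marker, then whitespace" layout visible in this post.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a JDK or Flink class): labels hs_err stack frames by kind. */
final class HsErrFrames {

    /** Maps the one-letter marker at the start of a frame line to the kind named in the legend. */
    static String kind(String line) {
        String s = line.trim();
        // A frame line is a single marker character followed by whitespace;
        // header lines such as "Java frames: (...)" fail this check.
        if (s.length() < 2 || !Character.isWhitespace(s.charAt(1))) {
            return "unknown";
        }
        switch (s.charAt(0)) {
            case 'J': return "compiled Java";
            case 'j': return "interpreted Java";
            case 'V':
            case 'v': return "VM";
            case 'C': return "native";
            default:  return "unknown";
        }
    }

    /** Counts recognised frames per kind, in order of first appearance. */
    static Map<String, Integer> countKinds(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String k = kind(line);
            if (!k.equals("unknown")) {
                counts.merge(k, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few frames copied from the trace in this post.
        List<String> frames = Arrays.asList(
            "V  [libjvm.so+0x8dfada]  Monitor::jvm_raw_lock()+0xa",
            "C  [libzip.so+0x120f1]  ZIP_GetEntry2+0x61",
            "J 136  java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes)",
            "v  ~StubRoutines::call_stub",
            "j  java.lang.Thread.run()V+11");
        System.out.println(countKinds(frames));
    }
}
```

Read this way, the top of the trace is VM and native code (libjvm.so, libzip.so) entered from ZipFile.getEntry during class loading, while everything below it is the Kryo copy path triggered by the checkpoint.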

Re: JVM crash - SIGSEGV in ZIP_GetEntry

Dawid Wysakowicz
As a follow-up, I tried disabling mmap with sun.zip.disableMemoryMapping, but it did not help. This time I got only the Java stack:

Stack: [0x00007f9060757000,0x00007f9060858000],  sp=0x00007f9060856350,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)

[error occurred during error reporting (printing native stack), id 0xb]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  java.util.zip.Inflater.end(J)V+0
j  java.util.zip.Inflater.end()V+29
j  java.util.zip.ZipFile.close()V+169
j  sun.net.www.protocol.jar.URLJarFile.close()V+18
j  sun.net.www.protocol.jar.URLJarFile.finalize()V+1
J 10563% C2 java.lang.ref.Finalizer$FinalizerThread.run()V (55 bytes) @ 0x00007f9075be90b4 [0x00007f9075be8e00+0x2b4]
v  ~StubRoutines::call_stub
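
A note on the property itself, in case it helps others trying the same thing. The post does not say how the flag was passed, so the following is an assumption: the property has to be set on the TaskManager JVM's command line (in Flink this is usually done through the env.java.opts setting), and as far as I know JDK 8 treats an empty value or "true" as "memory mapping disabled". The small sketch below mirrors that interpretation so it can be checked from inside the affected JVM. ZipMmapCheck is a hypothetical name, not an existing class.

```java
/** Hypothetical helper: interprets the value of the sun.zip.disableMemoryMapping property. */
final class ZipMmapCheck {

    static final String PROPERTY = "sun.zip.disableMemoryMapping";

    /**
     * Returns true if the given property value switches memory mapping off.
     * Assumption: an empty value or "true" (any case) means "disabled";
     * an unset property or any other value leaves mmap enabled.
     */
    static boolean mmapDisabled(String value) {
        return value != null && (value.isEmpty() || value.equalsIgnoreCase("true"));
    }

    public static void main(String[] args) {
        // Reads the property as seen by the current JVM.
        String value = System.getProperty(PROPERTY);
        System.out.println(PROPERTY + "=" + value
                + " -> mmap disabled: " + mmapDisabled(value));
    }
}
```

Running it with -Dsun.zip.disableMemoryMapping=true on the command line is a quick way to confirm the flag actually reaches the process. Setting the property programmatically after startup is probably too late, since the JDK appears to read it only once during initialisation.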

> On 17 Dec 2017, at 15:03, Dawid Wysakowicz <[hidden email]> wrote:
> [...]

Re: JVM crash - SIGSEGV in ZIP_GetEntry

Gyula Fóra

Hi,
I have seen similar errors when accidentally serializing Kryo type serializers with Flink type infos.

Maybe that helps :)

Gyula


On Sun, Dec 17, 2017, 15:52 Dawid Wysakowicz <[hidden email]> wrote:
> [...]

Re: JVM crash - SIGSEGV in ZIP_GetEntry

Dawid Wysakowicz
Thanks Gyula,

It kind of helped. I removed some KryoSerializers here and there and it started working, but I don’t understand it fully. I will try to understand and reproduce it as soon as I have some spare time.

> On 17 Dec 2017, at 17:52, Gyula Fóra <[hidden email]> wrote:
> [...]
> > j  org.apache.flink.runtime.state.DefaultOperatorStateBackend$PartitionableListState.<init>(Lorg/apache/flink/runtime/state/DefaultOperatorStateBackend$PartitionableListState;)V+13
> > j  org.apache.flink.runtime.state.DefaultOperatorStateBackend$PartitionableListState.deepCopy()Lorg/apache/flink/runtime/state/DefaultOperatorStateBackend$PartitionableListState;+5
> > j  org.apache.flink.runtime.state.DefaultOperatorStateBackend.snapshot(JJLorg/apache/flink/runtime/state/CheckpointStreamFactory;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;)Ljava/util/concurrent/RunnableFuture;+115
> > j  org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(JJLorg/apache/flink/runtime/checkpoint/CheckpointOptions;)Lorg/apache/flink/streaming/api/operators/OperatorSnapshotResult;+111
> > j  org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(Lorg/apache/flink/streaming/api/operators/StreamOperator;)V+58
> > j  org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing()V+35
> > j  org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(Lorg/apache/flink/runtime/checkpoint/CheckpointMetaData;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;Lorg/apache/flink/runtime/checkpoint/CheckpointMetrics;)V+15
> > j  org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(Lorg/apache/flink/runtime/checkpoint/CheckpointMetaData;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;Lorg/apache/flink/runtime/checkpoint/CheckpointMetrics;)Z+74
> > j  org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(Lorg/apache/flink/runtime/checkpoint/CheckpointMetaData;Lorg/apache/flink/runtime/checkpoint/CheckpointOptions;Lorg/apache/flink/runtime/checkpoint/CheckpointMetrics;)V+4
> > j  org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(Lorg/apache/flink/runtime/io/network/api/CheckpointBarrier;)V+73
> > j  org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(Lorg/apache/flink/runtime/io/network/api/CheckpointBarrier;I)V+193
> > J 11295 C2 org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput()Z (602 bytes) @ 0x00007f303de7beb4 [0x00007f303de79c00+0x22b4]
> > J 10764% C2 org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run()V (23 bytes) @ 0x00007f303cb3812c [0x00007f303cb38080+0xac]
> > j  org.apache.flink.streaming.runtime.tasks.StreamTask.invoke()V+221
> > j  org.apache.flink.runtime.taskmanager.Task.run()V+813
> > j  java.lang.Thread.run()V+11
>

