Flink threads cpu busy/stuck at NoFetchingInput.require (Versions 1.2 and 1.4)

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Flink threads cpu busy/stuck at NoFetchingInput.require (Versions 1.2 and 1.4)

Mehar Simhadri (mesimhad)

Hi Flink Community,

 

In our flink deployments, we see some flink threads are cpu busy/stuck after few hours with the below stack

 

"Sink:AggregationSink (2/4)" #567 daemon prio=5 os_prio=0 tid=0x00007f901dc97000 nid=0x254 runnable [0x00007f8fe017f000]

java.lang.Thread.State: RUNNABLE

at org.apache.flink.api.java.typeutils.runtime.DataInputViewStream.read(DataInputViewStream.java:70)

at com.esotericsoftware.kryo.io.Input.fill(Input.java:146)

at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:76)

at com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355)

at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)

at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)

at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)

at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:236)

at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:187)

at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:40)

at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)

at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:109)

at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:147)

at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)

at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:261)

at org.apache.flink.runtime.taskmanager.Task.run(Task.java:656)

at java.lang.Thread.run(Thread.java:745)

 

Few observations from the stack, while it is stuck.

SpillingAdaptiveSpanningRecordDeserializer.getNextRecord

nonSpanningRemaining = 13

 

All these 13 bytes were read even before reaching com.esotericsoftware.kryo.io.Input.readVarInt

 

We are wondering is this a serialization bug or memory segment corruption?

 

Any pointers on how to debug further will be much appreciated.

 

Regards,

Mehar