Flink 1.3 - Checkpointing failing

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink 1.3 - Checkpointing failing

MAHESH KUMAR
Hi Team,

We have some test cases written using StreamingMultipleProgramsTestBase
It was working fine in version 1.2, we get the following error in version 1.3
It seems like CheckpointCoordinator fails after this error and Checkpointing no longer occurs.

I came across this bug: https://issues.apache.org/jira/browse/FLINK-5462 , It looks kind of similar but I am not exactly sure.

2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | org.apache.flink.runtime.executiongraph.ExecutionGraph  | Could not restart the job pipeline_message_auditor (f54182ae17352efb9aa40667c283ce09) because the restart strategy prevented it.
org.apache.flink.streaming.runtime.tasks.AsynchronousException: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:963) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
... 6 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future.
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1018) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:957) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
... 7 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155)
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39)
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389)
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893)
... 5 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) ~[flink-scala_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) ~[flink-core-1.3.0.jar:1.3.0]
... 6 common frames omitted
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.apache.flink.runtime.checkpoint.CheckpointCoordinator | Stopping checkpoint coordinator for job f54182ae17352efb9aa40667c283ce09
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.a.f.r.checkpoint.StandaloneCompletedCheckpointStore   | Shutting down


Thanks and Regards,
Mahesh


Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.3 - Checkpointing failing

Ted Yu
Your case doesn't seem like FLINK-5462 since there was no CancellationException in the stack trace you posted.

The exception from TraversableSerializer.snapshotConfiguration() was added by FLINK-6178

FYI

On Fri, Jun 2, 2017 at 4:04 PM, MAHESH KUMAR <[hidden email]> wrote:
Hi Team,

We have some test cases written using StreamingMultipleProgramsTestBase
It was working fine in version 1.2, we get the following error in version 1.3
It seems like CheckpointCoordinator fails after this error and Checkpointing no longer occurs.

I came across this bug: https://issues.apache.org/jira/browse/FLINK-5462 , It looks kind of similar but I am not exactly sure.

2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | org.apache.flink.runtime.executiongraph.ExecutionGraph  | Could not restart the job pipeline_message_auditor (f54182ae17352efb9aa40667c283ce09) because the restart strategy prevented it.
org.apache.flink.streaming.runtime.tasks.AsynchronousException: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:963) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
... 6 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future.
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1018) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:957) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
... 7 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155)
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39)
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389)
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893)
... 5 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) ~[flink-scala_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) ~[flink-core-1.3.0.jar:1.3.0]
... 6 common frames omitted
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.apache.flink.runtime.checkpoint.CheckpointCoordinator | Stopping checkpoint coordinator for job f54182ae17352efb9aa40667c283ce09
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.a.f.r.checkpoint.StandaloneCompletedCheckpointStore   | Shutting down


Thanks and Regards,
Mahesh



Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.3 - Checkpointing failing

Ted Yu
If I read CompositeTypeSerializerConfigSnapshot ctor correctly:

    for (TypeSerializer<?> nestedSerializer : nestedSerializers) {
      TypeSerializerConfigSnapshot configSnapshot = nestedSerializer.snapshotConfiguration();
      this.nestedSerializersAndConfigs.add(

The UnsupportedOperationException thrown by snapshotConfiguration() should be caught without proceeding to nestedSerializersAndConfigs.add(). 

On Fri, Jun 2, 2017 at 7:02 PM, Ted Yu <[hidden email]> wrote:
Your case doesn't seem like FLINK-5462 since there was no CancellationException in the stack trace you posted.

The exception from TraversableSerializer.snapshotConfiguration() was added by FLINK-6178

FYI

On Fri, Jun 2, 2017 at 4:04 PM, MAHESH KUMAR <[hidden email]> wrote:
Hi Team,

We have some test cases written using StreamingMultipleProgramsTestBase
It was working fine in version 1.2, we get the following error in version 1.3
It seems like CheckpointCoordinator fails after this error and Checkpointing no longer occurs.

I came across this bug: https://issues.apache.org/jira/browse/FLINK-5462 , It looks kind of similar but I am not exactly sure.

2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | org.apache.flink.runtime.executiongraph.ExecutionGraph  | Could not restart the job pipeline_message_auditor (f54182ae17352efb9aa40667c283ce09) because the restart strategy prevented it.
org.apache.flink.streaming.runtime.tasks.AsynchronousException: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:963) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
... 6 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future.
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1018) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:957) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
... 7 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155)
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39)
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389)
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893)
... 5 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) ~[flink-scala_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) ~[flink-core-1.3.0.jar:1.3.0]
... 6 common frames omitted
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.apache.flink.runtime.checkpoint.CheckpointCoordinator | Stopping checkpoint coordinator for job f54182ae17352efb9aa40667c283ce09
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.a.f.r.checkpoint.StandaloneCompletedCheckpointStore   | Shutting down


Thanks and Regards,
Mahesh




Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.3 - Checkpointing failing

Tzu-Li (Gordon) Tai
Hi Mahesh,

Thanks a lot for reporting this. This would be a bug: https://issues.apache.org/jira/browse/FLINK-6844.
Apparently the TraversableSerializer could take part in checkpointing and therefore should implement the new compatibility methods.

I’ll make sure that the fix for this gets into 1.3.1.

Cheers,
Gordon

On 3 June 2017 at 5:41:12 AM, Ted Yu ([hidden email]) wrote:

If I read CompositeTypeSerializerConfigSnapshot ctor correctly:

    for (TypeSerializer<?> nestedSerializer : nestedSerializers) {
      TypeSerializerConfigSnapshot configSnapshot = nestedSerializer.snapshotConfiguration();
      this.nestedSerializersAndConfigs.add(

The UnsupportedOperationException thrown by snapshotConfiguration() should be caught without proceeding to nestedSerializersAndConfigs.add(). 

On Fri, Jun 2, 2017 at 7:02 PM, Ted Yu <[hidden email]> wrote:
Your case doesn't seem like FLINK-5462 since there was no CancellationException in the stack trace you posted.

The exception from TraversableSerializer.snapshotConfiguration() was added by FLINK-6178

FYI

On Fri, Jun 2, 2017 at 4:04 PM, MAHESH KUMAR <[hidden email]> wrote:
Hi Team,

We have some test cases written using StreamingMultipleProgramsTestBase
It was working fine in version 1.2, we get the following error in version 1.3
It seems like CheckpointCoordinator fails after this error and Checkpointing no longer occurs.

I came across this bug: https://issues.apache.org/jira/browse/FLINK-5462 , It looks kind of similar but I am not exactly sure.

2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | org.apache.flink.runtime.executiongraph.ExecutionGraph  | Could not restart the job pipeline_message_auditor (f54182ae17352efb9aa40667c283ce09) because the restart strategy prevented it.
org.apache.flink.streaming.runtime.tasks.AsynchronousException: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:963) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator TriggerWindow(TumblingProcessingTimeWindows(4000), ReducingStateDescriptor{serializer=com.oracle.ci.flink.streaming.MessageAuditorStreamingHelper$$anonfun$1$$anon$7$$anon$2@e42b922d, reduceFunction=org.apache.flink.streaming.api.scala.function.util.ScalaReduceFunction@59623f44}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:301)) -> (Map -> Sink: auditor_out-kafkaSink, Map -> Sink: auditor_expire-kafkaSink) (1/1).
... 6 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_112]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future.
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1018) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:957) ~[flink-streaming-java_2.11-1.3.0.jar:1.3.0]
... 5 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
... 7 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155)
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39)
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389)
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:893)
... 5 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) ~[flink-scala_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) ~[flink-core-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:591) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:510) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:407) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:389) ~[flink-statebackend-rocksdb_2.11-1.3.0.jar:1.3.0]
at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72) ~[flink-runtime_2.11-1.3.0.jar:1.3.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) ~[flink-core-1.3.0.jar:1.3.0]
... 6 common frames omitted
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.apache.flink.runtime.checkpoint.CheckpointCoordinator | Stopping checkpoint coordinator for job f54182ae17352efb9aa40667c283ce09
2017-06-02 16:11:07,048  INFO | flink-akka.actor.default-dispatcher-3 | o.a.f.r.checkpoint.StandaloneCompletedCheckpointStore   | Shutting down


Thanks and Regards,
Mahesh