NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo


pwestermann

I just started testing Flink 1.11.1 and noticed that the Task Managers section in the UI doesn’t load.

The exception in the log is:

j.i.NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo
	at j.i.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
	at j.i.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at j.i.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at j.i.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at j.i.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at j.i.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at java.util.ArrayList.writeObject(ArrayList.java:766)
	at s.r.GeneratedMethodAccessor22.invoke(Unknown Source)
	at s.r.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at j.l.reflect.Method.invoke(Method.java:498)
	at j.i.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1140)
	at j.i.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
	at j.i.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at j.i.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at j.i.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at o.a.f.u.InstantiationUtil.serializeObject(InstantiationUtil.java:586)
	at o.a.f.u.SerializedValue.<init>(SerializedValue.java:52)
	at o.a.f.r.r.a.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:357)
	... 29 common frames omitted
Wrapped by: o.a.f.r.r.a.e.AkkaRpcException: Failed to serialize the result for RPC call : requestTaskManagerInfo.
	at o.a.f.r.r.a.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:368)
	at o.a.f.r.r.a.AkkaRpcActor.lambda$sendAsyncResponse$0(AkkaRpcActor.java:335)
	at j.u.c.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at j.u.c.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
	at j.u.c.CompletableFuture.whenComplete(CompletableFuture.java:2140)
	at o.a.f.r.r.a.AkkaRpcActor.sendAsyncResponse(AkkaRpcActor.java:329)
	at o.a.f.r.r.a.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:298)
	at o.a.f.r.r.a.AkkaRpcActo...

 

 

Peter
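
For readers unfamiliar with this failure mode: the trace passes through ArrayList.writeObject and then fails on ResourceProfileInfo. That is the pattern Java serialization produces when a serializable container reaches an object whose class does not implement java.io.Serializable. Below is a minimal sketch of that pattern. ProfileInfo and ManagerInfo are hypothetical stand-ins for illustration, not Flink classes, and the sketch makes no claim about how Flink's own types are declared.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

public class NotSerializableDemo {

    // Hypothetical type that does NOT implement Serializable.
    static class ProfileInfo {
        final double cpuCores;

        ProfileInfo(double cpuCores) {
            this.cpuCores = cpuCores;
        }
    }

    // Hypothetical list element: Serializable itself, but with a non-serializable field.
    static class ManagerInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final ProfileInfo profile;

        ManagerInfo(ProfileInfo profile) {
            this.profile = profile;
        }
    }

    // Returns the name of the class that blocked serialization, or null on success.
    static String trySerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return null;
        } catch (NotSerializableException e) {
            // The exception message is the name of the offending class.
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<ManagerInfo> infos = new ArrayList<>();
        // An empty list serializes fine: no element is ever visited.
        System.out.println("empty list blocked by: " + trySerialize(infos));

        // With one element, serialization walks into the ProfileInfo field and fails.
        infos.add(new ManagerInfo(new ProfileInfo(1.0)));
        System.out.println("filled list blocked by: " + trySerialize(infos));
    }
}
```

The failure only surfaces once serialization actually reaches an instance of the non-serializable class, which is why a problem like this can stay hidden until a particular code path is exercised.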


Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

Xintong Song
Hi Peter,

Thanks for reporting this issue.

From the exception stack, it seems there is indeed a problem. However, I'm not able to reproduce this issue on my machine, and I guess that's why it was not discovered before the release. Could you share some more details (and maybe screenshots) on how this issue is triggered?

Thank you~

Xintong Song




Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

pwestermann

Hi Xintong Song,

 

This is the UI for a newly started Flink cluster:

[screenshot]

As soon as I click on Task Managers, this happens (the same error message pops up on each UI refresh):

[screenshot]

I got the actual error message from the logs.

This is for a Flink cluster on Amazon EC2 with RocksDB as the state backend, state in S3, and ZooKeeper for HA.

 

 

Peter

 


Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

rmetzger0
Hi Peter,
How are you deploying Flink on the EC2 machines? Did you manually distribute the files to the machines and then use the start-cluster.sh script?
Can you make sure that the TaskManagers are also running Flink 1.11.1?


Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

pwestermann

Hi Robert,

 

JobManagers and TaskManagers are both running 1.11.1. JobManagers are started with jobmanager.sh start and TaskManagers with taskmanager.sh start; to be clear, those run on separate instances. Jars and config are distributed when creating the AMIs for these instances, and every build starts from scratch, so there are no lingering jars from older Flink versions.

The only code change is using Flink 1.11.1 instead of 1.10.1.

FWIW: This is with security.ssl.rest.enabled: true, if that makes a difference.

 

Thanks,

Peter

 

 


Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

rmetzger0
Thanks for your response. I was able to start Flink 1.11.1 locally (1 JM, 5 TMs) with SSL enabled, but I couldn't reproduce the problem (SSL was an unlikely cause anyway :) )

I'm running JDK 1.8, Scala 2.12 build, vanilla Flink:

2020-07-24 16:33:58,416 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Starting StandaloneSessionClusterEntrypoint (Version: 1.11.1, Scala: 2.12, Rev:7eb514a, Date:2020-07-15T07:02:09+02:00)
2020-07-24 16:33:58,416 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - OS current user: robert
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 1.8/25.252-b09
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Maximum heap size: 981 MiBytes
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JAVA_HOME: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - No Hadoop Dependency available
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM Options:
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xmx1073741824
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xms1073741824
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -XX:MaxMetaspaceSize=268435456
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog.file=/private/tmp/flink/flink-1.11.1/log/flink-robert-standalonesession-0-MacBook-Pro-2.localdomain.log
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configuration=file:/private/tmp/flink/flink-1.11.1/conf/log4j.properties
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configurationFile=file:/private/tmp/flink/flink-1.11.1/conf/log4j.properties
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlogback.configurationFile=file:/private/tmp/flink/flink-1.11.1/conf/logback.xml
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Program Arguments:
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --configDir
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - /private/tmp/flink/flink-1.11.1/conf
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --executionMode
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - cluster
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Classpath: /private/tmp/flink/flink-1.11.1/lib/flink-csv-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-json-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-shaded-zookeeper-3.4.14.jar:/private/tmp/flink/flink-1.11.1/lib/flink-table-blink_2.12-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-table_2.12-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-1.2-api-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-api-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-core-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-slf4j-impl-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-dist_2.12-1.11.1.jar:::


Your setup also sounds pretty vanilla, and the error seems to occur even before you submit any job (so the S3 / RocksDB code is not loaded or used yet).
Are there any clues in the JobManager log? Can you share the full log here (or with me privately)?
Did you make any other modifications?
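
The startup line in the log above carries the Flink version, so comparing that line across the JobManager and TaskManager logs is one way to confirm that every process runs the same release. A minimal sketch follows, assuming the TaskManager log contains a line of the same "(Version: ..., Scala: ..." shape. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlinkVersionFromLog {

    // Matches the "(Version: 1.11.1, Scala: 2.12," part of the startup line.
    private static final Pattern BANNER =
            Pattern.compile("\\(Version: ([^,]+), Scala: ([^,]+),");

    // Returns the Flink version found in the line, or empty if the line is not a startup banner.
    static Optional<String> flinkVersion(String logLine) {
        Matcher m = BANNER.matcher(logLine);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "2020-07-24 16:33:58,416 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - "
                + "Starting StandaloneSessionClusterEntrypoint "
                + "(Version: 1.11.1, Scala: 2.12, Rev:7eb514a, Date:2020-07-15T07:02:09+02:00)";
        System.out.println(flinkVersion(line).orElse("no version banner found"));
    }
}
```

Running the same extraction over the first lines of each process log and comparing the results would flag a mixed-version cluster.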



Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

pwestermann

Hi Robert,

 

I think this may have something to do with the HA setup: it looks like the exceptions only show up on a JobManager that is not the leader.

I just spun up a new cluster to provide logs and didn’t get any errors when looking at the Task Managers page on the current leader, but as soon as I open the UI on the standby JobManager I get these exceptions. I attached the log for the standby JobManager.

 

Thanks for your help,

 

Peter
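
One plausible reading of this observation, consistent with the frame serializeRemoteResultAndVerifySize in the trace: an RPC result only has to be serialized when the caller sits in another process. A non-serializable result would then go unnoticed when the UI is served by the leader and fail when it is served by a standby. This is an assumption about Flink's internals that the thread does not confirm. The sketch below only illustrates the general mechanism, and Payload and respond are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class LocalVsRemoteDemo {

    // Hypothetical result type that does NOT implement Serializable.
    static class Payload {
        final String id;

        Payload(String id) {
            this.id = id;
        }
    }

    // A local caller receives the object by reference; a remote caller needs its bytes.
    static Object respond(Object result, boolean callerIsRemote) {
        if (!callerIsRemote) {
            return result; // no serialization on the local path
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(result);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to serialize the result", e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Payload payload = new Payload("tm-1");

        // Local path: works even though Payload is not serializable.
        System.out.println("local call ok: " + (respond(payload, false) == payload));

        // Remote path: the same result now fails during serialization.
        try {
            respond(payload, true);
            System.out.println("remote call ok");
        } catch (UncheckedIOException e) {
            boolean notSerializable = e.getCause() instanceof NotSerializableException;
            System.out.println("remote call failed, not serializable: " + notSerializable);
        }
    }
}
```

If this reading is right, it would also explain why a single-JobManager setup without HA does not show the error.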

 

From: Robert Metzger <[hidden email]>
Date: Friday, July 24, 2020 at 10:42 AM
To: Peter Westermann <[hidden email]>
Cc: Xintong Song <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

 

Thanks for your response. I was able to start Flink 1.11.1 locally (1 JM, 5 TMs) with SSL enabled, but I didn't have this problem (it was also unlikely :) )

 

I'm running JDK 1.8, Scala 2.12 build, vanilla Flink:

 

2020-07-24 16:33:58,416 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Starting StandaloneSessionClusterEntrypoint (Version: 1.11.1, Scala: 2.12, Rev:7eb514a, Date:2020-07-15T07:02:09+02:00)

2020-07-24 16:33:58,416 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - OS current user: robert

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Current Hadoop/Kerberos user: <no hadoop dependency found>

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 1.8/25.252-b09

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Maximum heap size: 981 MiBytes

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JAVA_HOME: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - No Hadoop Dependency available

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM Options:

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xmx1073741824

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xms1073741824

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -XX:MaxMetaspaceSize=268435456

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog.file=/private/tmp/flink/flink-1.11.1/log/flink-robert-standalonesession-0-MacBook-Pro-2.localdomain.log

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configuration=file:/private/tmp/flink/flink-1.11.1/conf/log4j.properties

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configurationFile=file:/private/tmp/flink/flink-1.11.1/conf/log4j.properties

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlogback.configurationFile=file:/private/tmp/flink/flink-1.11.1/conf/logback.xml

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Program Arguments:

2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --configDir

2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - /private/tmp/flink/flink-1.11.1/conf

2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --executionMode

2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - cluster

2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Classpath: /private/tmp/flink/flink-1.11.1/lib/flink-csv-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-json-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-shaded-zookeeper-3.4.14.jar:/private/tmp/flink/flink-1.11.1/lib/flink-table-blink_2.12-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-table_2.12-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-1.2-api-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-api-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-core-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-slf4j-impl-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-dist_2.12-1.11.1.jar:::

 

 

Your setup also sounds pretty vanilla, and the error seems to occur even before you submit any job (so the S3 / rocksdb stuff is not loaded / used yet).

Are there any clues in the JobManager log? Can you share the full log here? (or with me privately?)

Did you do any other modifications?

 

 

On Fri, Jul 24, 2020 at 3:52 PM Peter Westermann <[hidden email]> wrote:

Hi Robert,

 

Jobmanagers and taskmanagers are both running on 1.11.1. Jobmanagers are started with jobmanager.sh start and taskmanagers are started with taskmanager.sh start – to be clear those run on separate instances. Jars and config are distributed when creating AMIs for these instances – every build starts from scratch so there are no lingering jars from older Flink versions.

The only code change is using Flink 1.11.1 instead of 1.10.1.

FWIW: This is with security.ssl.rest.enabled: true if that makes a difference.

 

Thanks,

Peter

 

 

From: Robert Metzger <[hidden email]>
Date: Friday, July 24, 2020 at 8:54 AM
To: Peter Westermann <[hidden email]>
Cc: Xintong Song <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

 

Hi Peter,

how are you deploying Flink on the EC2 machines? Did you manually distribute the files to the machines, and then use the start-cluster.sh script?

Can you make sure that the TaskManagers are also running Flink 1.11.1?

 

On Thu, Jul 23, 2020 at 1:05 PM Peter Westermann <[hidden email]> wrote:

Hi Xintong Song,

 

This is the UI for a newly started Flink cluster:

 

[Screenshot: Flink web UI of the newly started cluster]

As soon as I click on Task Managers, this happens (the same error message pops up on each UI refresh):

[Screenshot: error message shown after clicking Task Managers]

 

I got the actual error message from the logs.

This is for a Flink cluster on Amazon EC2 with RocksDB as a state backend, state in S3, and zookeeper for HA.

 

 

Peter

 

From: Xintong Song <[hidden email]>
Date: Wednesday, July 22, 2020 at 10:10 PM
To: Peter Westermann <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

 

Hi Peter,

 

Thanks for reporting this issue.

 

From the exception stack, it seems there is indeed a problem. However, I'm not able to reproduce this issue on my machine, and I guess that's why it was not discovered before the release. Could you share some more details (and maybe screenshots) on how this issue is triggered?


Thank you~

Xintong Song

 

 

On Thu, Jul 23, 2020 at 2:07 AM Peter Westermann <[hidden email]> wrote:

I just started testing Flink 1.11.1 and noticed that the Task Managers section in the UI doesn’t load.

The exception in the log is:

j.i.NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo
\tat j.i.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
\tat j.i.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
\tat j.i.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
\tat j.i.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
\tat j.i.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
\tat j.i.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
\tat java.util.ArrayList.writeObject(ArrayList.java:766)
\tat s.r.GeneratedMethodAccessor22.invoke(Unknown Source)
\tat s.r.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
\tat j.l.reflect.Method.invoke(Method.java:498)
\tat j.i.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1140)
\tat j.i.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
\tat j.i.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
\tat j.i.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
\tat j.i.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
\tat o.a.f.u.InstantiationUtil.serializeObject(InstantiationUtil.java:586)
\tat o.a.f.u.SerializedValue.<init>(SerializedValue.java:52)
\tat o.a.f.r.r.a.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:357)
\t... 29 common frames omitted
Wrapped by: o.a.f.r.r.a.e.AkkaRpcException: Failed to serialize the result for RPC call : requestTaskManagerInfo.
\tat o.a.f.r.r.a.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:368)
\tat o.a.f.r.r.a.AkkaRpcActor.lambda$sendAsyncResponse$0(AkkaRpcActor.java:335)
\tat j.u.c.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
\tat j.u.c.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
\tat j.u.c.CompletableFuture.whenComplete(CompletableFuture.java:2140)
\tat o.a.f.r.r.a.AkkaRpcActor.sendAsyncResponse(AkkaRpcActor.java:329)
\tat o.a.f.r.r.a.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:298)
\tat o.a.f.r.r.a.AkkaRpcActo...

 

 

Peter


Attachment: flink.log (75K)

Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

Till Rohrmann
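The problem is that `ResourceProfileInfo` is not serializable. When the information is requested from the leading web server, no serialization is required, since the leading RM is most likely co-located in the same process. A request to a standby web server has to go through an RPC to the leader, so the result of `requestTaskManagerInfo` must be serialized, and that fails. I've opened an issue [1] and a PR [2] for it.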
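
To illustrate the difference between the two call paths, here is a minimal, self-contained sketch in plain Java. It does not use any Flink classes: `PlainProfile`, `SerializableProfile`, `localCall`, `remoteCallSucceeds` and `listOf` are hypothetical names made up for this example, and the "remote" path is only approximated by writing the result to an `ObjectOutputStream`, which is the step that shows up in the stack trace. The real fix may look different.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SerializationSketch {

    // Hypothetical stand-in for a message class that does not implement Serializable.
    static class PlainProfile {
        final int cpuCores;
        PlainProfile(int cpuCores) { this.cpuCores = cpuCores; }
    }

    // Same shape, but marked Serializable.
    static class SerializableProfile implements Serializable {
        private static final long serialVersionUID = 1L;
        final int cpuCores;
        SerializableProfile(int cpuCores) { this.cpuCores = cpuCores; }
    }

    // Local path (assumed): caller and callee share a process, so the
    // result is handed over by reference and never serialized.
    static Object localCall(Object result) {
        return result;
    }

    // Remote path (approximation): the result has to pass through Java
    // serialization before it can be sent. Returns false if that fails.
    static boolean remoteCallSucceeds(Object result) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(result);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wraps one element in an ArrayList, as in the trace (ArrayList.writeObject).
    static List<Object> listOf(Object element) {
        List<Object> list = new ArrayList<>();
        list.add(element);
        return list;
    }

    public static void main(String[] args) {
        List<Object> plain = listOf(new PlainProfile(4));
        List<Object> marked = listOf(new SerializableProfile(4));

        System.out.println("local call works:        " + (localCall(plain) == plain));
        System.out.println("remote, plain class:     " + remoteCallSucceeds(plain));
        System.out.println("remote, Serializable:    " + remoteCallSucceeds(marked));
    }
}
```

The list itself is serializable in both cases; the failure comes from the element type, which is why the problem only surfaces once a result actually has to leave the process.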




Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

pwestermann

Thank you Till!

 



Re: NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo

Till Rohrmann
The fix has been merged into master and the release-1.11 branch. It should be shipped with the next bug fix release 1.11.2.

Cheers,
Till
