I just started testing Flink 1.11.1 and noticed that the Task Managers section in the UI doesn’t load.
The exception in the log is: java.io.NotSerializableException: org.apache.flink.runtime.rest.messages.ResourceProfileInfo
Peter
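For context: java.io.NotSerializableException is what Java's built-in object serialization throws when the object graph being written contains an instance of a class that does not implement java.io.Serializable. Its message is the name of the offending class, which is why the log line above names ResourceProfileInfo. The minimal sketch below reproduces that behaviour with hypothetical stand-in classes (PayloadInfo, Response). It is not Flink code and only assumes that plain Java serialization is involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NotSerializableDemo {

    // Hypothetical stand-in for a REST payload class such as ResourceProfileInfo.
    // It deliberately does NOT implement java.io.Serializable.
    static class PayloadInfo {
        final int cpuCores;

        PayloadInfo(int cpuCores) {
            this.cpuCores = cpuCores;
        }
    }

    // Hypothetical Serializable wrapper. Implementing Serializable here is not
    // enough: every non-transient field is serialized too, including PayloadInfo.
    static class Response implements Serializable {
        private static final long serialVersionUID = 1L;
        final PayloadInfo info;

        Response(PayloadInfo info) {
            this.info = info;
        }
    }

    // Writes obj with plain Java serialization. Returns null on success, or the
    // exception message (the offending class name) on NotSerializableException.
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String offender = trySerialize(new Response(new PayloadInfo(4)));
        System.out.println("java.io.NotSerializableException: " + offender);
    }
}
```

Compiled in the default package, main prints the nested class name NotSerializableDemo$PayloadInfo, just as the Flink log reports the fully qualified name of ResourceProfileInfo.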
Hi Peter, thanks for reporting this issue. From the exception stack, it seems there's indeed a problem. However, I'm not able to reproduce this issue on my machine, and I guess that's why it wasn't discovered before the release. Could you share some more details (and maybe screenshots) on how this issue is triggered? Thank you~ Xintong Song On Thu, Jul 23, 2020 at 2:07 AM Peter Westermann <[hidden email]> wrote:
Hi Xintong Song, this is the UI for a newly started Flink cluster: [screenshot] As soon as I click on Task Managers, this happens (the same error message pops up on each UI refresh): [screenshot] I got the actual error message from the logs. This is for a Flink cluster on Amazon EC2 with RocksDB as the state backend, state stored in S3, and ZooKeeper for HA.
Peter
Hi Peter, how are you deploying Flink on the EC2 machines? Did you manually distribute the files to the machines, and then use the start-cluster.sh script? Can you make sure that the TaskManagers are also running Flink 1.11.1? On Thu, Jul 23, 2020 at 1:05 PM Peter Westermann <[hidden email]> wrote:
Hi Robert, the JobManagers and TaskManagers are both running 1.11.1. JobManagers are started with jobmanager.sh start and TaskManagers with taskmanager.sh start; to be clear, those run on separate instances. Jars and config are distributed when creating the AMIs for these instances. Every build starts from scratch, so there are no lingering jars from older Flink versions. The only code change is using Flink 1.11.1 instead of 1.10.1. FWIW, this is with security.ssl.rest.enabled: true, if that makes a difference. Thanks,
Thanks for your response. I was able to start Flink 1.11.1 locally (1 JM, 5 TMs) with SSL enabled, but I didn't run into this problem (which was unlikely anyway :) ). I'm running JDK 1.8 and the vanilla Flink Scala 2.12 build:
2020-07-24 16:33:58,416 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Starting StandaloneSessionClusterEntrypoint (Version: 1.11.1, Scala: 2.12, Rev:7eb514a, Date:2020-07-15T07:02:09+02:00)
2020-07-24 16:33:58,416 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - OS current user: robert
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 1.8/25.252-b09
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Maximum heap size: 981 MiBytes
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JAVA_HOME: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - No Hadoop Dependency available
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM Options:
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xmx1073741824
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xms1073741824
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -XX:MaxMetaspaceSize=268435456
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog.file=/private/tmp/flink/flink-1.11.1/log/flink-robert-standalonesession-0-MacBook-Pro-2.localdomain.log
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configuration=file:/private/tmp/flink/flink-1.11.1/conf/log4j.properties
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configurationFile=file:/private/tmp/flink/flink-1.11.1/conf/log4j.properties
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlogback.configurationFile=file:/private/tmp/flink/flink-1.11.1/conf/logback.xml
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Program Arguments:
2020-07-24 16:33:58,417 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --configDir
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - /private/tmp/flink/flink-1.11.1/conf
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --executionMode
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - cluster
2020-07-24 16:33:58,418 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Classpath: /private/tmp/flink/flink-1.11.1/lib/flink-csv-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-json-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-shaded-zookeeper-3.4.14.jar:/private/tmp/flink/flink-1.11.1/lib/flink-table-blink_2.12-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-table_2.12-1.11.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-1.2-api-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-api-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-core-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/log4j-slf4j-impl-2.12.1.jar:/private/tmp/flink/flink-1.11.1/lib/flink-dist_2.12-1.11.1.jar:::
Your setup also sounds pretty vanilla, and the error seems to occur even before you submit any job (so the S3/RocksDB stuff is not loaded or used yet). Are there any clues in the JobManager log? Can you share the full log here (or with me privately)? Did you make any other modifications?
On Fri, Jul 24, 2020 at 3:52 PM Peter Westermann <[hidden email]> wrote:
Hi Robert, I think this may have something to do with the HA setup: it looks like the exceptions only show up when not on the leader. I just spun up a new cluster to provide logs and didn't get any errors when looking at the task managers on the current leader, but as soon as I look at the UI on the standby JobManager, I get these exceptions. I attached the log for the standby JobManager. Thanks for your help, Peter
flink.log (75K)
The problem is that `ResourceProfileInfo` is not serializable. When the information is requested from the leading web server, no serialization is required, since the leading ResourceManager (RM) is most likely co-located in the same process. A standby web server, however, has to retrieve it from the leader remotely, which does require serialization. I've opened an issue [1] and PR [2] for it. On Fri, Jul 24, 2020 at 5:43 PM Peter Westermann <[hidden email]> wrote:
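Till's explanation of why only the standby JobManager fails can be illustrated with a small sketch. It assumes, for illustration only, that the leader hands the object over in memory while a standby receives it through a remote call that uses Java serialization; InfoBefore, InfoAfter, fetchLocally, fetchRemotely and survivesRemoteCall are hypothetical names, not Flink's RPC API. The fix is modelled as making the payload class implement Serializable, which shows the general remedy rather than the content of the actual PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LeaderVsStandbyDemo {

    // Hypothetical payload before the fix: does not implement Serializable.
    static class InfoBefore {
        final int slots;
        InfoBefore(int slots) { this.slots = slots; }
    }

    // Hypothetical payload after the fix: implements Serializable.
    static class InfoAfter implements Serializable {
        private static final long serialVersionUID = 1L;
        final int slots;
        InfoAfter(int slots) { this.slots = slots; }
    }

    // Leader path (assumed): web server and ResourceManager share one JVM,
    // so the object is returned by reference and never serialized.
    static Object fetchLocally(Object info) {
        return info;
    }

    // Standby path (assumed): the reply crosses a process boundary, modelled
    // here as a Java serialization round trip through a byte array.
    static Object fetchRemotely(Object info) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(info);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // True if the payload survives the simulated remote call.
    static boolean survivesRemoteCall(Object info) {
        try {
            fetchRemotely(info);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("leader,  before fix: " + (fetchLocally(new InfoBefore(4)) != null));
        System.out.println("standby, before fix: " + survivesRemoteCall(new InfoBefore(4)));
        System.out.println("standby, after fix:  " + survivesRemoteCall(new InfoAfter(4)));
    }
}
```

main prints true, false and true: the in-process leader path never touches serialization, which is consistent with the bug surfacing only on the standby JobManager and not in single-JobManager test setups.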
Thank you Till!
The fix has been merged into master and the release-1.11 branch. It should be shipped with the next bug fix release 1.11.2. Cheers, Till On Fri, Jul 24, 2020 at 9:02 PM Peter Westermann <[hidden email]> wrote: