Re: StopWithSavepoint() method doesn't work in Java based flink native k8s operator

Posted by Fuyao Li-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/StopWithSavepoint-method-doesn-t-work-in-Java-based-flink-native-k8s-operator-tp43404p43406.html

Hello Community, Yang,

 

I have one more question about logging. I noticed that when I run the kubectl logs command against the JobManager (JM), the pods provisioned by the operator do not print Flink's internal logs. I only get output like the logs below; no actual Flink logs are printed. Where can I find the path to the logs? Should I use a sidecar container to get them out? How can I get the logs without checking the Flink WebUI? The sed errors also confuse me, because the application is in fact up and running correctly when I access the WebUI through Ingress.

 

Reference: https://github.com/wangyang0918/flink-native-k8s-operator/issues/4

 

 

[root@bastion deploy]# kubectl logs -f flink-demo-594946fd7b-822xk

 

sed: couldn't open temporary file /opt/flink/conf/sedh1M3oO: Read-only file system

sed: couldn't open temporary file /opt/flink/conf/sed8TqlNR: Read-only file system

/docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml: Read-only file system

sed: couldn't open temporary file /opt/flink/conf/sedvO2DFU: Read-only file system

/docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml: Read-only file system

/docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp: Read-only file system

Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH -Xmx3462817376 -Xms3462817376 -XX:MaxMetaspaceSize=268435456 org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=429496736b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=3462817376b -D jobmanager.memory.jvm-overhead.max=429496736b

ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2

WARNING: An illegal reflective access operation has occurred

WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to field java.util.Properties.serialVersionUID

WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner

WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations

WARNING: All illegal access operations will be denied in a future release

 

 

-------- The logs stop here; the Flink application logs are no longer printed ---------

 

^C

[root@bastion deploy]# kubectl logs -f flink-demo-taskmanager-1-1

sed: couldn't open temporary file /opt/flink/conf/sedaNDoNR: Read-only file system

sed: couldn't open temporary file /opt/flink/conf/seddze7tQ: Read-only file system

/docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml: Read-only file system

sed: couldn't open temporary file /opt/flink/conf/sedYveZoT: Read-only file system

/docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml: Read-only file system

/docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp: Read-only file system

Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH -Xmx697932173 -Xms697932173 -XX:MaxDirectMemorySize=300647712 -XX:MaxMetaspaceSize=268435456 org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=166429984b -D taskmanager.memory.network.min=166429984b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=665719939b -D taskmanager.cpu.cores=1.0 -D taskmanager.memory.task.heap.size=563714445b -D taskmanager.memory.task.off-heap.size=0b --configDir /opt/flink/conf -Djobmanager.memory.jvm-overhead.min='429496736b' -Dpipeline.classpaths='file:usrlib/quickstart-0.1.jar' -Dtaskmanager.resource-id='flink-demo-taskmanager-1-1' -Djobmanager.memory.off-heap.size='134217728b' -Dexecution.target='embedded' -Dweb.tmpdir='/tmp/flink-web-d7691661-fac5-494e-8154-896b4fe30692' -Dpipeline.jars='file:/opt/flink/usrlib/quickstart-0.1.jar' -Djobmanager.memory.jvm-metaspace.size='268435456b' -Djobmanager.memory.heap.size='3462817376b' -Djobmanager.memory.jvm-overhead.max='429496736b'

ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2

WARNING: An illegal reflective access operation has occurred

WARNING: Illegal reflective access by org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to method java.nio.DirectByteBuffer.cleaner()

WARNING: Please consider reporting this to the maintainers of org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil

WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations

WARNING: All illegal access operations will be denied in a future release

Apr 29, 2021 12:58:34 AM oracle.simplefan.impl.FanManager configure

SEVERE: attempt to configure ONS in FanManager failed with oracle.ons.NoServersAvailable: Subscription time out

 

 

-------- The logs stop here; the Flink application logs are no longer printed ---------

 

 

Best,

Fuyao

 

 

From: Fuyao Li <[hidden email]>
Date: Friday, April 30, 2021 at 16:50
To: user <[hidden email]>, Yang Wang <[hidden email]>
Subject: [External] : StopWithSavepoint() method doesn't work in Java based flink native k8s operator

Hello Community, Yang,

 

I am trying to extend the Flink native Kubernetes operator with some new features, based on the repo [1]. I wrote a method to implement the image update functionality [2]. I added the call

triggerImageUpdate(oldFlinkApp, flinkApp, effectiveConfig);

 

below the existing call:

triggerSavepoint(oldFlinkApp, flinkApp, effectiveConfig);

 

 

I wrote a function to handle the image change behavior [2].

 

Solution 1:

I want to use the stopWithSavepoint() method to complete the task. However, it gets stuck and never completes. Even when I call get() on the CompletableFuture, it always times out and throws an exception. See the Solution 1 logs [3].
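To make the symptom concrete, here is a minimal sketch of a bounded wait on a CompletableFuture, assuming the savepoint call returns a CompletableFuture<String> holding the savepoint path. It uses only the JDK: the never-completing future is a stand-in for the stuck stopWithSavepoint() call, not the real Flink client, and the class name SavepointWaitSketch and the method awaitSavepoint are hypothetical names chosen for illustration. It does not explain why the real call hangs; it only shows how a timeout surfaces in the calling code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SavepointWaitSketch {

    // Hypothetical helper: waits at most timeoutMillis for the savepoint path.
    // Returns the path on success, or a short marker describing what went wrong.
    static String awaitSavepoint(CompletableFuture<String> savepointFuture, long timeoutMillis) {
        try {
            return savepointFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Cancelling only marks this local future as cancelled;
            // it does not stop whatever operation was supposed to complete it.
            savepointFuture.cancel(true);
            return "TIMEOUT";
        } catch (ExecutionException e) {
            // The operation itself failed; the real error is the cause.
            return "FAILED: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        // Stand-in for a call that completes normally (path is made up).
        CompletableFuture<String> completed =
                CompletableFuture.completedFuture("file:///tmp/savepoint-demo");
        System.out.println(awaitSavepoint(completed, 500));

        // Stand-in for a call that never completes, as described above.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        System.out.println(awaitSavepoint(stuck, 500));
    }
}
```

Separating the timeout case from the ExecutionException case can help when reading the operator logs, since a timeout means no answer arrived at all, while an ExecutionException carries the underlying failure as its cause.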

 

Solution 2:

I tried triggering a savepoint, then deleting the deployment in the code, and then creating a new application with the new image. This seems to work fine. Log link: [4]
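The ordering in Solution 2 can be sketched roughly as follows. This is an illustration only and not the code in [2]: the ClusterOps interface, its three methods, the class name ImageUpdateSketch, and the image name demo-image:2 are all hypothetical, and a real operator would back them with the Flink client and the Fabric8 Kubernetes client. The point of the sketch is the guard between the steps: the old deployment is deleted only after a savepoint path has actually been returned.

```java
import java.util.ArrayList;
import java.util.List;

public class ImageUpdateSketch {

    /** Hypothetical abstraction over the Flink and Kubernetes calls an operator would make. */
    interface ClusterOps {
        String triggerSavepoint(String clusterId) throws Exception;
        void deleteDeployment(String clusterId) throws Exception;
        void deployApplication(String clusterId, String image, String savepointPath) throws Exception;
    }

    // Savepoint first, then delete, then redeploy from that savepoint.
    static String updateImage(ClusterOps ops, String clusterId, String newImage) throws Exception {
        // Step 1: take a savepoint. The job keeps running after this call returns.
        String savepointPath = ops.triggerSavepoint(clusterId);
        if (savepointPath == null || savepointPath.isEmpty()) {
            throw new IllegalStateException("No savepoint path returned; old deployment left untouched");
        }
        // Step 2: remove the old deployment only once the savepoint exists.
        ops.deleteDeployment(clusterId);
        // Step 3: start the application with the new image, restoring from the savepoint.
        ops.deployApplication(clusterId, newImage, savepointPath);
        return savepointPath;
    }

    // Runs updateImage against an in-memory fake and returns the recorded call order.
    static List<String> demoCalls() throws Exception {
        final List<String> calls = new ArrayList<>();
        ClusterOps recording = new ClusterOps() {
            @Override
            public String triggerSavepoint(String clusterId) {
                calls.add("savepoint:" + clusterId);
                return "file:///tmp/savepoint-demo"; // made-up path for the demo
            }

            @Override
            public void deleteDeployment(String clusterId) {
                calls.add("delete:" + clusterId);
            }

            @Override
            public void deployApplication(String clusterId, String image, String savepointPath) {
                calls.add("deploy:" + clusterId + ":" + image + "@" + savepointPath);
            }
        };
        updateImage(recording, "flink-demo", "demo-image:2");
        return calls;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demoCalls());
    }
}
```

One thing the sketch makes visible: because the job keeps running between step 1 and step 2, anything it processes in that window is not part of the savepoint and will be processed again after the restore.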

 

My questions:

  1. Why does Solution 1 get stuck? The CompletableFuture returned by triggerSavepoint() works here, so why does stopWithSavepoint() always get stuck or time out? I am very confused.
  2. I am still new to the Fabric8io library. Did I do anything wrong in the implementation? Should I perhaps update the jobStatus? Please give me some suggestions.
  3. For the workaround in Solution 2, are there any negative side effects that I have not noticed?

 

 

[1] https://github.com/wangyang0918/flink-native-k8s-operator

[2] https://pastebin.ubuntu.com/p/tQShjmdcJt/

[3] https://pastebin.ubuntu.com/p/YHSPpK4W4Z/

[4] https://pastebin.ubuntu.com/p/3VG7TtXXfh/

 

Best,

Fuyao