StopWithSavepoint() method doesn't work in Java-based Flink native K8s operator

Posted by Fuyao Li-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/StopWithSavepoint-method-doesn-t-work-in-Java-based-flink-native-k8s-operator-tp43404.html

Hello Community, Yang,

I am trying to extend the Flink native Kubernetes operator with some new features, based on the repo [1]. I wrote a method to implement image update functionality [2] and added the call

triggerImageUpdate(oldFlinkApp, flinkApp, effectiveConfig);

right below the existing call

triggerSavepoint(oldFlinkApp, flinkApp, effectiveConfig);
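Since both calls receive the old and the new application on every reconcile, triggerImageUpdate() presumably has to decide first whether the image actually changed. The following is a minimal sketch of such a guard. AppSpec, its imageName field, and ImageChangeCheck are hypothetical simplifications, not the custom resource classes from [1] or the code in [2].

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for the operator's application spec;
// the real FlinkApplication classes in [1] have different fields.
final class AppSpec {
    final String imageName;
    final int parallelism;

    AppSpec(String imageName, int parallelism) {
        this.imageName = imageName;
        this.parallelism = parallelism;
    }
}

final class ImageChangeCheck {
    // Returns true only when the container image differs between the old and the new spec.
    // A null old spec means the application is being created, so there is nothing to update.
    static boolean imageChanged(AppSpec oldSpec, AppSpec newSpec) {
        if (oldSpec == null || newSpec == null) {
            return false;
        }
        return !Objects.equals(oldSpec.imageName, newSpec.imageName);
    }
}
```

With a guard like this, a change to other fields (for example parallelism) does not trigger an image update.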

I wrote a function to handle the image change behavior [2].

Solution 1:

I want to use the stopWithSavepoint() method to complete the task. However, it gets stuck and never completes. Even if I call get() on the CompletableFuture, it always times out and throws exceptions. See the Solution 1 logs [3].
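In the Flink client API, ClusterClient#stopWithSavepoint and ClusterClient#triggerSavepoint both return a CompletableFuture<String> that completes with the savepoint path. The sketch below does not explain the hang, but it shows one way to wait on such a future with an explicit bound and to tell a timeout (still pending) apart from a failure reported by the cluster. SavepointWait and awaitSavepoint are hypothetical names, and the future here is any CompletableFuture<String>, not a live Flink call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SavepointWait {
    // Waits at most timeoutSeconds for a savepoint future (for example, one returned by
    // ClusterClient#stopWithSavepoint) and reports the outcome instead of blocking forever.
    static String awaitSavepoint(CompletableFuture<String> savepointFuture, long timeoutSeconds) {
        try {
            return "completed: " + savepointFuture.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // The future is still pending; the operation may still finish on the cluster side.
            return "timed out after " + timeoutSeconds + "s";
        } catch (ExecutionException e) {
            // The future completed exceptionally; the cause holds the underlying error.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }
}
```

Logging which branch is hit (timeout versus failure) narrows down whether the request is never answered or is answered with an error.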

Solution 2:

I tried triggering a savepoint, then deleting the deployment in the code, and then creating a new application with the new image. This seems to work fine. Logs: [4]
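The Solution 2 workflow depends on strict ordering: the savepoint must complete before the deployment is deleted, and the new application must restore from that savepoint. A minimal sketch of the ordering follows. DeploymentActions and ImageUpgrade are hypothetical abstractions, not APIs from [1] or Flink, and the 5-minute timeout is an arbitrary assumption. The two configuration keys, kubernetes.container.image and execution.savepoint.path, are standard Flink options. One difference from stop-with-savepoint, as far as I understand it: with a plain savepoint the job keeps running until the deployment is deleted, so records processed in that window may be processed again after the restore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction over the operator's calls; not an API from [1] or from Flink.
interface DeploymentActions {
    CompletableFuture<String> triggerSavepoint(String jobId);
    void deleteDeployment(String clusterId);
    void deployApplication(String clusterId, Map<String, String> config);
}

final class ImageUpgrade {
    // Solution 2 as an ordered procedure: savepoint, then delete, then redeploy from it.
    static void upgrade(DeploymentActions actions, String clusterId, String jobId,
                        String newImage, Map<String, String> baseConfig) throws Exception {
        // 1. Block until the savepoint path is known (timeout is an assumed value).
        String savepointPath = actions.triggerSavepoint(jobId).get(5, TimeUnit.MINUTES);

        // 2. Only now remove the old deployment.
        actions.deleteDeployment(clusterId);

        // 3. Redeploy with the new image, restoring from the savepoint just taken.
        Map<String, String> config = new HashMap<>(baseConfig);
        config.put("kubernetes.container.image", newImage);
        config.put("execution.savepoint.path", savepointPath);
        actions.deployApplication(clusterId, config);
    }
}
```

If the savepoint future fails or times out, get() throws and the old deployment is left untouched.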

My questions:

  1. Why does Solution 1 get stuck? The triggerSavepoint() CompletableFuture works here, so why does stopWithSavepoint() always get stuck or time out? I am very confused.
  2. I am still new to the Fabric8io library. Did I do anything wrong in the implementation? Maybe I should update the jobStatus? Please give me some suggestions.
  3. For the workaround in Solution 2, are there any side effects I haven't noticed?

[1] https://github.com/wangyang0918/flink-native-k8s-operator

[2] https://pastebin.ubuntu.com/p/tQShjmdcJt/

[3] https://pastebin.ubuntu.com/p/YHSPpK4W4Z/

[4] https://pastebin.ubuntu.com/p/3VG7TtXXfh/

Best,

Fuyao