Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Folani
I'm deploying a standalone Flink cluster on top of Kubernetes and using MinIO
as an S3 backend. I mainly followed the instructions on Flink's website.
I use the following command to run my job:

    $ flink run -d -m <IP>:<port> -j job.jar

I have also added the following to flink-configmap.yaml:


    state.backend: filesystem
    state.checkpoints.dir: s3://state/checkpoints
    state.savepoints.dir: s3://state/savepoints
    s3.path-style-access: true
    s3.endpoint: http://minio-service:9000
    s3.access-key: *******
    s3.secret-key: *******
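
(For reference, the s3:// scheme here is served by one of Flink's S3 filesystem
plugins, e.g. flink-s3-fs-presto, which has to sit under the plugins/ directory
of the JobManager/TaskManager images. A minimal sketch of enabling it on top of
the official image follows; the jar name, version, and paths are assumptions on
my part and may differ from how my images are actually built:)

    # Hypothetical sketch: enable the S3 (presto) filesystem plugin in the official image
    FROM flink:1.11.2-scala_2.11
    RUN mkdir -p /opt/flink/plugins/s3-fs-presto && \
        cp /opt/flink/opt/flink-s3-fs-presto-1.11.2.jar /opt/flink/plugins/s3-fs-presto/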

Everything seems to be working well: the job is submitted correctly and the
checkpoints are written to MinIO. But when I try to cancel the job or stop it
with a savepoint, I get the following exception:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job
"5ae191ca2b239ec7771e4c7a9a336537".
        at
org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
        at
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
        at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
        at
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
        at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
        at
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.util.concurrent.TimeoutException
        at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at
org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
        ... 6 more

This is my command to stop with a savepoint:

    $ flink stop -p <JobID>

My Flink version is flink-1.11.2-bin-scala_2.11.
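
(In case it is relevant: I have not changed the client-side timeout. As far as I
understand, the CLI waits for the savepoint future only up to client.timeout,
which I believe defaults to about one minute, and could be raised in
flink-conf.yaml, for example:)

    # assumption: client.timeout bounds how long the CLI waits for the savepoint future (default ~60 s)
    client.timeout: 300 s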

What could be the reason for this exception? Any suggestions?







Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Piotr Nowojski-4
Hi,

It's hard for me to guess what could be the problem. There was the same error reported a couple of months ago [1], but there is frankly no extra information there.

Can we start by looking at the full TaskManager and JobManager logs? Could you share them with us?

Best,
Piotrek



Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Folani
Hi Piotrek,

Sorry for the late response.
I have another problem with setting up the logs; I think it comes from running
the Flink client on my host machine while the job runs on a JobManager in K8s,
and I'm still working on it. But this is what I got in the /log folder of my
host machine:



2020-12-14 15:04:51,329 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -
--------------------------------------------------------------------------------
2020-12-14 15:04:51,331 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  Starting Command Line Client (Version: 1.12.0, Scala: 2.11,
Rev:fc00492, Date:2020-12-02T08:49:16+01:00)
2020-12-14 15:04:51,331 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  OS current user: folani
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.212-b04
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  Maximum heap size: 3538 MiBytes
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  JAVA_HOME: (not set)
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  No Hadoop Dependency available
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  JVM Options:
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -    
-Dlog.file=/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/log/flink-folani-client-hralaptop.log
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -    
-Dlog4j.configuration=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/log4j-cli.properties
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -    
-Dlog4j.configurationFile=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/log4j-cli.properties
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -    
-Dlogback.configurationFile=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/logback.xml
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  Program Arguments:
2020-12-14 15:04:51,333 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -     stop
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -     -p
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -     cc4a9a04e164fff8628b3ac59e5fbe80
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -  Classpath:
/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-csv-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-json-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-shaded-zookeeper-3.4.14.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-table_2.11-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-table-blink_2.11-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-1.2-api-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-api-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-core-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-slf4j-impl-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-dist_2.11-1.12.0.jar:::
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                    
[] -
--------------------------------------------------------------------------------
2020-12-14 15:04:51,336 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: jobmanager.rpc.address, localhost
2020-12-14 15:04:51,336 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: jobmanager.rpc.port, 6123
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: jobmanager.memory.process.size, 1024m
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: taskmanager.memory.process.size, 1024m
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: taskmanager.numberOfTaskSlots, 1
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: parallelism.default, 1
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: state.backend, filesystem
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: state.checkpoints.dir, s3://state/checkpoints
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: state.savepoints.dir, s3://state/savepoints
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: s3.path-style-access, true
2020-12-14 15:04:51,337 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: s3.endpoint, http://172.17.0.3:30090
2020-12-14 15:04:51,338 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: s3.access-key, minio
2020-12-14 15:04:51,338 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: s3.secret-key, ******
2020-12-14 15:04:51,338 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: jobmanager.execution.failover-strategy, region
2020-12-14 15:04:51,338 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: metrics.reporter.prom.class,
org.apache.flink.metrics.prometheus.PrometheusReporter
2020-12-14 15:04:51,338 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: metrics.reporter.prom.host, localhost
2020-12-14 15:04:51,338 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: metrics.reporter.prom.port, 9250-9260
2020-12-14 15:04:51,339 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: taskmanager.network.detailed-metrics, true
2020-12-14 15:04:51,339 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: metrics.latency.granularity, "subtask"
2020-12-14 15:04:51,339 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: metrics.latency.interval, 1000
2020-12-14 15:04:51,339 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: web.backpressure.refresh-interval, 1000
2020-12-14 15:04:51,339 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: metrics.system-resource, true
2020-12-14 15:04:51,339 INFO
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: metrics.system-resource-probing-interval, 1000
2020-12-14 15:04:51,358 INFO  org.apache.flink.client.cli.CliFrontend                    
[] - Loading FallbackYarnSessionCli
2020-12-14 15:04:51,392 INFO  org.apache.flink.core.fs.FileSystem                        
[] - Hadoop is not in the classpath/dependencies. The extended set of
supported File Systems via Hadoop is not available.
2020-12-14 15:04:51,449 INFO
org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot
create Hadoop Security Module because Hadoop cannot be found in the
Classpath.
2020-12-14 15:04:51,454 INFO
org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file
will be created as /tmp/jaas-1159138828547882777.conf.
2020-12-14 15:04:51,458 INFO
org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] -
Cannot install HadoopSecurityContext because Hadoop cannot be found in the
Classpath.
2020-12-14 15:04:51,459 INFO  org.apache.flink.client.cli.CliFrontend                    
[] - Running 'stop-with-savepoint' command.
2020-12-14 15:04:51,466 INFO  org.apache.flink.client.cli.CliFrontend                    
[] - Suspending job "cc4a9a04e164fff8628b3ac59e5fbe80" with a savepoint.
2020-12-14 15:04:51,470 INFO
org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] -
Could not load factory due to missing dependencies.
2020-12-14 15:04:51,485 INFO  org.apache.flink.configuration.Configuration                
[] - Config uses fallback configuration key 'jobmanager.rpc.address' instead
of key 'rest.address'
2020-12-14 15:05:51,973 WARN
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel [] -
Force-closing a channel whose registration task was not accepted by an event
loop: [id: 0xe1d20630]
java.util.concurrent.RejectedExecutionException: event executor terminated
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:471)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:139)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:123)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:333)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:272)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:214)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.client.program.rest.RestClusterClient.lambda$null$23(RestClusterClient.java:666)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
[?:1.8.0_212]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_212]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_212]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
2020-12-14 15:05:51,981 ERROR
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.rejectedExecution
[] - Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:183)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:337)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:272)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:214)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.client.program.rest.RestClusterClient.lambda$null$23(RestClusterClient.java:666)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
[?:1.8.0_212]
        at
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
[?:1.8.0_212]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_212]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_212]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
2020-12-14 15:05:51,981 ERROR org.apache.flink.client.cli.CliFrontend                    
[] - Error while running the command.
org.apache.flink.util.FlinkException: Could not stop with a savepoint job
"cc4a9a04e164fff8628b3ac59e5fbe80".
        at
org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:539)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:919)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:531)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:986)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047)
[flink-dist_2.11-1.12.0.jar:1.12.0]
Caused by: java.util.concurrent.TimeoutException
        at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
~[?:1.8.0_212]
        at
org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:537)
~[flink-dist_2.11-1.12.0.jar:1.12.0]
        ... 6 more


Thank you in advance.
Folani




Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

rmetzger0
Hi,
the logs from the client are not helpful for debugging this particular issue.

With kubectl get pods you can get the TaskManager pod names, and with
kubectl logs <podID> you can get the logs.
The JobManager log would also be nice to have.
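
For example (pod names and namespace are placeholders, adjust to your deployment):

    # list the Flink pods to find the JobManager and TaskManager pod names
    kubectl get pods
    # dump their logs to files
    kubectl logs <jobmanager-pod-name> > jobmanager.log
    kubectl logs <taskmanager-pod-name> > taskmanager.log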


Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Folani
Hi,

I have attached the log files for the JobManager and TaskManager:

   jobmanager_log.asc
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1744/jobmanager_log.asc>  
   6122-f8b99d_log.6122-f8b99d_log
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1744/6122-f8b99d_log.6122-f8b99d_log>  


I perform the following steps:

1- $ flink run -d -m <JM_rest_IP>:8081 -j SimpleJob.jar

2- $ flink stop -m <JM_rest_IP>:8081 -p <JobID>

Then I get the following exception:


org.apache.flink.util.FlinkException: Could not stop with a savepoint job
"69daa5e180ea68bf1cceec70d865a08b".
        at
org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:539)
        at
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:919)
        at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:531)
        at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:986)
        at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047)
        at
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047)
Caused by: java.util.concurrent.TimeoutException
        at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at
org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:537)
        ... 6 more







Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

rmetzger0
I guess you are seeing a different error now because you are submitting the job and stopping it right away.

Can you produce new logs where you wait until at least one checkpoint has successfully completed before you stop?
From the exception it seems that the job has not been successfully initialized. I would be surprised if regular checkpoints were possible at that point already.
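
For example, you could check the job's checkpoint statistics via Flink's REST API
(or the web UI) and only stop once at least one checkpoint shows up as completed;
the host and job ID below are placeholders:

    # returns checkpoint counts and history for the job
    curl http://<JM_rest_IP>:8081/jobs/<JobID>/checkpoints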
