Question about checkpoints and savepoints

Question about checkpoints and savepoints

Robert Cullen
When I run a job on my Kubernetes session cluster, only the checkpoint directories are created, not the savepoint directories. (The filesystem is configured to use S3 via MinIO.) Any ideas?

--
Robert Cullen
240-475-4490

Re: Question about checkpoints and savepoints

rmetzger0
Hi,

Does the "state.savepoints.dir" configuration key have the same value as "state.checkpoints.dir"?
If not, can you post your configuration keys and the command you use to trigger a savepoint?
Have you checked the logs? Maybe there's an error message.
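
For context on the "how do you trigger a savepoint" question: unlike checkpoints, which Flink takes periodically once checkpointing is enabled, a savepoint is only written when it is requested explicitly, for example through the CLI or the REST API. Below is a minimal sketch of the REST route in Python. The REST port 8081 (Flink's default) and the job id are assumptions, not values taken from this thread; the host name and target directory are borrowed from the configuration posted later in the thread.

```python
import json
import urllib.request


def build_savepoint_request(rest_url, job_id, target_dir=None, cancel_job=False):
    """Build (without sending) the POST request that asks Flink for a savepoint."""
    body = {"cancel-job": cancel_job}
    if target_dir is not None:
        # If no target directory is sent, Flink falls back to state.savepoints.dir.
        body["target-directory"] = target_dir
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/jobs/{job_id}/savepoints",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_savepoint(rest_url, job_id, target_dir=None):
    """Send the request and return the trigger id used to poll for completion."""
    request = build_savepoint_request(rest_url, job_id, target_dir)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["request-id"]


# Hypothetical job id; the REST address assumes the default port 8081.
example = build_savepoint_request(
    "http://flink-jobmanager:8081", "<job-id>", "s3://flink/savepoints"
)
print(example.get_method(), example.full_url)
```

The returned trigger id can then be polled at `/jobs/<job-id>/savepoints/<trigger-id>` to see whether the savepoint completed or failed.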

On Thu, Mar 25, 2021 at 7:17 PM Robert Cullen <[hidden email]> wrote:
When I run a job on my Kubernetes session cluster, only the checkpoint directories are created, not the savepoint directories. (The filesystem is configured to use S3 via MinIO.) Any ideas?

--
Robert Cullen
240-475-4490

Re: Question about checkpoints and savepoints

Robert Cullen

Here’s a snippet from the logs. There are no errors in them:

2021-03-23 13:11:52,247 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
2021-03-23 13:11:52,249 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Preconfiguration: 
2021-03-23 13:11:52,249 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - 

JM_RESOURCE_PARAMS extraction logs:
jvm_params: -Xmx2097152000 -Xms2097152000 -XX:MaxMetaspaceSize=268435456
logs: INFO  [] - Loading configuration property: jobmanager.rpc.address, flink-jobmanager
INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 4
INFO  [] - Loading configuration property: blob.server.port, 6124
INFO  [] - Loading configuration property: jobmanager.rpc.port, 6123
INFO  [] - Loading configuration property: taskmanager.rpc.port, 6122
INFO  [] - Loading configuration property: queryable-state.proxy.ports, 6125
INFO  [] - Loading configuration property: jobmanager.memory.heap.size, 2000m
INFO  [] - Loading configuration property: taskmanager.memory.task.heap.size, 2000m
INFO  [] - Loading configuration property: taskmanager.memory.managed.size, 3000m
INFO  [] - Loading configuration property: parallelism.default, 2
INFO  [] - Loading configuration property: state.backend, filesystem
INFO  [] - Loading configuration property: state.checkpoints.dir, s3://flink/checkpoints
INFO  [] - Loading configuration property: state.savepoints.dir, s3://flink/savepoints
INFO  [] - Loading configuration property: s3.endpoint, http://cmdaa-minio:9000
INFO  [] - Loading configuration property: s3.path-style-access, true
INFO  [] - Loading configuration property: s3.path.style.access, true
INFO  [] - Loading configuration property: s3.access-key, cmdaa123
INFO  [] - Loading configuration property: s3.secret-key, ******
INFO  [] - Final Master Memory configuration:
INFO  [] -   Total Process Memory: 2.587gb (2777561320 bytes)
INFO  [] -     Total Flink Memory: 2.078gb (2231369728 bytes)
INFO  [] -       JVM Heap:         1.953gb (2097152000 bytes)
INFO  [] -       Off-heap:         128.000mb (134217728 bytes)
INFO  [] -     JVM Metaspace:      256.000mb (268435456 bytes)
INFO  [] -     JVM Overhead:       264.889mb (277756136 bytes)
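
The "Loading configuration property" lines above already answer the first question: the two state directories differ. As a rough sketch, the same check can be scripted in Python. The function names here are illustrative, and the sample text is copied from the log lines above.

```python
import re

# Matches lines such as:
#   INFO  [] - Loading configuration property: state.backend, filesystem
PROPERTY_LINE = re.compile(
    r"Loading configuration property: (?P<key>[^,]+), (?P<value>.*)$"
)


def loaded_properties(log_text):
    """Collect the key/value pairs Flink reports while loading its configuration."""
    props = {}
    for line in log_text.splitlines():
        match = PROPERTY_LINE.search(line)
        if match:
            props[match.group("key").strip()] = match.group("value").strip()
    return props


def state_dirs(props):
    """Return (checkpoint directory, savepoint directory); None if a key is unset."""
    return props.get("state.checkpoints.dir"), props.get("state.savepoints.dir")


sample = """\
INFO  [] - Loading configuration property: state.backend, filesystem
INFO  [] - Loading configuration property: state.checkpoints.dir, s3://flink/checkpoints
INFO  [] - Loading configuration property: state.savepoints.dir, s3://flink/savepoints
"""

checkpoints_dir, savepoints_dir = state_dirs(loaded_properties(sample))
if checkpoints_dir == savepoints_dir:
    print("same directory")
else:
    print("different directories")
```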

On Fri, Mar 26, 2021 at 4:03 AM Robert Metzger <[hidden email]> wrote:
Hi,

Does the "state.savepoints.dir" configuration key have the same value as "state.checkpoints.dir"?
If not, can you post your configuration keys and the command you use to trigger a savepoint?
Have you checked the logs? Maybe there's an error message?



--
Robert Cullen
240-475-4490

Re: Question about checkpoints and savepoints

rmetzger0
Hmm, did you also check the TaskManager logs?
I'm not aware of any known or past issues in that direction; the code paths for checkpoints and savepoints are fairly similar when it comes to storing the data.

You could also try running Flink at the DEBUG log level; maybe that reveals something.
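
One way to go through JobManager and TaskManager logs for this is a small filter that keeps only WARN/ERROR lines and lines mentioning savepoints. The Python sketch below assumes the log layout seen in this thread (timestamp, level, logger, `[] -`, message). The function names and the keyword list are illustrative choices, not anything provided by Flink.

```python
import re

# Layout assumed from the log lines in this thread:
#   2021-03-23 13:11:52,249 INFO  some.logger.Name        [] - message
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+\[[^\]]*\] -\s?(?P<message>.*)$"
)


def suspicious_lines(lines, keywords=("savepoint",), levels=("WARN", "ERROR")):
    """Return log lines that are WARN/ERROR or whose message mentions a keyword."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        match = LOG_LINE.match(text)
        if not match:
            continue  # skip continuation lines such as stack-trace frames
        message = match.group("message").lower()
        if match.group("level") in levels or any(k in message for k in keywords):
            hits.append(text)
    return hits


def scan_file(path):
    """Apply the filter to one log file on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling `scan_file` on each collected log file (the paths depend on the deployment) gives a short list to read through before and after switching to DEBUG.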


On Fri, Mar 26, 2021 at 1:37 PM Robert Cullen <[hidden email]> wrote:

Here’s a snippet from the logs. There are no errors in them:
