Hi there,
We recently migrated from Flink 1.7 to 1.12. The jobs are running fine, but the task manager is getting killed quite frequently (2-3 times a day).

Logs:
INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

While checking more logs, we see that Flink is not able to discard old checkpoints:
org.apache.flink.runtime.checkpoint.CheckpointsCleaner [] - Could not discard completed checkpoint 173.

We are not sure what the reason is. Has anyone faced this before?

Regards
Sambaran
Hi Sambaran,

could you also share with us the cause why the checkpoints could not be discarded?

With Flink 1.10, we introduced a stricter memory model for the TaskManagers. That could be a reason why you see more TaskManagers being killed by the underlying resource management system. You could check whether your resource management system logs that some containers/pods are exceeding their memory limits. If this is the case, then you should give your Flink processes a bit more memory [1].

Cheers,
Till

On Tue, Apr 27, 2021 at 6:48 PM Sambaran <[hidden email]> wrote:
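
For readers following Till's suggestion, here is a rough Python sketch of how the stricter memory model splits a TaskManager's total process memory (the `taskmanager.memory.process.size` setting). The function name and the default values are assumptions based on the documented Flink 1.12 defaults as I recall them; please verify them against the configuration reference for your version. The real derivation in Flink has more cases, for example when individual components are set explicitly, so treat this as an approximation only.

```python
def clamp(value, low, high):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


def taskmanager_memory_breakdown(
    process_mb,
    metaspace_mb=256,            # assumed default: taskmanager.memory.jvm-metaspace.size
    overhead_fraction=0.1,       # assumed default: taskmanager.memory.jvm-overhead.fraction
    overhead_min_mb=192,         # assumed default: taskmanager.memory.jvm-overhead.min
    overhead_max_mb=1024,        # assumed default: taskmanager.memory.jvm-overhead.max
    framework_heap_mb=128,       # assumed default: taskmanager.memory.framework.heap.size
    framework_off_heap_mb=128,   # assumed default: taskmanager.memory.framework.off-heap.size
    task_off_heap_mb=0,          # assumed default: taskmanager.memory.task.off-heap.size
    managed_fraction=0.4,        # assumed default: taskmanager.memory.managed.fraction
    network_fraction=0.1,        # assumed default: taskmanager.memory.network.fraction
    network_min_mb=64,           # assumed default: taskmanager.memory.network.min
    network_max_mb=1024,         # assumed default: taskmanager.memory.network.max
):
    """Approximate the per-component budget (in MB) for a given process size."""
    # JVM overhead is a clamped fraction of the total process memory.
    overhead = clamp(process_mb * overhead_fraction, overhead_min_mb, overhead_max_mb)
    # What remains after metaspace and overhead is the "total Flink memory".
    flink = process_mb - metaspace_mb - overhead
    # Network and managed memory are fractions of the total Flink memory.
    network = clamp(flink * network_fraction, network_min_mb, network_max_mb)
    managed = flink * managed_fraction
    # Task heap receives whatever is left over.
    task_heap = (flink - framework_heap_mb - framework_off_heap_mb
                 - task_off_heap_mb - managed - network)
    if task_heap <= 0:
        raise ValueError("process size too small for the configured components")
    return {
        "jvm_overhead": overhead,
        "jvm_metaspace": metaspace_mb,
        "total_flink": flink,
        "network": network,
        "managed": managed,
        "task_heap": task_heap,
    }


if __name__ == "__main__":
    # Compare two hypothetical process sizes to see where extra memory ends up.
    for size_mb in (1728, 4096):
        print(size_mb, taskmanager_memory_breakdown(size_mb))
```

The point of the sketch is that the container limit has to cover the whole process size, not just the JVM heap, so a job that ran within its limits on 1.7 can exceed them under the newer model unless the process size is raised.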
Hi Till,

Thank you for the response. We are currently running Flink with increased memory, and so far the TaskManager is working fine. We will check whether any further issues come up and will update you.

Regards
Sambaran

On Wed, Apr 28, 2021 at 5:33 PM Till Rohrmann <[hidden email]> wrote:
Great, thanks for the update.

On Wed, Apr 28, 2021 at 7:08 PM Sambaran <[hidden email]> wrote: