Running Kubernetes on Flink with Savepoint


Running Kubernetes on Flink with Savepoint

Matt Magsombol
We're currently using this template: https://github.com/docker-flink/examples/tree/master/helm/flink to run Flink on Kubernetes as a job-specific cluster (with the small addition of specifying the class that serves as the main entry point for the cluster).


How would I go about setting up savepoints, so that we can edit our currently running jobs to add new pipelines to the Flink job without losing our state? The reason is that our state has a 1-day TTL, and updating our code without restoring state would force us to rebuild it from scratch.

From the documentation, I see that I'd need to run some sort of command, but there's no consistent way to do that if we're using the helm charts specified in the link.

I see this email thread talking about a certain problem with savepoints + kubernetes but doesn't quite specify how to set this up with helm: https://lists.apache.org/thread.html/4299518f4da2810aa88fe6b21f841880b619f3f8ac264084a318c034%40%3Cuser.flink.apache.org%3E


In that thread, hasun@zendesk mentions: "We always make a savepoint before we shutdown the job-cluster. So the savepoint is always the latest. When we fix a bug or change the job graph, it can resume well."

This is exactly the use case I'm looking to address. Other than specifying configs, are there any additional parameters I'd need to add within helm so that the cluster picks up the latest savepoint upon starting?
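For reference on the restore side: Flink's standalone job-cluster entrypoint accepts a `--fromSavepoint` argument, so one common approach is to template the savepoint path into the JobManager's startup args. A sketch, assuming a hypothetical chart that passes args through (the key names, main class, and savepoint path below are placeholders, not part of the linked chart):

```yaml
# Hypothetical values.yaml fragment: start the job cluster from a savepoint.
# The savepoint path would be resolved by your own tooling, e.g. the output
# of a savepoint triggered before the previous deployment was shut down.
jobmanager:
  args:
    - "standalone-job"
    - "--job-classname"
    - "com.example.MyJob"                              # placeholder main class
    - "--fromSavepoint"
    - "s3://my-bucket/savepoints/savepoint-abc123"     # placeholder latest savepoint
    - "--allowNonRestoredState"   # tolerate dropped operators after a job-graph change
```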

Re: Running Kubernetes on Flink with Savepoint

rmetzger0
Hi Matt,

sorry for the late reply. Why are you using the "flink-docker" helm example instead of https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html or https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html?
I don't think that the helm charts you are mentioning are actively maintained or recommended for production use.

If you want to create a savepoint in Flink, you'll need to trigger it via the JobManager's REST API (independent of how you deploy it). I guess you'll have to come up with some tooling that orchestrates triggering a savepoint before shutting down / upgrading the job.
See also: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jobs-jobid-savepoints

Best,
Robert
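The orchestration Robert describes can be sketched as a small script against the JobManager's REST API. Flink's savepoint trigger is asynchronous: a POST returns a request id, which you then poll for completion. The host, job id, and target directory below are placeholders for your deployment:

```python
# Sketch: "savepoint before shutdown" via the Flink JobManager REST API.
import json
import time
import urllib.request


def savepoint_url(base, job_id):
    # A POST to this endpoint triggers an asynchronous savepoint.
    return f"{base}/jobs/{job_id}/savepoints"


def trigger_savepoint(base, job_id, target_dir, cancel_job=True):
    # cancel_job=True stops the job once the savepoint completes,
    # which matches the "savepoint before shutdown" workflow.
    body = json.dumps({"target-directory": target_dir,
                       "cancel-job": cancel_job}).encode()
    req = urllib.request.Request(savepoint_url(base, job_id), data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["request-id"]


def wait_for_savepoint(base, job_id, request_id, poll_seconds=2):
    # Poll the async-operation status until Flink reports COMPLETED,
    # then return the savepoint path for the next deployment to restore from.
    status_url = f"{savepoint_url(base, job_id)}/{request_id}"
    while True:
        with urllib.request.urlopen(status_url) as resp:
            result = json.load(resp)
        if result["status"]["id"] == "COMPLETED":
            operation = result["operation"]
            if "failure-cause" in operation:
                raise RuntimeError(operation["failure-cause"])
            return operation["location"]
        time.sleep(poll_seconds)


# Example usage against a real cluster (placeholders):
#   rid = trigger_savepoint("http://flink-jobmanager:8081", job_id,
#                           "s3://my-bucket/savepoints")
#   path = wait_for_savepoint("http://flink-jobmanager:8081", job_id, rid)
```

The returned path is what you would feed back into the job cluster's start arguments (or config) for the upgraded deployment.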



On Wed, Jun 10, 2020 at 2:48 PM Matt Magsombol <[hidden email]> wrote:

Re: Running Kubernetes on Flink with Savepoint

Matt Magsombol
Yeah, our setup is a bit outdated (from around Flink 1.7), but we're effectively just using the helm templates... when upgrading to 1.10, I just ended up looking at diffs and changelogs for what needed updating...
Anyway, thanks. I was hoping Flink had a community-supported way of doing this, but I think I know what to do internally.

On 2020/06/15 15:11:32, Robert Metzger <[hidden email]> wrote:
