Hi all,
I’m working on using Flink to run a variety of streaming jobs that will be processing very high-volume streams. I want to be able to update a job’s software with an absolute minimum of impact on the processing of the data. What I don’t understand is the best way to update the software running the job. From what I gather, the way it works today is that I would have to shut down the first job, ensuring that it properly checkpoints, and then start up a new job. My concern is that this may take a relatively long time and cause problems with SLAs I may have with my users.
Does Flink provide more nuanced ways of upgrading a job’s software?
Are there folks out there that are working with this sort of problem, either within Flink or around it?
Thank you for any help, thoughts, etc. you may have.
-Bruce
|
Hi Bruce,
you're right, taking down the job and restarting it (from a savepoint) with the updated software is the only way of doing it. I'm not aware of any work being done in this area right now, but it is an important topic that we certainly have to tackle in the not-so-distant future.
Cheers,
Aljoscha
On Wed, 4 May 2016 at 19:52 Hanson, Bruce <[hidden email]> wrote:
|
In reply to this post by Hanson, Bruce
Hi,
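in our more-or-less development environment we're doing something like this in our main method:

    // Flink package names below are as of Flink 1.0 and may differ in other versions.
    import java.util.concurrent.TimeUnit
    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.FiniteDuration
    import org.apache.flink.configuration.GlobalConfiguration
    import org.apache.flink.runtime.client.JobClient
    import org.apache.flink.runtime.messages.JobManagerMessages
    import org.apache.flink.runtime.messages.JobManagerMessages.RunningJobsStatus
    import org.apache.flink.runtime.util.LeaderRetrievalUtils

    val processName = name_of_our_stream
    val configuration = GlobalConfiguration.getConfiguration
    val system = JobClient.startJobClientActorSystem(configuration)
    val timeout = FiniteDuration(10, TimeUnit.SECONDS)

    // Look up the current JobManager leader.
    val gateway = LeaderRetrievalUtils.retrieveLeaderGateway(
      LeaderRetrievalUtils.createLeaderRetrievalService(configuration),
      system,
      timeout)

    implicit val executor = system.dispatcher

    // Ask for the running jobs, find ours by name, and cancel it if present.
    val cancelResult = gateway
      .ask(JobManagerMessages.getRequestRunningJobsStatus, timeout)
      .mapTo[RunningJobsStatus]
      .flatMap {
        case RunningJobsStatus(runningJobs) =>
          runningJobs.toList.find(_.getJobName == processName).map(job => {
            gateway.ask(JobManagerMessages.CancelJob(job.getJobId), FiniteDuration(1, TimeUnit.MINUTES))
          }).getOrElse(Future.successful(()))
      }

    Await.result(cancelResult, FiniteDuration(1, TimeUnit.MINUTES))
    system.shutdown()

This basically searches the running jobs by name and cancels the matching one. You can trigger a savepoint in a similar way, but unfortunately I don't see an easy way of telling the ExecutionEnvironment that you want to use it. Probably it can be done with some clever hack :)

br,
maciek

On 04/05/2016 19:52, Hanson, Bruce wrote: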
|
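For anyone finding this thread later, here is a rough sketch of the savepoint-then-restart sequence discussed above, expressed as the three command-line calls it amounts to. It only builds and prints the commands; it does not talk to a cluster. It assumes the `savepoint`, `cancel`, and `run -s` actions of the `bin/flink` script as I understand them for Flink 1.0, so please check `bin/flink --help` for your version. The object name `UpgradeSketch`, the job ID, the savepoint path, and the jar name are all made-up placeholders.

```scala
// Sketch only: builds the Flink CLI invocations for a savepoint-based upgrade.
// All IDs and paths used in main() are hypothetical placeholders.
object UpgradeSketch {
  // CLI launcher script, assumed to be invoked from the Flink installation directory.
  val flinkCli = "bin/flink"

  // Step 1: trigger a savepoint for the running job.
  // The CLI reports the savepoint path, which is needed in step 3.
  def savepointCmd(jobId: String): Seq[String] =
    Seq(flinkCli, "savepoint", jobId)

  // Step 2: cancel the old job once the savepoint has completed.
  def cancelCmd(jobId: String): Seq[String] =
    Seq(flinkCli, "cancel", jobId)

  // Step 3: submit the updated jar, resuming from the savepoint via -s.
  def resumeCmd(savepointPath: String, jarPath: String): Seq[String] =
    Seq(flinkCli, "run", "-s", savepointPath, jarPath)

  def main(args: Array[String]): Unit = {
    val jobId = "<jobId>"                 // placeholder: ID of the running job
    val savepointPath = "<savepointPath>" // placeholder: path reported by step 1
    val newJar = "my-job-v2.jar"          // placeholder: the updated job jar

    Seq(
      savepointCmd(jobId),
      cancelCmd(jobId),
      resumeCmd(savepointPath, newJar)
    ).foreach(cmd => println(cmd.mkString(" ")))
  }
}
```

The job is still down between steps 2 and 3, so this does not remove the gap Bruce is worried about; it only keeps the job's state across the restart. To actually execute the commands instead of printing them, each `Seq[String]` could be run with `scala.sys.process`.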