rebalance of streaming job after taskManager restart

Maciek Próchniak
Hi,

We have a streaming job with parallelism 2 and two task managers. The job
occupies one slot on each task manager. When I stop manager2, the job
is restarted and runs on manager1, occupying two of its slots.
How can I trigger a restart (or some similar process) that will cause the
job to be balanced across the task managers?

thanks,
maciek

Re: rebalance of streaming job after taskManager restart

Aljoscha Krettek
Hi,
I think what you can do is take a savepoint of your program, then cancel it and restart it from the savepoint. This should make Flink redistribute it across all TaskManagers.

See https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html
and
https://ci.apache.org/projects/flink/flink-docs-master/apis/cli.html#savepoints
for documentation about savepoints.

The steps to follow should be:

bin/flink savepoint <your job id>

This will print a savepoint path that you will need later.

bin/flink cancel <your job id>

bin/flink run -s <savepoint path> …

The last command is your usual run command but with the additional “-s” parameter to continue from a savepoint.
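Because the savepoint path printed by the first command has to be pasted into the last one, the three steps can be scripted. The following is a minimal POSIX shell sketch, not an official Flink tool. The FLINK_BIN variable and the helpers extract_savepoint_path and rebalance_job are hypothetical names, and the script assumes the savepoint command reports the path on a line of the form "Savepoint completed. Path: <path>". That output format is an assumption and may differ between Flink versions, so check it against what your CLI actually prints.

```shell
#!/bin/sh
# Sketch: savepoint -> cancel -> resume, so that Flink can place the job's
# tasks on all available TaskManagers again.
# ASSUMPTIONS: FLINK_BIN and the helper names are hypothetical, and the
# "Savepoint completed. Path:" output line may differ between Flink versions.

# Path to the Flink CLI (assumes you run this from the Flink directory).
FLINK_BIN="${FLINK_BIN:-bin/flink}"

# Read the output of "flink savepoint" on stdin and print the savepoint path.
extract_savepoint_path() {
  sed -n 's/^.*Savepoint completed\. Path: *//p' | head -n 1
}

# Usage: rebalance_job <job id> <arguments of your usual "run" command...>
rebalance_job() {
  job_id="$1"
  shift

  # 1. Trigger a savepoint and capture the path the CLI prints.
  output="$("$FLINK_BIN" savepoint "$job_id")" || return 1
  savepoint_path="$(printf '%s\n' "$output" | extract_savepoint_path)"
  if [ -z "$savepoint_path" ]; then
    echo "could not find a savepoint path in the CLI output" >&2
    return 1
  fi

  # 2. Cancel the running job.
  "$FLINK_BIN" cancel "$job_id" || return 1

  # 3. Resubmit the job, resuming from the savepoint ("-s").
  "$FLINK_BIN" run -s "$savepoint_path" "$@"
}
```

Usage, after sourcing the file: rebalance_job <your job id> followed by the arguments of your usual run command (without "-s"). If no savepoint path can be found in the output, the function stops before calling cancel, so the job keeps running.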

I hope that helps.

Cheers,
Aljoscha
> On 08 Mar 2016, at 15:48, Maciek Próchniak <[hidden email]> wrote:

Re: rebalance of streaming job after taskManager restart

Maciek Próchniak
Hi,

Thanks for the quick answer. Yes, it does what I want to accomplish,
but I was hoping for an "easier" solution.
Are there any plans for a "restart" button/command or something similar? As I
understand it, the whole restart process is already in place, since it is
triggered when a task manager dies.

thanks,
maciek

On 08/03/2016 16:03, Aljoscha Krettek wrote:

Re: rebalance of streaming job after taskManager restart

Aljoscha Krettek
Yes, there are plans to make this more streamlined, but unfortunately we are not there yet.

> On 08 Mar 2016, at 16:07, Maciek Próchniak <[hidden email]> wrote: