Add operator ids for an already running job


Paul Lam
Hi,

I have a legacy stateful streaming job running in production whose operator ids were never set explicitly and were auto-generated by Flink. I’m now planning to update the code and set the operator ids explicitly.

Is it possible to keep the job's state, given that the operator ids would change? Thanks a lot!

Best,
Paul Lam

Re: Add operator ids for an already running job

vino yang
Hi Paul,

According to the official Flink documentation, this does not seem to be possible at the moment. [1]

Someone previously started a discussion about savepoint state on the dev mailing list. Perhaps that approach can meet your needs once it is implemented. [2]

Thanks, vino.



Re: Add operator ids for an already running job

Fabian Hueske-2
The auto-generated ids are included in the savepoint data. So, it should be possible to them from the savepoint.
However, AFAIK, there is no tool to do that. You'd need to manually dig into the serialized data.

Cheers, Fabian



Re: Add operator ids for an already running job

Paul Lam
In reply to this post by vino yang
Hi vino,

Thanks for the reply!

I’m looking forward to Bravo too. For now, though, my idea is to set the stateful operators' ids to the same values as the auto-generated ones, so that the savepoint would still be usable. What do you think?

Best,
Paul Lam




Re: Add operator ids for an already running job

Paul Lam
In reply to this post by Fabian Hueske-2
Hi Fabian,

Thanks for your reply!

It seems like there is a word missing. Did you mean it’s possible to extract the operator ids from the savepoint, or to modify the ids in the savepoint?

Best,
Paul Lam




Re: Add operator ids for an already running job

Fabian Hueske-2
It is possible to extract the operator ids from the savepoint.
You should also be able to modify them if you know the serialization format.

But all of this requires messing around with binary data (including digging into Flink's serialization and savepoint code), so I'd only do it if it is really necessary.
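
For readers following the thread, the matching problem can be pictured with a toy model. This is only a sketch in plain Python, not Flink code: the `restore` function, the dictionary standing in for a savepoint, and the id value are all made up for illustration. It assumes only what the thread states, namely that state in a savepoint is stored under each operator's id and is matched by that id on restore.

```python
from typing import Dict, List, Tuple


def restore(
    savepoint: Dict[str, bytes], job_operator_ids: List[str]
) -> Tuple[Dict[str, bytes], List[str]]:
    """Match savepoint state to job operators by operator id (toy model).

    Returns (restored, orphaned): the state claimed by operators of the
    new job, and the savepoint ids that no operator in the new job claims.
    """
    restored = {op: savepoint[op] for op in job_operator_ids if op in savepoint}
    orphaned = [op for op in savepoint if op not in job_operator_ids]
    return restored, orphaned


# Hypothetical savepoint: one stateful operator stored under an
# auto-generated id (the value below is invented for this example).
AUTO_ID = "cbc357ccb763df2852fee8c4fc7d55f2"
savepoint = {AUTO_ID: b"count=42"}

# Case 1: the updated job introduces a brand-new explicit id.
restored_new, orphaned_new = restore(savepoint, ["my-counter"])

# Case 2: the updated job reuses the old auto-generated id.
restored_pinned, orphaned_pinned = restore(savepoint, [AUTO_ID])

print(restored_new, orphaned_new)
print(restored_pinned, orphaned_pinned)
```

In the first case the old state has no matching operator and is left behind; in the second it is picked up again, which is the idea Paul describes. As far as I know, the DataStream API offers `setUidHash` on operators for pinning an operator to a previously generated hash, but check the documentation for your Flink version before relying on it.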
