MapState - TypeSerializer

MapState - TypeSerializer

Alexey Trenikhun
Hello,
The Flink documentation states that “TypeSerializers and TypeSerializerConfigSnapshots are written as part of checkpoints along with the state values”. In the context of MapState, does this mean that a TypeSerializer is written for each MapState entry, or only once per state?
Alexey



Re: MapState - TypeSerializer

Andrey Zagrebin
Hi Alexey,

It is written once per state name, in the state's meta information, separately from the user data entries.
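A minimal sketch of that layout, as a toy model rather than Flink code: `CheckpointLayoutSketch` and all of its fields and methods are hypothetical names invented for illustration. It only shows the shape described above, with one serializer snapshot per state name in the meta information and the entries stored as plain bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Flink classes or Flink APIs.
final class CheckpointLayoutSketch {
    // Meta information: exactly one serializer snapshot per state name.
    final Map<String, String> serializerSnapshotByState = new LinkedHashMap<>();
    // User data: serialized entries per state name, with no serializer attached.
    final Map<String, List<byte[]>> entriesByState = new LinkedHashMap<>();

    // Registers a state once; this is the only place a serializer snapshot is recorded.
    void registerState(String stateName, String serializerSnapshot) {
        serializerSnapshotByState.put(stateName, serializerSnapshot);
        entriesByState.putIfAbsent(stateName, new ArrayList<>());
    }

    // Adds one map entry as raw bytes; the state must have been registered first.
    void putEntry(String stateName, String key, String value) {
        byte[] bytes = (key + "=" + value).getBytes(StandardCharsets.UTF_8);
        entriesByState.get(stateName).add(bytes);
    }

    public static void main(String[] args) {
        CheckpointLayoutSketch checkpoint = new CheckpointLayoutSketch();
        checkpoint.registerState("myMapState", "TypeSerializer1-snapshot");
        checkpoint.putEntry("myMapState", "a", "1");
        checkpoint.putEntry("myMapState", "b", "2");
        checkpoint.putEntry("myMapState", "c", "3");
        // Three entries, but still a single snapshot for the state name.
        System.out.println("snapshots: " + checkpoint.serializerSnapshotByState.size()); // snapshots: 1
        System.out.println("entries: " + checkpoint.entriesByState.get("myMapState").size()); // entries: 3
    }
}
```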

Best,
Andrey

On 28 Nov 2018, at 04:56, Alexey Trenikhun <[hidden email]> wrote:

Hello,
The Flink documentation states that “TypeSerializers and TypeSerializerConfigSnapshots are written as part of checkpoints along with the state values”. In the context of MapState, does this mean that a TypeSerializer is written for each MapState entry, or only once per state?
Alexey


Re: MapState - TypeSerializer

Alexey Trenikhun
What if I’m using RocksDB and the MapState has a single entry written with TypeSerializer1? We then take a savepoint, upgrade the job (to TypeSerializer2), and put a new entry. At that point we have two entries written by different serializers, so should both TypeSerializers be stored in the meta information?
Thanks,
Alexey

 

From: Andrey Zagrebin <[hidden email]>
Sent: Wednesday, November 28, 2018 2:23 AM
To: Alexey Trenikhun
Cc: [hidden email]
Subject: Re: MapState - TypeSerializer
 
Hi Alexey,

It is written once per state name, in the state's meta information, separately from the user data entries.

Best,
Andrey


Re: MapState - TypeSerializer

Congxian Qiu
Hi, Alexey
    In your case, only TypeSerializer2 will be stored in the meta information, and TypeSerializer2 and TypeSerializer1 have to be compatible.
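A minimal sketch of what such a compatibility check could look like. It assumes the savepoint being restored still carries the snapshot written by the old job, so the new serializer can be compared against it at restore time. Every name here (`CompatibilitySketch`, `RestoredSnapshot`, `NewSerializer`, the version fields and the three outcomes) is hypothetical, and the version rule is an illustrative assumption, not Flink's actual logic.

```java
// Toy model only: these types are hypothetical and are not Flink classes.
final class CompatibilitySketch {
    enum Compatibility { AS_IS, AFTER_MIGRATION, INCOMPATIBLE }

    // What the previous job wrote into the savepoint's meta information.
    static final class RestoredSnapshot {
        final int formatVersion;

        RestoredSnapshot(int formatVersion) {
            this.formatVersion = formatVersion;
        }
    }

    // The serializer registered by the upgraded job.
    static final class NewSerializer {
        final int formatVersion;
        final int oldestReadableVersion;

        NewSerializer(int formatVersion, int oldestReadableVersion) {
            this.formatVersion = formatVersion;
            this.oldestReadableVersion = oldestReadableVersion;
        }

        // Assumed rule: same version needs no work, an older but readable
        // version needs migration, anything else cannot be restored.
        Compatibility checkAgainst(RestoredSnapshot restored) {
            if (restored.formatVersion == formatVersion) {
                return Compatibility.AS_IS;
            }
            if (restored.formatVersion >= oldestReadableVersion
                    && restored.formatVersion < formatVersion) {
                return Compatibility.AFTER_MIGRATION;
            }
            return Compatibility.INCOMPATIBLE;
        }
    }

    public static void main(String[] args) {
        RestoredSnapshot fromSavepoint = new RestoredSnapshot(1); // written by "TypeSerializer1"
        NewSerializer serializer2 = new NewSerializer(2, 1);      // stands in for "TypeSerializer2"
        System.out.println(serializer2.checkAgainst(fromSavepoint)); // AFTER_MIGRATION
    }
}
```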

Best,
Congxian


Alexey Trenikhun <[hidden email]> wrote on Friday, February 8, 2019 at 10:39 AM:
What if I’m using RocksDB and the MapState has a single entry written with TypeSerializer1? We then take a savepoint, upgrade the job (to TypeSerializer2), and put a new entry. At that point we have two entries written by different serializers, so should both TypeSerializers be stored in the meta information?
Thanks,
Alexey

 


Re: MapState - TypeSerializer

Alexey Trenikhun
But then there will be two TypeSerializerConfigSnapshots? Otherwise it is unclear how TypeSerializer2 will be able to check compatibility.

Thanks,
Alexey

 

From: Congxian Qiu <[hidden email]>
Sent: Thursday, February 7, 2019 8:14 PM
To: Alexey Trenikhun
Cc: [hidden email]
Subject: Re: MapState - TypeSerializer
 
Hi, Alexey
    In your case, only TypeSerializer2 will be stored in the meta information, and TypeSerializer2 and TypeSerializer1 have to be compatible.

Best,
Congxian



Re: MapState - TypeSerializer

Yun Tang
Hi Alexey

First of all, 'TypeSerializerConfigSnapshot' has been deprecated since Flink 1.7 [1]; the current serializer snapshot class is 'TypeSerializerSnapshot'.

To answer your question: only TypeSerializer2's snapshot is stored during a checkpoint. For an off-heap state backend (e.g. RocksDBStateBackend), state migration happens before any actual state read/write [2], so after loading from the savepoint all data is stored in RocksDB using the latest serializer.
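A minimal sketch of that eager migration, as a toy model and not the RocksDB backend's code: `EagerMigrationSketch`, its `Serializer` class and the two string formats are hypothetical and chosen only for illustration. Every restored entry is decoded with the old serializer and re-encoded with the new one before the state is used, so entries written afterwards share the same format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only: none of these names are Flink classes or Flink APIs.
final class EagerMigrationSketch {
    // Hypothetical serializer: a name plus encode/decode functions over strings.
    static final class Serializer {
        final String name;
        final Function<Integer, String> encode;
        final Function<String, Integer> decode;

        Serializer(String name, Function<Integer, String> encode, Function<String, Integer> decode) {
            this.name = name;
            this.encode = encode;
            this.decode = decode;
        }
    }

    // Assumed formats: V1 stores the plain decimal number, V2 prefixes it with "v2:".
    static final Serializer V1 =
            new Serializer("TypeSerializer1", i -> Integer.toString(i), Integer::parseInt);
    static final Serializer V2 =
            new Serializer("TypeSerializer2", i -> "v2:" + i, s -> Integer.parseInt(s.substring(3)));

    // Rewrites every stored entry with the new serializer before the state is read or written.
    static Map<String, String> migrate(Map<String, String> stored, Serializer oldS, Serializer newS) {
        Map<String, String> migrated = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            Integer value = oldS.decode.apply(e.getValue());
            migrated.put(e.getKey(), newS.encode.apply(value));
        }
        return migrated;
    }

    public static void main(String[] args) {
        // State as found in the savepoint: one entry written by the old serializer.
        Map<String, String> restored = new LinkedHashMap<>();
        restored.put("a", V1.encode.apply(1));

        // Migration runs first; only then does the upgraded job put a new entry.
        Map<String, String> state = migrate(restored, V1, V2);
        state.put("b", V2.encode.apply(2));

        // Both entries decode with V2 alone, so only V2's snapshot is needed afterwards.
        for (Map.Entry<String, String> e : state.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()
                    + " (decoded: " + V2.decode.apply(e.getValue()) + ")");
        }
    }
}
```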

The three pictures below, borrowed from Gordon's talk at Flink Forward China 2018 [3], illustrate this.





Best
Yun Tang

From: Alexey Trenikhun <[hidden email]>
Sent: Friday, February 8, 2019 12:35
To: Congxian Qiu
Cc: [hidden email]
Subject: Re: MapState - TypeSerializer
 
But then there will be two TypeSerializerConfigSnapshots? Otherwise it is unclear how TypeSerializer2 will be able to check compatibility.

Thanks,
Alexey

 


Re: MapState - TypeSerializer

Alexey Trenikhun
It seems this has changed since "Flink Forward Berlin 2018" (https://www.slideshare.net/FlinkForward/flink-forward-berlin-2018-tzuli-gordon-tai-upgrading-apache-flink-applications-state-of-the-union), slide 25, where I see some entries in V1 and some in V2. Thank you for the up-to-date links.

Thanks,
Alexey

From: Yun Tang <[hidden email]>
Sent: Thursday, February 7, 2019 10:32 PM
To: Alexey Trenikhun; Congxian Qiu
Cc: [hidden email]
Subject: Re: MapState - TypeSerializer
 
Hi Alexey

First of all, 'TypeSerializerConfigSnapshot' has been deprecated since Flink 1.7 [1]; the current serializer snapshot class is 'TypeSerializerSnapshot'.

To answer your question: only TypeSerializer2's snapshot is stored during a checkpoint. For an off-heap state backend (e.g. RocksDBStateBackend), state migration happens before any actual state read/write [2], so after loading from the savepoint all data is stored in RocksDB using the latest serializer.

The three pictures below, borrowed from Gordon's talk at Flink Forward China 2018 [3], illustrate this.





Best
Yun Tang
