What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Tony Wei
Hi everyone,

I read the Flink 1.8 release notes about state [1], and it said

Continuous incremental cleanup of old Keyed State with TTL
We introduced TTL (time-to-live) for Keyed state in Flink 1.6 (FLINK-9510). This feature allowed to clean up and make inaccessible keyed state entries when accessing them. In addition state would now also being cleaned up when writing a savepoint/checkpoint.
Flink 1.8 introduces continous cleanup of old entries for both the RocksDB state backend (FLINK-10471) and the heap state backend (FLINK-10473). This means that old entries (according to the ttl setting) are continously being cleanup up.

I'm not familiar with TTL's implementation in Flink 1.6 and what new features introduced in Flink
1.8. I don't understand what difference between these two release version after reading the
release notes. Did they change the outcome of TTL feature, or provide new TTL features, or just
change the behavior of executing TTL mechanism.

Could you give me more references to learn about it? A simple example to illustrate it is more
appreciated. Thank you.

Best,
Tony Wei

Reply | Threaded
Open this post in threaded view
|

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Konstantin Knauf-2
Hi Tony,

before Flink 1.8 expired state is only cleaned up, when you try to access it after expiration, i.e. when user code tries to access the expired state, the state value is cleaned and "null" is returned. There was also already the option to clean up expired state during full snapshots (https://github.com/apache/flink/pull/6460). With Flink 1.8 expired state is cleaned up continuously in the background regardless of checkpointing or any attempt to access it after expiration.

As a reference the linked JIRA tickets should be a good starting point.

Hope this helps.

Konstantin




On Fri, Mar 8, 2019 at 10:45 AM Tony Wei <[hidden email]> wrote:
Hi everyone,

I read the Flink 1.8 release notes about state [1], and it said

Continuous incremental cleanup of old Keyed State with TTL
We introduced TTL (time-to-live) for Keyed state in Flink 1.6 (FLINK-9510). This feature allowed to clean up and make inaccessible keyed state entries when accessing them. In addition state would now also being cleaned up when writing a savepoint/checkpoint.
Flink 1.8 introduces continous cleanup of old entries for both the RocksDB state backend (FLINK-10471) and the heap state backend (FLINK-10473). This means that old entries (according to the ttl setting) are continously being cleanup up.

I'm not familiar with TTL's implementation in Flink 1.6 and what new features introduced in Flink
1.8. I don't understand what difference between these two release version after reading the
release notes. Did they change the outcome of TTL feature, or provide new TTL features, or just
change the behavior of executing TTL mechanism.

Could you give me more references to learn about it? A simple example to illustrate it is more
appreciated. Thank you.

Best,
Tony Wei



--

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen   
Reply | Threaded
Open this post in threaded view
|

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Tony Wei
Hi Konstantin,

That is really helpful. Thanks.

Another follow-up question: The document said "Cleanup in full snapshot" is not applicable for
the incremental checkpointing in the RocksDB state backend. However, when user manually
trigger a savepoint and restart job from it, the expired states should be clean up as well based
on Flink 1.6's implementation. Am I right?

Best,
Tony Wei

Konstantin Knauf <[hidden email]> 於 2019年3月9日 週六 上午7:00寫道:
Hi Tony,

before Flink 1.8 expired state is only cleaned up, when you try to access it after expiration, i.e. when user code tries to access the expired state, the state value is cleaned and "null" is returned. There was also already the option to clean up expired state during full snapshots (https://github.com/apache/flink/pull/6460). With Flink 1.8 expired state is cleaned up continuously in the background regardless of checkpointing or any attempt to access it after expiration.

As a reference the linked JIRA tickets should be a good starting point.

Hope this helps.

Konstantin




On Fri, Mar 8, 2019 at 10:45 AM Tony Wei <[hidden email]> wrote:
Hi everyone,

I read the Flink 1.8 release notes about state [1], and it said

Continuous incremental cleanup of old Keyed State with TTL
We introduced TTL (time-to-live) for Keyed state in Flink 1.6 (FLINK-9510). This feature allowed to clean up and make inaccessible keyed state entries when accessing them. In addition state would now also being cleaned up when writing a savepoint/checkpoint.
Flink 1.8 introduces continous cleanup of old entries for both the RocksDB state backend (FLINK-10471) and the heap state backend (FLINK-10473). This means that old entries (according to the ttl setting) are continously being cleanup up.

I'm not familiar with TTL's implementation in Flink 1.6 and what new features introduced in Flink
1.8. I don't understand what difference between these two release version after reading the
release notes. Did they change the outcome of TTL feature, or provide new TTL features, or just
change the behavior of executing TTL mechanism.

Could you give me more references to learn about it? A simple example to illustrate it is more
appreciated. Thank you.

Best,
Tony Wei



--

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen   
Reply | Threaded
Open this post in threaded view
|

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Konstantin Knauf-2
Hi Tony,

yes, when taking a savepoint  the same strategy as the during a non-incremental checkpoint is used.

Best,

Konstantin

On Mon, Mar 11, 2019 at 2:29 AM Tony Wei <[hidden email]> wrote:
Hi Konstantin,

That is really helpful. Thanks.

Another follow-up question: The document said "Cleanup in full snapshot" is not applicable for
the incremental checkpointing in the RocksDB state backend. However, when user manually
trigger a savepoint and restart job from it, the expired states should be clean up as well based
on Flink 1.6's implementation. Am I right?

Best,
Tony Wei

Konstantin Knauf <[hidden email]> 於 2019年3月9日 週六 上午7:00寫道:
Hi Tony,

before Flink 1.8 expired state is only cleaned up, when you try to access it after expiration, i.e. when user code tries to access the expired state, the state value is cleaned and "null" is returned. There was also already the option to clean up expired state during full snapshots (https://github.com/apache/flink/pull/6460). With Flink 1.8 expired state is cleaned up continuously in the background regardless of checkpointing or any attempt to access it after expiration.

As a reference the linked JIRA tickets should be a good starting point.

Hope this helps.

Konstantin




On Fri, Mar 8, 2019 at 10:45 AM Tony Wei <[hidden email]> wrote:
Hi everyone,

I read the Flink 1.8 release notes about state [1], and it said

Continuous incremental cleanup of old Keyed State with TTL
We introduced TTL (time-to-live) for Keyed state in Flink 1.6 (FLINK-9510). This feature allowed to clean up and make inaccessible keyed state entries when accessing them. In addition state would now also being cleaned up when writing a savepoint/checkpoint.
Flink 1.8 introduces continous cleanup of old entries for both the RocksDB state backend (FLINK-10471) and the heap state backend (FLINK-10473). This means that old entries (according to the ttl setting) are continously being cleanup up.

I'm not familiar with TTL's implementation in Flink 1.6 and what new features introduced in Flink
1.8. I don't understand what difference between these two release version after reading the
release notes. Did they change the outcome of TTL feature, or provide new TTL features, or just
change the behavior of executing TTL mechanism.

Could you give me more references to learn about it? A simple example to illustrate it is more
appreciated. Thank you.

Best,
Tony Wei



--

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen   


--

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen