Frequent Full GC's in case of FSStateBackend


Vinay Patil
Hi,

I am doing a performance test of my pipeline with the FSStateBackend, and I have observed frequent Full GCs after processing 20M records.

When I did a memory analysis using MAT, it showed that many of the objects maintained by Flink state are live.

Flink keeps the state in memory even after checkpointing; when does this state get removed / garbage collected? (I am using a window operator to which the DTO comes as input.)

Also, why does Flink keep the state in memory after checkpointing?

P.S. Using RocksDB does not cause Full GCs at all.

Regards,
Vinay Patil

Stefan Richter
Hi,

The FSStateBackend operates completely on-heap, and only the snapshots for checkpoints go to the file system. This is why this backend is typically faster for small state, but it can become problematic for large state. If your state exceeds a certain size, you should strongly consider using RocksDB as the backend. In particular, RocksDB also offers asynchronous snapshots, which is very valuable for keeping stream processing running with large state. RocksDB works on native memory and disk, so there is no GC to observe. For cases in which your state fits in memory but GC is a problem, you could try the G1 garbage collector, which offers better performance for the FSStateBackend than the default collector.

Best,
Stefan
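The two suggestions above (switching the backend to RocksDB, and trying G1 if you stay on-heap) can be sketched as configuration entries. This is a minimal sketch, assuming Flink 1.2-era flink-conf.yaml key names; the checkpoint path is a placeholder:

```yaml
# Switch the state backend from "filesystem" (FSStateBackend) to RocksDB.
state.backend: rocksdb

# Directory where checkpoint snapshots are persisted (placeholder path).
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints

# If instead you stay on the heap-based FSStateBackend, enabling G1 for the
# TaskManager/JobManager JVMs may reduce Full GC pauses.
env.java.opts: -XX:+UseG1GC
```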




Vinay Patil
Hi Stefan,

Thank you for the clarification.
Yes, with RocksDB I don't see Full GCs happening. Also, I am using Flink 1.2.0 and have set the state backend in the flink-conf.yaml file to rocksdb; does this do asynchronous checkpointing by default, or do I have to specify it at the job level?

Regards,
Vinay Patil



Stefan Richter
Async snapshotting is the default. 
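For completeness, the backend can also be set per job rather than in flink-conf.yaml. A rough sketch against the Flink 1.2-era Java API (the checkpoint URI is a placeholder; with the 1.2 RocksDB backend, snapshots should already be asynchronous by default, so no extra flag is shown):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Job-level equivalent of "state.backend: rocksdb" in flink-conf.yaml.
// The URI points at where checkpoint data is persisted (placeholder).
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
```

A job-level setting like this overrides whatever backend the cluster configuration specifies, which is useful when different jobs on the same cluster need different backends.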



saiprasad mishra
Hi All

I am also seeing issues with the FsStateBackend, as it stalls because of full GC. We have very large state.
Does this mean the doc below should not claim that the FsStateBackend is encouraged for large state?


Regards
Sai



Vinay Patil

Hi Sai,

If you are sure that your state will not exceed the memory limit of your nodes, then you should consider the FSStateBackend; otherwise you should go for RocksDB.

What is the configuration of your cluster?



saiprasad mishra
Thanks Vinay for the quick reply.

Yes, the RocksDB version is working perfectly without any issues, but it needs more planning on the hardware side for the servers running the job.

As you said and observed, the FsStateBackend is not useful for large state that does not fit in memory, and we do have very large state.

Regards
Sai
