Hi there,

With S3 as the state backend, and a large chunk of user state kept on the heap, I can see the task manager start to fail without showing an OOM exception. Instead, it shows a generic error message (below) when a checkpoint is triggered. I assume this has something to do with how state is kept in a buffer and flushed to S3 when the checkpoint is triggered.

Further, to keep a large key/value space, the wiki points to using RocksDB as the backend. My understanding is that RocksDB writes to the local file system instead of syncing to S3. Does Flink support a memory -> RocksDB (local disk) -> S3 checkpoint state split yet? Or would implementing the KvState interface make Flink take care of the large-state problem?

Chen
On Tue, May 10, 2016 at 5:07 PM, Chen Qin <[hidden email]> wrote:
> Further, to keep a large key/value space, the wiki points to using RocksDB as
> the backend. My understanding is that RocksDB writes to the local file system
> instead of syncing to S3. Does Flink support a memory -> RocksDB (local disk)
> -> S3 checkpoint state split yet? Or would implementing the KvState interface
> make Flink take care of the large-state problem?

Hey Chen,

when you use RocksDB, you only need to explicitly configure the file system checkpoint directory, for which you can use S3:

    new RocksDBStateBackend(new URI("s3://..."))

The local disk paths are configured via the general Flink temp directory configuration (see taskmanager.tmp.dirs in https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html; the default is /tmp).

State is written to the local RocksDB instance, and the RocksDB files are copied to S3 on checkpoints.

Does this help?

– Ufuk
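[For reference, the setup Ufuk describes can be sketched as a flink-conf.yaml fragment. The key names below are taken from the Flink 1.0 configuration page linked above; the bucket and paths are placeholders, not values from this thread.]

```yaml
# Keep operator/keyed state in local RocksDB instances.
state.backend: rocksdb

# Directory that completed checkpoints are copied to, e.g. an S3 bucket.
state.backend.fs.checkpointdir: s3://my-bucket/flink-checkpoints

# Local directories for temp files, including the local RocksDB working
# state; defaults to the system temp dir (/tmp) if unset.
taskmanager.tmp.dirs: /mnt/disk1/tmp:/mnt/disk2/tmp
```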
Hi Ufuk,
Yes, it does help with the RocksDB backend! After tuning the checkpoint frequency to align with network throughput, the "task manager released" and "job cancelled" failures are gone.

Chen

> On May 10, 2016, at 10:33 AM, Ufuk Celebi <[hidden email]> wrote:
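[The tuning Chen describes is done in the job code rather than the config file. A minimal Java sketch, assuming the Flink 1.0-era streaming API; the bucket URI and the 60-second interval are placeholder example values, not ones from this thread:]

```java
import java.net.URI;

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep working state in local RocksDB; copy its files to S3
        // when a checkpoint is taken.
        env.setStateBackend(
                new RocksDBStateBackend(new URI("s3://my-bucket/checkpoints")));

        // Checkpoint less often if uploads to S3 cannot keep up with the
        // interval; 60s here is an arbitrary example value.
        env.enableCheckpointing(60_000);

        // ... define the job topology, then call env.execute(...)
    }
}
```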