RocksDB default logging configuration


RocksDB default logging configuration

Bajaj, Abhinav

Hi,

Some of our teams ran into disk space issues because of the RocksDB default logging configuration - FLINK-15068.

It seems the suggested workaround uses an OptionsFactory to set some of these parameters from inside the job, as sketched below.
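For reference, a minimal sketch of that per-job workaround, assuming Flink 1.7.x APIs (the checkpoint URL and the keep-log-file count are illustrative assumptions, not values from this thread):

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;
import org.rocksdb.InfoLogLevel;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Construct the backend explicitly in the job; this constructor declares IOException.
RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints", true);

backend.setOptions(new OptionsFactory() {
  @Override
  public DBOptions createDBOptions(DBOptions currentOptions) {
    // Log only warnings, and cap the size and number of RocksDB LOG files.
    return currentOptions
        .setInfoLogLevel(InfoLogLevel.WARN_LEVEL)
        .setMaxLogFileSize(1024 * 1024)  // 1 MiB per log file
        .setKeepLogFileNum(4);           // assumed retention count
  }

  @Override
  public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
    return currentOptions;
  }
});

env.setStateBackend(backend);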

 

Since we provision the Flink cluster (version 1.7.1) for the teams, we control the RocksDB state backend configuration from flink-conf.yaml.

However, there doesn't seem to be any related RocksDB logging configuration that can be set in flink-conf.yaml.

Is there a way for a job developer to retrieve the default state backend from the cluster inside the job and set the DBOptions on top of it?

 

Appreciate the help!

 

~ Abhinav Bajaj

 

PS: Sharing the snippet below as the desired option, if such an API were possible -

 

// Desired API (not available today): read the cluster's default state
// backend inside the job, then layer DBOptions on top of it.
StreamExecutionEnvironment streamExecEnv = StreamExecutionEnvironment.getExecutionEnvironment();

StateBackend stateBackend = streamExecEnv.getDefaultStateBackend(); // hypothetical method

stateBackend.setOptions(new OptionsFactory() {

  @Override
  public DBOptions createDBOptions(DBOptions dbOptions) {
    dbOptions.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
    dbOptions.setMaxLogFileSize(1024 * 1024); // 1 MiB per log file
    return dbOptions;
  }

  @Override
  public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions columnFamilyOptions) {
    return columnFamilyOptions;
  }
});

 

 


Re: RocksDB default logging configuration

Bajaj, Abhinav

Bumping this one again to catch some attention.

 

From: "Bajaj, Abhinav" <[hidden email]>
Date: Monday, April 20, 2020 at 3:23 PM
To: "[hidden email]" <[hidden email]>
Subject: RocksDB default logging configuration

 

Hi,

 

Some of our teams ran into the disk space issues because of RocksDB default logging configuration - FLINK-15068.

It seems the workaround suggested uses the OptionsFactory to set some of the parameters from inside the job.

 

Since we provision the Flink cluster(version 1.7.1) for the teams, we control the RocksDB statebackend configuration from flink-conf.yaml.

And it seems there isn’t any related RocksDB configuration to set in flink-conf.yaml.

 

Is there a way for the job developer to retrieve the default statebackend information from the cluster in the job and set the DBOptions on top of it?

 

Appreciate the help!

 

~ Abhinav Bajaj

 

PS:  Sharing below snippet as desired option if possible -

 

StreamExecutionEnvironment streamExecEnv = StreamExecutionEnvironment.getExecutionEnvironment();

StateBackend stateBackend = streamExecEnv.getDefaultStateBackend();

stateBackend.setOptions(new OptionsFactory() {

@Override
public DBOptions createDBOptions(DBOptions dbOptions) {
  dbOptions.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
  dbOptions.setMaxLogFileSize(1024 * 1024)
  return dbOptions;
}

@Override
public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions columnFamilyOptions) {
  return columnFamilyOptions;
}

});

 

 


Re: RocksDB default logging configuration

Chesnay Schepler
AFAIK this is not possible; the client doesn't know anything about the cluster configuration.

FLINK-15747 proposes to add an additional config option for controlling the logging behavior.

The only workaround I can think of would be to create a custom Flink distribution with a modified RocksDBStateBackend which always sets these options by default.
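To illustrate that idea, a minimal sketch of such a modified backend, assuming it ships in the custom distribution (the class name and option values are hypothetical):

import java.io.IOException;
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;
import org.rocksdb.InfoLogLevel;

// Hypothetical drop-in backend that always applies quieter RocksDB logging defaults.
public class QuietRocksDBStateBackend extends RocksDBStateBackend {

  public QuietRocksDBStateBackend(String checkpointUri, boolean incremental) throws IOException {
    super(checkpointUri, incremental);
    setOptions(new OptionsFactory() {
      @Override
      public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions
            .setInfoLogLevel(InfoLogLevel.WARN_LEVEL)
            .setMaxLogFileSize(1024 * 1024); // 1 MiB per log file
      }

      @Override
      public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions;
      }
    });
  }
}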




Re: RocksDB default logging configuration

Bajaj, Abhinav

It seems that requiring the checkpoint URL to create the RocksDBStateBackend mixes the operational aspects of the cluster into the job.

RocksDBStateBackend stateBackend = new RocksDBStateBackend("CHECKPOINT_URL", true);
stateBackend.setDbStoragePath("DB_STORAGE_PATH");

 

Also, I noticed that the RocksDBStateBackend picks up the savepoint directory from the "state.savepoints.dir" property of flink-conf.yaml but does not pick up "state.backend.rocksdb.localdir".

So I had to set it from the job, as above.

 

I feel there is a disconnect and would like to get confirmation of the above behavior, if possible.

I am using Flink 1.7.1.

 

Thanks, Chesnay, for your response below.

 

~ Abhinav Bajaj

 

From: Chesnay Schepler <[hidden email]>
Date: Wednesday, April 22, 2020 at 11:17 PM
To: "Bajaj, Abhinav" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: RocksDB default logging configuration

 



Re: RocksDB default logging configuration

Yun Tang
Hi Bajaj

Current "state.checkpoints.dir" defines cluster-wide location for cluster and each job would create the specific checkpoint location under it with job-id sub-directory. It is the same for the checkpoint URL in RocksDB.

And the configuration option "state.backend.rocksdb.localdir" [1] should work for RocksDB in Flink 1.7.1.
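For illustration, the relevant flink-conf.yaml entries would look roughly like this (the paths are assumed examples, not values from this thread):

state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints
state.backend.rocksdb.localdir: /tmp/flink-rocksdb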


Best
Yun Tang

From: Bajaj, Abhinav <[hidden email]>
Sent: Tuesday, April 28, 2020 8:03
To: [hidden email] <[hidden email]>
Cc: Chesnay Schepler <[hidden email]>
Subject: Re: RocksDB default logging configuration
 



Re: RocksDB default logging configuration

Bajaj, Abhinav

Thanks Yun for your response.

 

It seems that creating the RocksDBStateBackend from the job requires providing the checkpoint URL, whereas the savepoint URL defaults to "state.savepoints.dir" from flink-conf.yaml.

 

I was expecting similar behavior: create the RocksDBStateBackend without providing the checkpoint URL and have it default to "state.checkpoints.dir" from flink-conf.yaml, like savepoints do.

But it seems there is no option to do that (see my original mail at the start of this thread).
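For reference, the public constructors in the Flink 1.7.x line all require a checkpoint target (listed from memory as a sketch; please verify against the 1.7.1 javadoc):

// RocksDBStateBackend constructors in Flink 1.7.x (from memory; verify against the javadoc):
//   RocksDBStateBackend(String checkpointDataUri)
//   RocksDBStateBackend(String checkpointDataUri, boolean enableIncrementalCheckpointing)
//   RocksDBStateBackend(URI checkpointDataUri)
//   RocksDBStateBackend(URI checkpointDataUri, boolean enableIncrementalCheckpointing)
//   RocksDBStateBackend(StateBackend checkpointStreamBackend)
//   RocksDBStateBackend(StateBackend checkpointStreamBackend, boolean enableIncrementalCheckpointing)
// There appears to be no zero-argument constructor that falls back to "state.checkpoints.dir".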

 

Am I misinterpreting the code or documentation? Is my observation correct?

 

Appreciate the engagement.

 

Thanks much,

~ Abhinav Bajaj

 

From: Yun Tang <[hidden email]>
Date: Monday, April 27, 2020 at 8:17 PM
To: "Bajaj, Abhinav" <[hidden email]>, "[hidden email]" <[hidden email]>
Cc: Chesnay Schepler <[hidden email]>
Subject: Re: RocksDB default logging configuration

 



Re: RocksDB default logging configuration

Yun Tang
Hi Bajaj

Actually, I don't fully understand your description, as it conflicts with the Flink codebase. Please follow either of the approaches below to create the RocksDBStateBackend:

  • Set the state backend on the environment programmatically, which takes priority over the configuration in flink-conf.yaml:
    env.setStateBackend(new RocksDBStateBackend("hdfs:///checkpoints-data/"));
  • Configure at least the two options below in flink-conf.yaml to use the RocksDBStateBackend (this can be overridden by setting the state backend explicitly in the environment):
    state.backend: rocksdb
    state.checkpoints.dir: hdfs:///checkpoint-path
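Tying this back to the original logging question: with the first approach, the job can also attach the OptionsFactory workaround, roughly like this (a sketch; the factory body is the one shown earlier in the thread):

RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///checkpoints-data/");
backend.setOptions(/* the OptionsFactory that lowers the log level and caps log file size */);
env.setStateBackend(backend);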

Best
Yun Tang

From: Bajaj, Abhinav <[hidden email]>
Sent: Wednesday, April 29, 2020 3:16
To: Yun Tang <[hidden email]>; [hidden email] <[hidden email]>
Cc: Chesnay Schepler <[hidden email]>
Subject: Re: RocksDB default logging configuration
 
