Confusions and suggestions about Configuration

Jason Lee
Hi everyone,

While researching and using Flink recently, I found the official documentation on how to set configuration parameters confusing, mainly in the following ways:

1. The Configuration page of the official documentation lists each parameter with its description and default value. That part is very clear, but it does not explain how those parameters can be set. For example, when developing Flink SQL jobs we usually need different settings for different jobs, such as taskmanager.memory.managed.fraction. From the documentation I can only learn that it can be set in the flink-conf.yaml file, but it can also be set on the command line, for example with flink run -yD taskmanager.memory.managed.fraction=0.45. I think this method should be described in the official documentation.

2. In addition, we usually run Flink SQL jobs from a Jar package that contains the DDL, and we found that some parameters set through StreamTableEnvironment.getConfig().getConfiguration().setXXX(key, value) do not take effect. For example, taskmanager.memory.managed.fraction has no effect when set this way. (The TableConfig source code contains this note: "Because options are read at different point in time when performing operations, it is recommended to set configuration options early after instantiating a table environment.") Moreover, StreamExecutionEnvironment.getConfiguration() is protected, so some parameters cannot be set through the API at all. I don't think this is reasonable, because we sometimes want to give different jobs different settings through Configuration.setXXX(key, value) in code, rather than only through flink run -yD.
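The TableConfig note quoted in point 2 hints at the cause: different options are read at different moments. The toy Java model below is not Flink code; TaskManagerModel and miniBatchEnabled are hypothetical names. It assumes that a TaskManager reads taskmanager.memory.managed.fraction once, when its process starts, while the table planner reads table.* options (here table.exec.mini-batch.enabled) from the table configuration when it plans a query. Under that assumption, writing the memory option into the table configuration afterwards changes nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not Flink code: illustrates options being read at different times.
public class ReadTimeSketch {

    // Hypothetical stand-in for a TaskManager process. It reads its memory
    // option once, in the constructor (i.e. at "process start"), and keeps it.
    static final class TaskManagerModel {
        final double managedFraction;

        TaskManagerModel(Map<String, String> clusterConf) {
            // 0.4 is the documented default of taskmanager.memory.managed.fraction
            this.managedFraction = Double.parseDouble(
                    clusterConf.getOrDefault("taskmanager.memory.managed.fraction", "0.4"));
        }
    }

    // Hypothetical stand-in for the table planner. It looks up a table option
    // in the job's table configuration every time it is called.
    static boolean miniBatchEnabled(Map<String, String> tableConf) {
        return Boolean.parseBoolean(
                tableConf.getOrDefault("table.exec.mini-batch.enabled", "false"));
    }

    public static void main(String[] args) {
        // The cluster is started first, from the cluster-level configuration.
        Map<String, String> clusterConf = new HashMap<>();
        TaskManagerModel tm = new TaskManagerModel(clusterConf);

        // Later, user code fills in the job's table configuration.
        Map<String, String> tableConf = new HashMap<>();
        tableConf.put("table.exec.mini-batch.enabled", "true");
        tableConf.put("taskmanager.memory.managed.fraction", "0.45");

        System.out.println(miniBatchEnabled(tableConf)); // true: read at planning time
        System.out.println(tm.managedFraction);          // 0.4: fixed when the TaskManager started
    }
}
```

In this model the only way to change the TaskManager value is to change the configuration the TaskManager is started with, which matches the observation that flink run -yD works while the table configuration does not.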

In summary, the default configuration is fine for ordinary jobs, but for jobs that need individual settings, especially Flink SQL jobs, I have a few suggestions about configuration:

1. The official documentation should explain how these parameters can be set: not only in flink-conf.yaml, but also on the command line with flink run -yD, along with any other ways in which parameters can be configured.

2. For the API, the documentation should state separately which parameters do not take effect when set through StreamTableEnvironment.getConfig().getConfiguration().setXXX(key, value). Otherwise users will set some parameters this way, see no effect, and be confused.

3. StreamExecutionEnvironment.getConfiguration() is protected. Will the community change this in a later version? Is there currently any effective way for users to set parameters through the API and have them take effect, for example taskmanager.memory.managed.fraction?
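Point 1 in both lists above concerns two sources for the same option: the flink-conf.yaml file and dynamic properties passed on the command line with -yD. The stdlib-only Java sketch below models that layering with plain maps, assuming the usual behavior that a dynamic property overrides the value from the file. parseConfFile and applyDynamicProperties are hypothetical helpers, not Flink's own parser, and the file handling is deliberately simplified to "key: value" lines rather than full YAML.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LayeredConfigSketch {

    // Parses "key: value" lines, a simplified stand-in for flink-conf.yaml
    // (hypothetical helper; real YAML parsing is not attempted here).
    static Map<String, String> parseConfFile(List<String> lines) {
        Map<String, String> conf = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int sep = trimmed.indexOf(':');
            if (sep > 0) {
                conf.put(trimmed.substring(0, sep).trim(), trimmed.substring(sep + 1).trim());
            }
        }
        return conf;
    }

    // Applies "key=value" dynamic properties (as given after -D / -yD) on top
    // of the file values; later layers win. The base map is left unchanged.
    static Map<String, String> applyDynamicProperties(Map<String, String> base, List<String> dynamicProps) {
        Map<String, String> merged = new HashMap<>(base);
        for (String prop : dynamicProps) {
            int eq = prop.indexOf('=');
            if (eq > 0) {
                merged.put(prop.substring(0, eq), prop.substring(eq + 1));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> fromFile = parseConfFile(List.of(
                "# cluster-wide defaults",
                "taskmanager.memory.managed.fraction: 0.4",
                "parallelism.default: 1"));
        Map<String, String> effective = applyDynamicProperties(fromFile,
                List.of("taskmanager.memory.managed.fraction=0.45"));
        System.out.println(effective.get("taskmanager.memory.managed.fraction")); // 0.45
        System.out.println(effective.get("parallelism.default"));                 // 1
    }
}
```

The sketch only shows precedence between the two sources named in the post; a real deployment may involve further layers, which it ignores.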

I may not have described some of these issues clearly enough, or I may have misunderstood them, so I hope to hear from the community.

Best,
Jason

Re: Confusions and suggestions about Configuration

rmetzger0
Note to others on this mailing list: this email was also sent to this list with the subject "Flink parameter configuration does not take effect". I replied there; let's continue the discussion in that thread.

On Tue, Jun 15, 2021 at 7:39 AM Jason Lee <[hidden email]> wrote: