Re: [DISCUSS] Hierarchies in ConfigOption

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jark Wu-3
From a user's perspective, I prefer the shorter one "format=json", because it's more concise and straightforward. The "kind" is redundant for users. 
Is there a real case requires to represent the configuration in JSON style? As far as I can see, I don't see such requirement, and everything works fine by now.

So I'm in favor of "format=json". But if the community insist to follow code style on this, I'm also fine with the longer one. 

Btw, I also CC user mailing list to listen more user's feedback. Because I think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:
 > Therefore, should we advocate instead:
 >
 > 'format.kind' = 'json',
 > 'format.fail-on-missing-field' = 'false'

Yes. That's pretty much it.

This is reasonable important to nail down as with such violations I
believe we could not actually switch to a standard YAML parser.

On 29/04/2020 16:05, Timo Walther wrote:
> Hi everyone,
>
> discussions around ConfigOption seem to be very popular recently. So I
> would also like to get some opinions on a different topic.
>
> How do we represent hierarchies in ConfigOption? In FLIP-122, we
> agreed on the following DDL syntax:
>
> CREATE TABLE fs_table (
>  ...
> ) WITH (
>  'connector' = 'filesystem',
>  'path' = 'file:///path/to/whatever',
>  'format' = 'csv',
>  'format.allow-comments' = 'true',
>  'format.ignore-parse-errors' = 'true'
> );
>
> Of course this is slightly different from regular Flink core
> configuration but a connector still needs to be configured based on
> these options.
>
> However, I think this FLIP violates our code style guidelines because
>
> 'format' = 'json',
> 'format.fail-on-missing-field' = 'false'
>
> is an invalid hierarchy. `format` cannot be a string and a top-level
> object at the same time.
>
> We have similar problems in our runtime configuration:
>
> state.backend=
> state.backend.incremental=
> restart-strategy=
> restart-strategy.fixed-delay.delay=
> high-availability=
> high-availability.cluster-id=
>
> The code style guide states "Think of the configuration as nested
> objects (JSON style)". So such hierarchies cannot be represented in a
> nested JSON style.
>
> Therefore, should we advocate instead:
>
> 'format.kind' = 'json',
> 'format.fail-on-missing-field' = 'false'
>
> What do you think?
>
> Thanks,
> Timo
>
> [1]
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Flavio Pompermaier
Personally I don't have any preference here.  Compliance wih standard YAML parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Dawid Wysakowicz-2

Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink cluster I would be a strong advocate for keeping a yaml/hocon/json/... compatible style. Those options are primarily read from a file and thus should at least try to follow common practices for nested formats if we ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The origin/destination of these usually will be external catalog, usually in a flattened(key/value) representation so I agree it is not as important as in the aforementioned case. Nevertheless having a yaml based catalog or being able to have e.g. yaml based snapshots of a catalog in my opinion is appealing. At the same time cost of being able to have a nice yaml/hocon/json representation is just adding a single suffix to a single(at most 2 key + value) property. The question is between `format` = `json` vs `format.kind` = `json`. That said I'd be slighty in favor of doing it.

Just to have a full picture. Both cases can be represented in yaml, but the difference is significant:

format: 'json'
format.option: 'value'

vs

format:
    kind: 'json'

    option: 'value'


Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:
Personally I don't have any preference here.  Compliance wih standard YAML parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Benchao Li
Thanks Timo for staring the discussion.

Generally I like the idea to keep the config align with a standard like json/yaml.

From the user's perspective, I don't use table configs from a config file like yaml or json for now,
And it's ok to change it to yaml like style. Actually we didn't know that this could be a yaml like 
configuration hierarchy. If it has a hierarchy, we maybe consider that in the future to load the 
config from a yaml/json file.

Regarding the name,
'format.kind' looks fine to me. However there is another name from the top of my head:
'format.name', WDYT?

Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:

Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink cluster I would be a strong advocate for keeping a yaml/hocon/json/... compatible style. Those options are primarily read from a file and thus should at least try to follow common practices for nested formats if we ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The origin/destination of these usually will be external catalog, usually in a flattened(key/value) representation so I agree it is not as important as in the aforementioned case. Nevertheless having a yaml based catalog or being able to have e.g. yaml based snapshots of a catalog in my opinion is appealing. At the same time cost of being able to have a nice yaml/hocon/json representation is just adding a single suffix to a single(at most 2 key + value) property. The question is between `format` = `json` vs `format.kind` = `json`. That said I'd be slighty in favor of doing it.

Just to have a full picture. Both cases can be represented in yaml, but the difference is significant:

format: 'json'
format.option: 'value'

vs

format:
    kind: 'json'

    option: 'value'


Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:
Personally I don't have any preference here.  Compliance wih standard YAML parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jingsong Li
Thanks Timo for staring the discussion.

I am +1 for "format: 'json'".
Take a look to Dawid's yaml case:

connector: 'filesystem'
path: '...'
format: 'json'
format:
    option1: '...'
    option2: '...'
    option3: '...'

Is this work?
According to my understanding, 'format' key is the attribute of connector, which can be separately configured outside. In the 'format' block, they are the attribute of format.
So this json style block can only contain the properties exclude format itself.

Best,
Jingsong Lee

On Thu, Apr 30, 2020 at 9:58 AM Benchao Li <[hidden email]> wrote:
Thanks Timo for staring the discussion.

Generally I like the idea to keep the config align with a standard like json/yaml.

From the user's perspective, I don't use table configs from a config file like yaml or json for now,
And it's ok to change it to yaml like style. Actually we didn't know that this could be a yaml like 
configuration hierarchy. If it has a hierarchy, we maybe consider that in the future to load the 
config from a yaml/json file.

Regarding the name,
'format.kind' looks fine to me. However there is another name from the top of my head:
'format.name', WDYT?

Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:

Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink cluster I would be a strong advocate for keeping a yaml/hocon/json/... compatible style. Those options are primarily read from a file and thus should at least try to follow common practices for nested formats if we ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The origin/destination of these usually will be external catalog, usually in a flattened(key/value) representation so I agree it is not as important as in the aforementioned case. Nevertheless having a yaml based catalog or being able to have e.g. yaml based snapshots of a catalog in my opinion is appealing. At the same time cost of being able to have a nice yaml/hocon/json representation is just adding a single suffix to a single(at most 2 key + value) property. The question is between `format` = `json` vs `format.kind` = `json`. That said I'd be slighty in favor of doing it.

Just to have a full picture. Both cases can be represented in yaml, but the difference is significant:

format: 'json'
format.option: 'value'

vs

format:
    kind: 'json'

    option: 'value'


Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:
Personally I don't have any preference here.  Compliance wih standard YAML parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Kurt Young
IIUC FLIP-122 already delegate the responsibility for designing and parsing connector properties to connector developers. 
So frankly speaking, no matter which style we choose, there is no strong guarantee for either of these. So it's also possible
that developers can choose a totally different way to express properties, such as:

'format' = 'csv',
'csv.allow-comments' = 'true',
'csv.ignore-parse-errors' = 'true'

which also seems quite straightforward and easy to use. So my opinion on this would be since there is no guarantee for developers
to choose "format" as common prefix of all format related properties, there is not much value to extend 'format' to 'format.kind'.  


Best,
Kurt


On Thu, Apr 30, 2020 at 10:17 AM Jingsong Li <[hidden email]> wrote:
Thanks Timo for staring the discussion.

I am +1 for "format: 'json'".
Take a look to Dawid's yaml case:

connector: 'filesystem'
path: '...'
format: 'json'
format:
    option1: '...'
    option2: '...'
    option3: '...'

Is this work?
According to my understanding, 'format' key is the attribute of connector, which can be separately configured outside. In the 'format' block, they are the attribute of format.
So this json style block can only contain the properties exclude format itself.

Best,
Jingsong Lee

On Thu, Apr 30, 2020 at 9:58 AM Benchao Li <[hidden email]> wrote:
Thanks Timo for staring the discussion.

Generally I like the idea to keep the config align with a standard like json/yaml.

From the user's perspective, I don't use table configs from a config file like yaml or json for now,
And it's ok to change it to yaml like style. Actually we didn't know that this could be a yaml like 
configuration hierarchy. If it has a hierarchy, we maybe consider that in the future to load the 
config from a yaml/json file.

Regarding the name,
'format.kind' looks fine to me. However there is another name from the top of my head:
'format.name', WDYT?

Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:

Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink cluster I would be a strong advocate for keeping a yaml/hocon/json/... compatible style. Those options are primarily read from a file and thus should at least try to follow common practices for nested formats if we ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The origin/destination of these usually will be external catalog, usually in a flattened(key/value) representation so I agree it is not as important as in the aforementioned case. Nevertheless having a yaml based catalog or being able to have e.g. yaml based snapshots of a catalog in my opinion is appealing. At the same time cost of being able to have a nice yaml/hocon/json representation is just adding a single suffix to a single(at most 2 key + value) property. The question is between `format` = `json` vs `format.kind` = `json`. That said I'd be slighty in favor of doing it.

Just to have a full picture. Both cases can be represented in yaml, but the difference is significant:

format: 'json'
format.option: 'value'

vs

format:
    kind: 'json'

    option: 'value'


Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:
Personally I don't have any preference here.  Compliance wih standard YAML parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jingsong Li
In reply to this post by Jingsong Li
Sorry for mistake,

I proposal:

connector: 'filesystem'
path: '...'
format: 'json'
format.option:
    option1: '...'
    option2: '...'
    option3: '...'

And I think most of cases, users just need configure 'format' key, we should make it convenient for them. There is no big problem in making format options more complex.

And for Kafka key and value, we can:

connector: 'kafka'
key.format: 'json'
key.format.option:
    option1: '...'
    option2: '...'
value.format: 'json'
value.format.option:
    option1: '...'
    option2: '...'

Best,
Jingsong Lee 

On Thu, Apr 30, 2020 at 10:16 AM Jingsong Li <[hidden email]> wrote:
Thanks Timo for staring the discussion.

I am +1 for "format: 'json'".
Take a look to Dawid's yaml case:

connector: 'filesystem'
path: '...'
format: 'json'
format:
    option1: '...'
    option2: '...'
    option3: '...'

Is this work?
According to my understanding, 'format' key is the attribute of connector, which can be separately configured outside. In the 'format' block, they are the attribute of format.
So this json style block can only contain the properties exclude format itself.

Best,
Jingsong Lee

On Thu, Apr 30, 2020 at 9:58 AM Benchao Li <[hidden email]> wrote:
Thanks Timo for staring the discussion.

Generally I like the idea to keep the config align with a standard like json/yaml.

From the user's perspective, I don't use table configs from a config file like yaml or json for now,
And it's ok to change it to yaml like style. Actually we didn't know that this could be a yaml like 
configuration hierarchy. If it has a hierarchy, we maybe consider that in the future to load the 
config from a yaml/json file.

Regarding the name,
'format.kind' looks fine to me. However there is another name from the top of my head:
'format.name', WDYT?

Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:

Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink cluster I would be a strong advocate for keeping a yaml/hocon/json/... compatible style. Those options are primarily read from a file and thus should at least try to follow common practices for nested formats if we ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The origin/destination of these usually will be external catalog, usually in a flattened(key/value) representation so I agree it is not as important as in the aforementioned case. Nevertheless having a yaml based catalog or being able to have e.g. yaml based snapshots of a catalog in my opinion is appealing. At the same time cost of being able to have a nice yaml/hocon/json representation is just adding a single suffix to a single(at most 2 key + value) property. The question is between `format` = `json` vs `format.kind` = `json`. That said I'd be slighty in favor of doing it.

Just to have a full picture. Both cases can be represented in yaml, but the difference is significant:

format: 'json'
format.option: 'value'

vs

format:
    kind: 'json'

    option: 'value'


Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:
Personally I don't have any preference here.  Compliance wih standard YAML parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Best, Jingsong Lee


--
Best, Jingsong Lee