(DEPRECATED) Apache Flink User Mailing List archive.

SQL for Flink


Radu Tudoran

SQL for Flink

Hi,

As a follow-up to the several discussions at Flink Forward about how Flink should support SQL, I would like to make a couple of proposals.

Disclaimer: I do not claim to have synthesized all the discussions, and a great deal is probably still missing.

Why support SQL in Flink?

- A goal of supporting SQL in Flink should be to enable wider adoption of Flink, particularly among data scientists and data engineers who may not want to (or know how to) program against the existing APIs.

- The main implication, as I see it, is that SQL should serve as a tool for translating a data processing flow into a stream topology that Flink then executes.

- This would require supporting an SQL client for Flink fairly soon.

How many features should be supported?

- To get (close to) the full benefit of Flink's processing capabilities, I believe most processing types should be supported: all the different types of windows, aggregations, transformations, joins, and so on.

- I would propose that UDFs also be supported, so that one can easily add more complex computation when needed.

- In the spirit of the extensibility Flink already offers for operators, functions, etc., users should be able to plug in custom operators that replace the default implementations of the SQL logical operators.
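
The UDF point above can be sketched as a name-based function registry that a SQL translator could consult whenever it meets a function call in a query. This is a minimal, hypothetical sketch: `UdfRegistry`, `register` and `call` are illustrative names, not Flink APIs. Flink's Table API has its own UDF base classes (such as `ScalarFunction`), which this self-contained sketch deliberately does not use.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical registry of scalar UDFs that a SQL-to-dataflow translator
 * could consult when it meets a function call. Not a Flink API.
 */
final class UdfRegistry {
    // Unquoted SQL identifiers are case-insensitive, so keys are upper-cased.
    private final Map<String, Function<Object[], Object>> udfs = new HashMap<>();

    /** Registers a UDF under a name; registering the same name twice is an error. */
    void register(String name, Function<Object[], Object> udf) {
        String key = name.toUpperCase(Locale.ROOT);
        if (udfs.putIfAbsent(key, udf) != null) {
            throw new IllegalArgumentException("UDF already registered: " + name);
        }
    }

    /** Looks up the UDF by name and applies it to the call's arguments. */
    Object call(String name, Object... args) {
        Function<Object[], Object> udf = udfs.get(name.toUpperCase(Locale.ROOT));
        if (udf == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return udf.apply(args);
    }
}
```

For example, after `reg.register("CELSIUS_TO_F", a -> ((Number) a[0]).doubleValue() * 9 / 5 + 32)`, the call `reg.call("celsius_to_f", 100.0)` returns `212.0`. Rejecting duplicate names keeps a user UDF from silently shadowing another one.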

How much customization should be enabled?

- Customization could be provided through configuration files. Such a configuration could cover the policies for triggers, evictors, parallelism, etc. used when translating a specific SQL query into Flink code.

- To support the integration of custom operators for specific SQL logical operators, users should also be able to provide translation RULES that replace the default ones. For example, a user who wants to define their own CUSTOM_TABLE_SCAN should be able to write something like configuration.replaceRule(DataStreamScanRule.INSTANCE, CUSTOM_TABLE_SCAN_Rule.INSTANCE), or, if the planner can choose the new translation rule based on cost, simply configuration.addRule(CUSTOM_TABLE_SCAN_Rule.INSTANCE).
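
The two proposed calls can be illustrated with a minimal sketch under loudly stated assumptions: `TranslationRule`, `RuleConfiguration` and the `cost()` values are hypothetical, and `DataStreamScanRule` / `CUSTOM_TABLE_SCAN_Rule` here are toy stand-ins named after the proposal, not Flink's actual planner classes (which build on Apache Calcite rules). `replaceRule` removes the default rule from consideration, while `addRule` keeps both and lets a cost comparison pick one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical rule that maps a logical operator name to a physical one. */
interface TranslationRule {
    boolean matches(String logicalOp);
    String translate(String logicalOp);
    double cost(); // toy cost, only compared when several rules match
}

/** Toy stand-in for the default scan rule named in the proposal. */
final class DataStreamScanRule implements TranslationRule {
    static final DataStreamScanRule INSTANCE = new DataStreamScanRule();
    private DataStreamScanRule() {}
    public boolean matches(String op) { return op.equals("TableScan"); }
    public String translate(String op) { return "DataStreamScan"; }
    public double cost() { return 10.0; }
}

/** Toy user-supplied rule producing a custom scan operator. */
final class CUSTOM_TABLE_SCAN_Rule implements TranslationRule {
    static final CUSTOM_TABLE_SCAN_Rule INSTANCE = new CUSTOM_TABLE_SCAN_Rule();
    private CUSTOM_TABLE_SCAN_Rule() {}
    public boolean matches(String op) { return op.equals("TableScan"); }
    public String translate(String op) { return "CustomTableScan"; }
    public double cost() { return 5.0; }
}

/** Hypothetical configuration exposing replaceRule and addRule. */
final class RuleConfiguration {
    private final List<TranslationRule> rules = new ArrayList<>();

    RuleConfiguration(List<? extends TranslationRule> defaults) {
        rules.addAll(defaults);
    }

    /** Replaces a registered rule in place; the old rule is no longer considered. */
    void replaceRule(TranslationRule oldRule, TranslationRule newRule) {
        int idx = rules.indexOf(oldRule);
        if (idx < 0) {
            throw new IllegalArgumentException("Rule not registered: " + oldRule);
        }
        rules.set(idx, newRule);
    }

    /** Adds a rule next to the existing ones; cost decides which one is used. */
    void addRule(TranslationRule rule) {
        rules.add(rule);
    }

    /** Uses the cheapest matching rule; on ties, the earliest registered wins. */
    Optional<String> translate(String logicalOp) {
        TranslationRule best = null;
        for (TranslationRule r : rules) {
            if (r.matches(logicalOp) && (best == null || r.cost() < best.cost())) {
                best = r;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        return Optional.of(best.translate(logicalOp));
    }
}
```

After `replaceRule`, a `TableScan` always becomes `CustomTableScan`. After `addRule`, both rules match and the custom one wins only because its toy cost is lower; a rule with a higher cost would be registered but never chosen. Replacing in place keeps the old rule's position, so tie-breaking priority is unchanged.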

What do you think?

Dr. Radu Tudoran

Senior Research Engineer - Big Data Expert

IT R&D Division


HUAWEI TECHNOLOGIES Duesseldorf GmbH

European Research Center

Riesstrasse 25, 80992 München

E-mail: [hidden email]

Mobile: +49 15209084330

Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

Deepak Sharma

Re: SQL for Flink

Yes, I agree with having SQL for Flink.

I can take up some tasks as well once this starts.

Thanks

Deepak

On Wed, Sep 14, 2016 at 3:47 PM, Radu Tudoran <[hidden email]> wrote:

--
www.bigdatabig.com
www.keosha.net

Greg Hogan

Re: SQL for Flink

Hi Deepak,

There are many open tickets for Flink's SQL API. Documentation is at https://ci.apache.org/projects/flink/flink-docs-master/dev/table_api.html.

https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20%22Table%20API%20%26%20SQL%22%20ORDER%20BY%20priority%20DESC

Greg

On Wed, Sep 14, 2016 at 12:27 PM, Deepak Sharma <[hidden email]> wrote:

Deepak Sharma

Re: SQL for Flink

Thanks, Greg.
I will start picking some of them.

Thanks
Deepak

On 14 Sep 2016 6:31 pm, "Greg Hogan" <[hidden email]> wrote:

Timo Walther

Re: SQL for Flink

Hi Radu,

thanks for continuing the discussion we had at the conference here. Your proposals are all valid. As described in the initial design document [1] for the Table API/SQL, we plan to add a SQL client at some point, but first we should focus on extending the set of supported operations. A first step regarding windows and aggregations on streams can be found in the current FLIP-11 [2]. However, it only describes the Table API so far. What Stream SQL should look like exactly is still up for discussion (together with the Calcite guys). In the long term, the Table API could become a DataSet++ or DataStream++. We could add support for UDFs and operations such as map/reduce. If customization or replacement of existing rules is required, we can open a Jira issue for that.
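
As a rough illustration of the windows-and-aggregations topic, a tumbling (fixed-size, non-overlapping) event-time window that sums values per key can be sketched in plain Java. This is a minimal sketch over a finite list with hypothetical `Event` and `TumblingWindowSum` types; it ignores watermarks, late data and state, and is not how Flink or the Table API implements windows.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical input record: event time in milliseconds, a key and a value. */
final class Event {
    final long timestampMs;
    final String key;
    final long value;

    Event(long timestampMs, String key, long value) {
        this.timestampMs = timestampMs;
        this.key = key;
        this.value = value;
    }
}

final class TumblingWindowSum {
    /** Start of the window of length sizeMs containing ts (floorMod also handles ts < 0). */
    static long windowStart(long ts, long sizeMs) {
        return ts - Math.floorMod(ts, sizeMs);
    }

    /** Sums values per (key, window start); result keys look like "key@start". */
    static Map<String, Long> aggregate(List<Event> events, long sizeMs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Event e : events) {
            String group = e.key + "@" + windowStart(e.timestampMs, sizeMs);
            sums.merge(group, e.value, Long::sum);
        }
        return sums;
    }
}
```

With a 1000 ms window, events for key `a` at 0 ms and 999 ms land in window `a@0`, while one at 1000 ms opens `a@1000`.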

Development has only just started, so there is a lot to improve and add. New contributions, discussions on specific features, and design documents are always welcome.
By the way, this discussion should really be continued on the dev mailing list.

Timo

[1] https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/edit#heading=h.4vdi2v1tlg8h
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations

On 14/09/16 at 15:07, Deepak Sharma wrote:

-- 
Freundliche Grüße / Kind Regards

Timo Walther 

Follow me: @twalthr
https://www.linkedin.com/in/twalthr