OrcTableSource in flink 1.12


Nikola Hrusov
Hello,

I am trying to find some examples of how to use the OrcTableSource and query it. 
I got to the documentation here: https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/orc/OrcTableSource.html and it says that an OrcTableSource is used as below:

 OrcTableSource orcSrc = OrcTableSource.builder()
   .path("file:///my/data/file.orc")
   .forOrcSchema("struct<col1:boolean,col2:tinyint,col3:smallint,col4:int>")
   .build();

 tEnv.registerTableSourceInternal("orcTable", orcSrc);
 Table res = tEnv.sqlQuery("SELECT * FROM orcTable");

My question is: what should tEnv be so that I can call the registerTableSourceInternal method?
My end goal is to query the ORC source and then return a DataSet.

Regards,
Nikola

Re: OrcTableSource in flink 1.12

Timo Walther
Hi Nikola,


The OrcTableSource has not been updated to be used in a SQL DDL. You can
either define your own table factory [1] that translates properties into
an object used to create instances, or use
`org.apache.flink.table.api.TableEnvironment#fromTableSource`. I
recommend the latter option.

Please keep in mind that we are about to drop DataSet support for Table
API in 1.13. Batch and streaming use cases are already possible with the
unified TableEnvironment.

Are you sure that you really need DataSet API?

Regards,
Timo

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sourcessinks/


Re: OrcTableSource in flink 1.12

Nikola Hrusov
Hi Timo,

I need to read ORC files and run a query on them as in the example above. Since the example given in the docs is not recommended, what should I use?

I looked into the method you suggest - TableEnvironment#fromTableSource - it shows as Deprecated on the docs: https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/table/api/TableEnvironment.html#fromTableSource-org.apache.flink.table.sources.TableSource-

However, it doesn't say what I should use instead.

I have looked in all the docs available for 1.12 but I cannot find how to achieve the same result as in previous versions. In some previous versions you could call `tableEnv.registerTableSource(tableName, orcTableSource);` but that method is no longer available.

What is the way to go from here? I would like to read from ORC files, run a query and transform the result. I do not necessarily need it to be with the DataSet API.

Regards,
Nikola


Re: OrcTableSource in flink 1.12

Timo Walther
Hi Nikola,

For the ORC source it is fine to use `TableEnvironment#fromTableSource`.
It is true that this method is deprecated, but as I said, not all
connectors have been ported to be supported in the SQL DDL via string
properties. Therefore, `TableEnvironment#fromTableSource` remains
accessible until all connectors are supported in the DDL.

Btw it might also make sense to look into the Hive connector for reading
ORC.

Regards,
Timo
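
For readers following the thread, the `fromTableSource` route suggested above might look roughly like the sketch below. It is an untested sketch resting on several assumptions: Flink 1.12 with `flink-orc`, the Java Table API bridge and the legacy planner on the classpath; a `BatchTableEnvironment` as the table environment, chosen because `OrcTableSource` is a DataSet-based batch source; and the path and schema copied from the Javadoc example quoted earlier. The class name `OrcQuerySketch` and the view name `orcTable` are illustrative only.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.orc.OrcTableSource;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

// Hypothetical class name; a sketch, not a verified program.
public class OrcQuerySketch {
    public static void main(String[] args) throws Exception {
        // Assumption: DataSet-based batch environment (legacy planner) in Flink 1.12.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        // Same builder calls as in the Javadoc example from the first message.
        OrcTableSource orcSrc = OrcTableSource.builder()
            .path("file:///my/data/file.orc")
            .forOrcSchema("struct<col1:boolean,col2:tinyint,col3:smallint,col4:int>")
            .build();

        // Deprecated in 1.12, but the route recommended in this thread.
        Table orcTable = tEnv.fromTableSource(orcSrc);

        // Register the Table under a name so that SQL can refer to it.
        tEnv.createTemporaryView("orcTable", orcTable);
        Table res = tEnv.sqlQuery("SELECT * FROM orcTable");

        // Convert back to a DataSet, as asked in the original question.
        DataSet<Row> rows = tEnv.toDataSet(res, Row.class);
        rows.print(); // print() triggers execution of the batch job
    }
}
```

If the DataSet is not needed, the final conversion can be dropped and the `Table` used directly. Since DataSet support in the Table API is announced above as going away in 1.13, this shape is best treated as a stopgap for 1.12.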

On 22.03.21 18:02, Nikola Hrusov wrote:

> Hi Timo,
>
> I need to read ORC files and run a query on them as in the example
> above. Since the example given in docs is not recommended what should I use?
>
> I looked into the method you suggest - TableEnvironment#fromTableSource
> - it shows as Deprecated on the docs:
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/table/api/TableEnvironment.html#fromTableSource-org.apache.flink.table.sources.TableSource-
>
> However, it doesn't say what I should use instead?
>
> I have looked in all the docs available for 1.12 but I cannot find how
> to achieve the same result as it was in some previous versions. In some
> previous versions you could define
> `tableEnv.registerTableSource(tableName, orcTableSource);` but that
> method is not available anymore.
>
> What is the way to go from here? I would like to read from orc files,
> run a query and transform the result. I do not necessarily need it to be
> with the DataSet API.
>
> Regards,
> Nikola