Table API and registration of DataSet/DataStream

Table API and registration of DataSet/DataStream

Flavio Pompermaier
Hi all,
I have a question about the Table API.
Let's say my code looks something like this:


StreamTableEnvironment te = ...;
RowTypeInfo rtf = new RowTypeInfo(...);
DataStream<Row> myDs = ...;
// Register the current stream under the name "test".
te.registerDataStream("test", myDs, columnNames);

Table table = te.sql("SELECT *, (NAME = 'John') AS VALID FROM test WHERE ...");
// Reassign the variable to the filtered stream.
myDs = te.toDataStream(table.where("VALID").select(columnNames), rtf);

If I do:

Table res = te.sql("SELECT * FROM test");

I'd like res to read its data from the latest version of myDs. Is this program correct?
Or should I override the "test" table in the TableEnvironment? Is that possible? I don't see any API that allows this.

Best,
Flavio

Re: Table API and registration of DataSet/DataStream

Fabian Hueske-2
Hi Flavio,

I tried to follow your example. If I got it right, you would like to change the registered table by assigning a different DataStream to the original myDs variable.

With registerDataStream("test", myDs, ...) you do not register the variable myDs as a table but its current value, i.e., a reference to a DataStream object.
Changing the value of myDs only overwrites the reference held in myDs; it does not change the reference that was registered in Calcite's catalog.
This is common behavior in many programming languages, including Java.
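
The point about references can be sketched in plain Java without Flink. This is only an illustration: the class RegistrationDemo, the CATALOG map, and the register/lookup methods are hypothetical stand-ins for the table catalog, and a List<String> stands in for the DataStream.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegistrationDemo {
    // Hypothetical stand-in for the catalog: table name -> registered object.
    static final Map<String, List<String>> CATALOG = new HashMap<>();

    // Stores the reference that is passed in, not the caller's variable.
    static void register(String name, List<String> data) {
        CATALOG.put(name, data);
    }

    static List<String> lookup(String name) {
        return CATALOG.get(name);
    }

    // Registers a list, reassigns the local variable, then looks the table up again.
    static List<String> demo() {
        List<String> myDs = Arrays.asList("John", "Mary");
        register("test", myDs);

        // Reassigning the variable does not touch the catalog entry.
        myDs = Arrays.asList("John");

        return lookup("test");
    }

    public static void main(String[] args) {
        // The catalog still holds the list that was registered first.
        System.out.println(demo());
    }
}
```

The lookup returns the originally registered list because register received a copy of the reference, so later assignments to myDs are invisible to the catalog.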

Right now, there is no way to change or override a registered table. We had this functionality once, but had to remove it after a Calcite version upgrade.
Can you use a new TableEnvironment and register the new table there?

Best, Fabian

2017-09-08 17:55 GMT+02:00 Flavio Pompermaier <[hidden email]>:

Re: Table API and registration of DataSet/DataStream

Flavio Pompermaier
Yes, I can do that of course.
What I basically need is a way to translate a WHERE clause into a filter function. Is there any utility class in Flink that does that?

On 9 Sep 2017 21:54, "Fabian Hueske" <[hidden email]> wrote:

Re: Table API and registration of DataSet/DataStream

Fabian Hueske-2
Not sure what you mean by "translate a where clause to a filter function".

Isn't that exactly what Table.filter(String condition) does?
It translates a SQL-like condition (given as a String) into an operator that filters the Table.


2017-09-09 23:49 GMT+02:00 Flavio Pompermaier <[hidden email]>:

Re: Table API and registration of DataSet/DataStream

Flavio Pompermaier
Hi Fabian,
basically, these were my problems with the Table API.

1) TableEnvironment.sql() uses a different WHERE syntax than Table.where(), and this is very annoying (IMHO). Example:
  tEnv.sql("SELECT * FROM XXX WHERE Y IS NOT NULL") vs. table.where("Y.isNotNull").

2) If I understood correctly, my program, which ideally could be something like:

DataSet<Row> ds = ....filter(TableUtils.getWhereAsFilter(ds, fieldTypes, fieldNames, "Y IS NOT NULL"));

should instead be:

BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);
Table table = tEnv.fromDataSet(ds, fieldNames); // why not support an array of field names?
ds = tEnv.toDataSet(table.where("Y.isNotNull"), new RowTypeInfo(fieldTypes));

Is this correct?
Moreover, fromDataSet requires fieldNames to be a comma-separated String. Why not also support fieldNames as a String[]?

Best,
Flavio


On Thu, Sep 14, 2017 at 3:43 PM, Fabian Hueske <[hidden email]> wrote:

Re: Table API and registration of DataSet/DataStream

Fabian Hueske-2
Hi Flavio,

1) The Java Table API does not aim to resemble SQL; it mirrors the Scala Table API, which is integrated with the host language (Scala).
Hence the different syntax for expressions.

2) Yes, that would be one way to do it. If that adds too much boilerplate code, you could encapsulate it in your own helper class.
We do not provide a TableUtils class because this is out of scope for the Table API.
Making it generic would take some effort because a DataSet can be of any type (Tuple, POJO, Row, etc.), and such a class would not be used in the Table API anyway.
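
As one small piece of such a user-side helper, the String[] to comma-separated conversion asked about above could look like the sketch below. The class FieldNames and the method toFieldExpression are hypothetical names and are not part of Flink; the sketch only builds the String and leaves the fromDataSet call from the thread to the caller.

```java
public class FieldNames {
    // Joins field names into a single comma-separated String.
    // Hypothetical helper, not a Flink API.
    static String toFieldExpression(String... fieldNames) {
        if (fieldNames == null || fieldNames.length == 0) {
            throw new IllegalArgumentException("at least one field name is required");
        }
        return String.join(",", fieldNames);
    }

    public static void main(String[] args) {
        String[] fieldNames = {"NAME", "AGE", "CITY"};
        // The joined String is the form the thread says fromDataSet expects.
        System.out.println(toFieldExpression(fieldNames));
    }
}
```

A call site would then pass FieldNames.toFieldExpression(fieldNames) wherever the comma-separated String is expected, keeping the array form in the rest of the program.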

Best, Fabian




2017-09-14 16:12 GMT+02:00 Flavio Pompermaier <[hidden email]>:

Re: Table API and registration of DataSet/DataStream

Flavio Pompermaier
I see. Anyway, for me it continues to be very misleading to have different syntaxes for WHERE clauses (SQL vs. Scala).
Why not make them compatible? Is it that complex?


On Thu, Sep 14, 2017 at 4:26 PM, Fabian Hueske <[hidden email]> wrote: