Adding proctime column to Table API


Rex Fenley
Hi,

When using the DataStream API, if I want a tumbling window on proctime, all I have to do is the following:
table.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))...
I don't even need to explicitly create a proctime column.

However, adding an intermediate tumbling window on proctime using the Table API has proved more difficult.

The docs seem to imply that I can only add a proctime column on table creation [1]. However, this isn't what I want because it adds complexity. I only want to render and use proctime at one intermediate tumbling windowed aggregate in the entire query plan. Therefore, I don't want proctime carried from the beginning of all my tables to where I finally need it; I just want it where I need it. Every combination I've tried so far has failed. Is there any way to do this?

Additionally, I don't want to switch to the DataStream API because my tables have retractions, and the Table API is simpler to use in that sense.
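
To make the behavior being asked for concrete, here is a minimal plain-Java model of it. This is not Flink code and does not use any Flink API; the class name `ProcTimeTumblingCount` and the injectable `clock` are hypothetical, chosen only for illustration. The sketch assumes non-negative millisecond timestamps. It reads the wall clock only inside the aggregation step, so the records flowing in from upstream never carry a time field.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

/**
 * Plain-Java model (not Flink code): counts records per tumbling
 * processing-time window. The clock is read only inside add(), so
 * upstream records never carry a time field.
 */
public class ProcTimeTumblingCount {
    private final long windowSizeMs;
    private final LongSupplier clock; // hypothetical injectable wall clock
    private final Map<Long, Long> countsByWindowStart = new TreeMap<>();

    public ProcTimeTumblingCount(long windowSizeMs, LongSupplier clock) {
        this.windowSizeMs = windowSizeMs;
        this.clock = clock;
    }

    /** Assigns the record to whichever window is open at arrival time. */
    public void add(String record) {
        long now = clock.getAsLong();
        // Assumes now >= 0, so the remainder is non-negative.
        long windowStart = now - (now % windowSizeMs);
        countsByWindowStart.merge(windowStart, 1L, Long::sum);
    }

    /** Record counts keyed by window start, in ascending window order. */
    public Map<Long, Long> counts() {
        return countsByWindowStart;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L}; // controllable clock for the demo
        ProcTimeTumblingCount agg =
                new ProcTimeTumblingCount(5_000L, () -> fakeNow[0]);
        fakeNow[0] = 1_000L; agg.add("a");
        fakeNow[0] = 4_999L; agg.add("b");
        fakeNow[0] = 5_000L; agg.add("c");
        System.out.println(agg.counts()); // prints {0=2, 5000=1}
    }
}
```

In real use the clock would be `System::currentTimeMillis`; the fake clock in `main` only makes the window boundaries easy to see.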


Thanks!

--

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com

Re: Adding proctime column to Table API

Rex Fenley
Also, as an example, I've tried
table.window(Tumble over 1.seconds on proctime() as $"w")...
and it failed.

In reply to Rex Fenley's message of Wed, Feb 17, 2021 at 9:30 PM.

Re: Adding proctime column to Table API

Rex Fenley
Following from that, I'm not really sure why I need to provide a proctime timestamp. There should never be any late data with proctime; when a record arrives, it should just be put into whatever the current window is. So why is there any requirement to specify a time column in this case?
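
The "never late" reasoning above can be checked with a little arithmetic. The following is a plain-Java sketch, not Flink code; `windowStart` and `isLate` are hypothetical helpers written for this illustration, and `currentTimeMs` stands in for whatever notion of "now" the system uses (the wall clock for processing time, or something like a watermark for event time). Non-negative timestamps are assumed.

```java
public class ProcTimeNeverLate {
    /** Start of the tumbling window containing timestampMs (timestampMs >= 0 assumed). */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** A record is late if its window has already closed at the current time. */
    static boolean isLate(long recordTimestampMs, long currentTimeMs, long sizeMs) {
        long windowEnd = windowStart(recordTimestampMs, sizeMs) + sizeMs;
        return windowEnd <= currentTimeMs;
    }

    public static void main(String[] args) {
        long size = 1_000L;
        long now = 12_345L;
        // Processing time: the record's timestamp is the arrival time itself,
        // so its window always contains "now".
        System.out.println(isLate(now, now, size));     // prints false
        // Event time: the timestamp comes from the data and can lag behind.
        System.out.println(isLate(10_900L, now, size)); // prints true
    }
}
```

When the record timestamp equals the current time, the window end is always strictly greater than the current time, so `isLate` cannot return true. This only illustrates the reasoning in the question; it does not explain why the Table API asks for a time attribute.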

Thanks!

In reply to Rex Fenley's message of Wed, Feb 17, 2021 at 9:33 PM.