How is proctime represented?


Rex Fenley
Hello,

When using PROCTIME() in a CREATE TABLE DDL for a source, is the proctime attribute a timestamp generated when the row is ingested at the source and then forwarded through the execution graph, or is it a placeholder that says "fill me in with a timestamp" once it is used directly by some operator on some machine?

Thanks!

--

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com |  BLOG  |  FOLLOW US  |  LIKE US


Re: How is proctime represented?

Chesnay Schepler
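Could you check whether this answers your question?

https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/timely-stream-processing.html#notions-of-time-event-time-and-processing-time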


Re: How is proctime represented?

Rex Fenley
Rereading the documentation you posted after asking this question, it does sound like it's simply a placeholder that only gets filled in when an operator uses it. Then again, that's not exactly what it says, so I'm only about 70% confident that this is what happens.


Re: How is proctime represented?

Chesnay Schepler
Hmm... I can now see where that uncertainty comes from.

My impression is that PROCTIME is not evaluated eagerly; instead, operators relying on this column generate their own processing timestamp. What throws me off is that I cannot tell how you would tell Flink to store a processing timestamp as-is in a row (essentially creating something like ingestion time).

I'm looping in Timo to provide some clarity.
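A minimal sketch of the lazy case described above, assuming Flink 1.12-era SQL. The table name `clicks`, the column `user_id`, and the `datagen` connector are hypothetical choices made only so the snippet is self-contained. In a processing-time tumbling window, the timestamp that decides which window a row falls into is read from the window operator's clock when the row reaches that operator; the source does not stamp it onto the row.

```sql
-- Hypothetical table and column names; datagen is used only so the
-- sketch needs no external system.
CREATE TABLE clicks (
  user_id   BIGINT,
  proc_time AS PROCTIME()  -- declares a processing-time attribute
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5'
);

-- The window operator takes the current processing time when a row
-- arrives and uses it to assign the row to a one-minute window.
SELECT
  TUMBLE_START(proc_time, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*) AS clicks_per_minute
FROM clicks
GROUP BY TUMBLE(proc_time, INTERVAL '1' MINUTE);
```

Because `proc_time` is a time attribute rather than a stored value, it can drive the `TUMBLE` window, but nothing in the row records when it left the source.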


Re: How is proctime represented?

Timo Walther
Chesnay is right. PROCTIME() is evaluated lazily, i.e. only when its
result is needed as an argument for another expression or function. So
within the pipeline the column is NULL, but when you compute something
from it, e.g. CAST(proctime AS TIMESTAMP(3)), the value is materialized
into the row. If you want to use ingestion time, you should be able to use:

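CREATE TABLE my_source (      -- hypothetical table name
   id BIGINT,                 -- hypothetical payload column
   ingest_ts AS CAST(PROCTIME() AS TIMESTAMP(3))
) WITH (
   'connector' = 'datagen'    -- hypothetical connector choice
)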

Regards,
Timo
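To make the ingestion-time suggestion concrete, here is a hedged sketch that declares both a lazy `proc_time` attribute and a materialized `ingest_ts` column, then writes the latter to a sink. The names `clicks`, `clicks_out`, and `user_id` and the `datagen`/`print` connectors are illustrative assumptions; the `CAST(PROCTIME() AS TIMESTAMP(3))` expression is the one Timo gives.

```sql
-- Hypothetical names; datagen and print are used only so the sketch
-- runs without external systems.
CREATE TABLE clicks (
  user_id   BIGINT,
  -- time attribute, evaluated lazily by whichever operator needs it
  proc_time AS PROCTIME(),
  -- plain TIMESTAMP(3); the cast forces the value to be computed and
  -- stored in the row (per Timo's explanation in this thread)
  ingest_ts AS CAST(PROCTIME() AS TIMESTAMP(3))
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1'
);

CREATE TABLE clicks_out (
  user_id   BIGINT,
  ingest_ts TIMESTAMP(3)
) WITH (
  'connector' = 'print'
);

-- ingest_ts travels with the row as an ordinary value, so downstream
-- operators and the sink all see the same timestamp.
INSERT INTO clicks_out
SELECT user_id, ingest_ts FROM clicks;
```

The trade-off: `ingest_ts` is an ordinary TIMESTAMP(3) column, so it can be stored and compared downstream, but because it is no longer a time attribute it cannot drive a processing-time window the way `proc_time` can.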



Re: How is proctime represented?

Rex Fenley
Thanks, y'all, this is really helpful!
