Detect late data in processing time

Detect late data in processing time

Soheil Pourbafrani
In event-time mode, we can collect bad (late) data with an OutputTag, because watermarks let us detect late data. In processing-time mode, however, there is no watermark with which to detect bad data. Can we set a watermark (for example, based on the TaskManager's timestamp) and still use processing time to create time windows?

Re: Detect late data in processing time

Hequn Cheng
Hi Soheil,

No, we can't set a watermark when using processing time, and there is no late data in a processing-time window.
So the question is: what counts as bad data when you use processing time? Maybe there are other ways to solve your problem.

Best, Hequn
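
To illustrate why a processing-time window has no late data, here is a minimal plain-Java sketch. It is not Flink code; the class and method names are hypothetical, and it assumes non-negative millisecond timestamps and tumbling windows without an offset. Because the window is chosen from the operator's own clock at arrival, the element always lands in the window that is currently open.

```java
// Hypothetical illustration only; not part of the Flink API.
final class ProcessingTimeWindowSketch {

    // Start of the tumbling window (no offset) that contains the timestamp.
    // Assumes timestampMillis >= 0 and sizeMillis > 0.
    public static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - (timestampMillis % sizeMillis);
    }

    // In processing time the "timestamp" is the clock reading at arrival,
    // so the end of the assigned window is always after that reading.
    public static boolean isLate(long arrivalClockMillis, long sizeMillis) {
        long windowEnd = windowStart(arrivalClockMillis, sizeMillis) + sizeMillis;
        return windowEnd <= arrivalClockMillis; // never true for sizeMillis > 0
    }

    public static void main(String[] args) {
        long size = 5_000L;
        for (long clock : new long[] {0L, 4_999L, 5_000L, 12_345L}) {
            long start = windowStart(clock, size);
            System.out.println(clock + " -> [" + start + ", " + (start + size)
                + "), late=" + isLate(clock, size));
        }
    }
}
```

Whatever the arrival clock reads, `isLate` returns false, which is why "bad data" has to be defined by some other criterion in processing time.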


Re: Detect late data in processing time

vino yang
Hi Soheil,

A watermark indicates the progress of event time. It exists because there is a skew between event time and processing time. Hequn is correct: watermarks cannot be used with processing time. Processing time is based on the TM's (TaskManager's) local system clock. Usually, when your events carry a time field that indicates when they actually happened, we choose event time. When we choose processing time, we don't rely on the time information carried by the data itself, so the question is how you define "bad data".

Thanks, vino.
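
As a rough illustration of the watermark idea described above, the following plain-Java sketch models a watermark that trails the largest event timestamp seen so far by a fixed bound. It is a simplified model and not Flink's implementation: the class name, the per-event watermark update, and the rule "late if the window's last millisecond is at or below the watermark" are assumptions made for the example, and timestamps are assumed to be non-negative.

```java
// Simplified model of watermark-based lateness; hypothetical, not Flink code.
final class WatermarkLatenessSketch {
    private final long windowSizeMillis;
    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public WatermarkLatenessSketch(long windowSizeMillis, long maxOutOfOrdernessMillis) {
        this.windowSizeMillis = windowSizeMillis;
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    // The watermark trails the largest event timestamp seen so far.
    public long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
            ? Long.MIN_VALUE
            : maxTimestampSeen - maxOutOfOrdernessMillis;
    }

    // Returns true if the event's tumbling window has already been passed
    // by the watermark, i.e. the event would count as late.
    public boolean onEvent(long eventTimestampMillis) {
        long windowEnd = eventTimestampMillis
            - (eventTimestampMillis % windowSizeMillis) + windowSizeMillis;
        boolean late = windowEnd - 1 <= currentWatermark();
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
        return late;
    }

    public static void main(String[] args) {
        WatermarkLatenessSketch sketch = new WatermarkLatenessSketch(10_000L, 2_000L);
        for (long ts : new long[] {1_000L, 9_000L, 13_000L, 4_000L, 11_500L}) {
            System.out.println(ts + " late=" + sketch.onEvent(ts));
        }
    }
}
```

In this model only the event with timestamp 4000 is late, because the watermark has already reached 11000 when it arrives. The event at 11500 is out of order but still inside an open window. Lateness here depends entirely on timestamps carried by the data, which is exactly what processing time does not use.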


Re: Detect late data in processing time

Averell
In reply to this post by Soheil Pourbafrani
Hi Soheil,

Why don't you just use the processing time as the event time? Simply override
extractTimestamp to return your processing time.

Regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
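
The suggestion above could be sketched roughly as follows. This is plain Java, not Flink code: the `TimestampExtractor` interface is a hypothetical stand-in for whichever timestamp-assigner interface your Flink version provides, and `SensorReading` is an invented event type. The point it shows is that an extractor returning the wall clock ignores the time field carried by the element.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical illustration only; these types are not part of the Flink API.
final class WallClockTimestampSketch {

    // Stand-in for a timestamp extractor interface (assumption).
    interface TimestampExtractor<T> {
        long extractTimestamp(T element);
    }

    // Invented event type that carries its own time field.
    static final class SensorReading {
        final String id;
        final long eventTimeMillis;

        SensorReading(String id, long eventTimeMillis) {
            this.id = id;
            this.eventTimeMillis = eventTimeMillis;
        }
    }

    // Returns the clock reading at extraction time and ignores the element.
    static TimestampExtractor<SensorReading> wallClockExtractor(Clock clock) {
        return element -> clock.millis();
    }

    public static void main(String[] args) {
        // A fixed clock keeps the example deterministic.
        Clock clock = Clock.fixed(Instant.ofEpochMilli(1_000_000L), ZoneOffset.UTC);
        TimestampExtractor<SensorReading> extractor = wallClockExtractor(clock);
        SensorReading old = new SensorReading("s1", 10L);
        System.out.println(extractor.extractTimestamp(old)); // prints 1000000
    }
}
```

Since every element is stamped with the clock reading at the moment it is seen, timestamps simply follow arrival order at the point where they are assigned, so this alone does not say which elements are "bad".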

Re: Detect late data in processing time

vino yang
Hi Averell,

I personally don't recommend this.
Processing time uses the local physical clock of the node where the specific task runs; it is not a timestamp set upstream in advance.
What you describe is closer to another time concept that Flink provides: ingestion time.
So, if you are not using event time, do not set a watermark.

Thanks, vino.
