Hi, thanks for answering.

> I guess you consume from Kafka from the earliest offset, so you consume historical data and Flink is catching-up.

Yes, that's what's happening. But Kafka is partitioned on sessionId, so skew between partitions cannot explain it. I think the only way it can happen is when suddenly there's one event with a very late timestamp.

> Just to verify, if you do keyBy sessionId, do you check the gaps of events from the same session?

Good point. sessionId is unique in this case, and even if it's not, every single session suffers from this problem of early triggering, so it's very unlikely that all the millions of sessions within that hour had duplicates.

I suspect that having two ProcessWindowFunctions one after the other somehow causes this. I deployed a version with one window function which just prints the timestamps to S3 (to find out if I have event-time jumps), and suddenly it doesn't trigger early (it has been running for 10 minutes and not a single event has arrived at the sink).

On Tue, Jun 16, 2020 at 12:01 PM Rafi Aroch <[hidden email]> wrote:

Hi Ori,

I guess you consume from Kafka from the earliest offset, so you consume historical data and Flink is catching-up.

Regarding: "My event-time timestamps also do not have big gaps"

Just to verify, if you do keyBy sessionId, do you check the gaps of events from the same session?

Rafi

On Tue, Jun 16, 2020 at 9:36 AM Ori Popowski <[hidden email]> wrote:

So why is it happening? I have no clue at the moment. My event-time timestamps also do not have big gaps between them that would explain the window triggering.

On Mon, Jun 15, 2020 at 9:21 PM Robert Metzger <[hidden email]> wrote:

If you are using event time in Flink, it is disconnected from the real-world wall-clock time. You can process historical data in a streaming program as if it were real-time data (potentially reading through years of event-time data in a few wall-clock minutes).

On Mon, Jun 15, 2020 at 4:58 PM Yichao Yang <[hidden email]> wrote:

Hi,

I think maybe you are using event time, and the gap between the timestamps of your events is bigger than 30 minutes. Maybe you can check the timestamps in the source data.

Best,
Yichao Yang

Sent from my iPhone

------------------ Original ------------------
From: Ori Popowski <[hidden email]>
Date: Mon, Jun 15, 2020 10:50 PM
To: user <[hidden email]>
Subject: Re: EventTimeSessionWindow firing too soon
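
For anyone wanting to run the per-session gap check suggested above offline, here is a minimal sketch in plain Python. It is not Flink code and does not use any Flink API: the function name `split_sessions`, the `(session_id, event_time_ms)` tuple layout, and the 30-minute gap are assumptions for illustration, and it ignores watermarks and out-of-order arrival. It only models the rule that a gap between consecutive events of the same key larger than the session gap starts a new session; events exactly one gap apart are treated as the same session here, which is an assumption about the boundary case.

```python
from collections import defaultdict

# Assumed session gap of 30 minutes (the value mentioned in the thread).
GAP_MS = 30 * 60 * 1000


def split_sessions(events, gap_ms=GAP_MS):
    """Group (session_id, event_time_ms) pairs into per-key sessions.

    Hypothetical helper: a simplified offline model of gap-based session
    windows, not Flink's implementation. A new session starts whenever the
    difference to the previous event of the same key exceeds gap_ms.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()  # compare events in event-time order
        groups = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - groups[-1][-1] > gap_ms:
                groups.append([ts])  # gap too large: previous session ends
            else:
                groups[-1].append(ts)
        sessions[key] = groups
    return sessions


def keys_with_splits(sessions):
    """Return the keys whose events fall into more than one session."""
    return sorted(key for key, groups in sessions.items() if len(groups) > 1)
```

Feeding a sample of the raw events (for example the timestamps dumped to S3) through `split_sessions` and then `keys_with_splits` would list the sessionIds whose own event-time gaps are large enough to explain a window closing early. If that list is empty, the gaps in the data alone do not account for the early firing.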