Hi guys! As I understand it (I hope I'm wrong), the current design of the watermarking mechanism is that it is tied to the latest watermark, and there is no way to separate those watermarks by key in a keyed stream. (I hope at some point this will be mentioned in the documentation, as it is unfortunately misleading.) Could you share your thoughts on how to replay historical data in an event-time manner (i.e. from a DB into a working application)? A solution based on processing time is not suitable here, as session windows are needed. Thank you! -- Best regards, Kanstantsin Kamkou
email: [hidden email] web: http://2ka.by/ mobile: +49 172 5432334 skype: kkamkou
I don't think I understood all of your question, but with regard to watermarking and keys: you are correct that watermarking (event-time advancement) is not per key. Event time is a local property of each Task in an executing Flink job. It has nothing to do with keys. It depends only on the input data timestamps seen by each task and on the watermarking function (which isn't per key). I hope that helps. With regard to how to replay historical data: there are many ways to approach that. Can you narrow down your constraints? Where does the historical data live? -Jamie On Thu, Jan 17, 2019 at 4:36 PM Kanstantsin Kamkou <[hidden email]> wrote:
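To make the point about task-local event time concrete, here is a minimal toy model in Python. It is not Flink's API: the class `TaskWatermark`, the method `observe`, the 60-second out-of-orderness bound, and the device names are all assumptions made for illustration. It only shows why a month-old record for one key counts as late once another key has already advanced the shared watermark of the same task.

```python
from dataclasses import dataclass


@dataclass
class TaskWatermark:
    """Toy model of one task's event-time clock (hypothetical, not Flink's API)."""

    max_out_of_orderness: int                # tolerated disorder in seconds (assumed)
    max_timestamp: float = float("-inf")     # largest timestamp seen so far, any key

    @property
    def watermark(self) -> float:
        # One watermark for the whole task; the key plays no role here.
        return self.max_timestamp - self.max_out_of_orderness

    def observe(self, key: str, timestamp: int) -> bool:
        """Return True if the record is on time, False if it is already late."""
        late = timestamp <= self.watermark
        self.max_timestamp = max(self.max_timestamp, timestamp)
        return not late


DAY = 86_400
task = TaskWatermark(max_out_of_orderness=60)

# Live traffic from one device pushes the task's watermark to "day 30".
live = task.observe("device-A", 30 * DAY)

# The first record ever seen for device-B, replayed from "day 1", is late anyway,
# because lateness is judged against the task-wide watermark.
replayed = task.observe("device-B", 1 * DAY)
```

In this sketch `live` is on time and `replayed` is late, even though `device-B` had never been seen before. Real Flink jobs decide lateness per window and may allow extra lateness, so treat this only as an illustration of the shared clock.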
Thanks for the reply. As mentioned before, the data comes from the database. The timestamps are from one month ago. I'm looking for a way to feed this data into a running Flink application which has already processed data from that period (its watermarks are far past those dates). On Fri 18. Jan 2019 at 03:22, Jamie Grier <[hidden email]> wrote:
-- Best regards, Kanstantsin Kamkou
So, do you mean to have your application running in real time and use the same instance of it to also process historical data at the same time? If that's the case, then I would advise against doing it that way. What I would recommend instead is to process that historical data with another instance of the application. If this isn't what you're trying to accomplish, please be more thorough in your explanation. Thanks. -Jamie On Thu, Jan 17, 2019 at 10:34 PM Kanstantsin Kamkou <[hidden email]> wrote:
Yeah, that's the solution I have in my pocket so far. The other problem is having to spawn a huge application just to process a hundred entries... :( If you want the whole picture: there are a number of devices with an internal acknowledgment system that guarantees ordering. Nevertheless, sometimes the network might be down for one particular device for days. The task, more or less, is to replay the whole missing set, or to process this out-of-order data, while preserving session window functionality. On Fri 18. Jan 2019 at 17:04, Jamie Grier <[hidden email]> wrote:
Best regards, Kanstantsin Kamkou
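For a backlog as small as a hundred entries, the "separate instance" idea discussed above does not have to be a full deployment. The following Python sketch shows gap-based sessionization applied to a replayed batch on its own, outside the live job. Everything in it is an assumption for illustration: the function name `sessionize`, the `(device_id, event_time_seconds)` record shape, the 60-second gap, and the choice to keep two events exactly `gap` apart in the same session. It is not how Flink implements session windows internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # (device_id, event_time_seconds); hypothetical record shape


def sessionize(events: Iterable[Event], gap: int) -> Dict[str, List[List[int]]]:
    """Split each device's timestamps into sessions separated by more than `gap` seconds."""
    by_device: Dict[str, List[int]] = defaultdict(list)
    for device_id, ts in events:
        by_device[device_id].append(ts)

    sessions: Dict[str, List[List[int]]] = {}
    for device_id, stamps in by_device.items():
        stamps.sort()  # the replayed batch is bounded, so it can simply be sorted
        device_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - device_sessions[-1][-1] <= gap:   # boundary handling is an assumption
                device_sessions[-1].append(ts)        # still within the current session
            else:
                device_sessions.append([ts])          # idle for too long: new session
        sessions[device_id] = device_sessions
    return sessions


# Out-of-order backlog for two devices, as it might come out of the database.
backlog = [("dev-1", 100), ("dev-1", 5000), ("dev-1", 130), ("dev-2", 50), ("dev-1", 5020)]
result = sessionize(backlog, gap=60)
```

Because the backlog is bounded, sorting it replaces the watermark: no record can be late relative to data that is already fully loaded. Merging these sessions with results the live job has already emitted is a separate problem that this sketch does not address.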