Hello,
I have a pipeline which consumes data from a Kafka source. Since the partitions are keyed by device_id, if a group of devices is down some partitions will not get the normal flow of data. I understand from the documentation here [1] that in Flink 1.11 one can declare the source idle:

WatermarkStrategy.<Tuple2<Long, String>>forBoundedOutOfOrderness(Duration.ofSeconds(20)).withIdleness(Duration.ofMinutes(1));

How can I handle this in 1.9? I am using AWS EMR, and EMR doesn't yet offer a release with the latest Flink version. One way I can think of is to trigger watermark generation every 10 minutes or so using periodic watermarks. However, this will not be foolproof. Is there a better way to handle this more dynamically?

Thanks,
Hemant
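For context, the 1.11 strategy quoted above is meant to be attached directly to the Kafka consumer, so that watermarks are generated per partition and an idle partition stops holding back the combined watermark. A minimal sketch of that wiring, assuming the event timestamp lives in field f0 of the tuple (the topic name and DeviceEventSchema are hypothetical placeholders):

import java.time.Duration;
import java.util.Properties;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092");
props.setProperty("group.id", "device-pipeline");

// "device-events" and DeviceEventSchema are illustrative, not from the thread.
FlinkKafkaConsumer<Tuple2<Long, String>> consumer =
        new FlinkKafkaConsumer<>("device-events", new DeviceEventSchema(), props);

// Attaching the strategy to the consumer (not to the stream) yields
// per-partition watermarks; a partition with no data for 1 minute is
// marked idle and no longer holds back the overall watermark.
consumer.assignTimestampsAndWatermarks(
        WatermarkStrategy
                .<Tuple2<Long, String>>forBoundedOutOfOrderness(Duration.ofSeconds(20))
                .withTimestampAssigner((event, recordTs) -> event.f0)
                .withIdleness(Duration.ofMinutes(1)));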
Hi Team, Can someone share their experience handling this? Thanks. On Tue, Jul 21, 2020 at 11:30 AM bat man <[hidden email]> wrote:
On Wed, 22 Jul 2020, 08:51 bat man, <[hidden email]> wrote:
Thanks Niels for a great talk. You have covered two of my pain areas - slim and broken streams - since I am dealing with device data from on-prem data centers. The first option, generating fabricated watermark events, is fine; however, as mentioned in your talk, how are you handling forwarding them to the next stream (the next Kafka topic) after enrichment? Have you got any solution for this? -Hemant On Thu, Jul 23, 2020 at 12:05 PM Niels Basjes <[hidden email]> wrote:
Hi Hemant, sorry for the late reply. You can just create your own watermark assigner and either copy the assigner from Flink 1.11 or take the one that we use in our trainings [1]. On Thu, Jul 23, 2020 at 8:48 PM bat man <[hidden email]> wrote:
-- Arvid Heise | Senior Java Developer

Follow us @VervericaData -- Join Flink Forward - The Apache Flink Conference Stream Processing | Event Driven | Real Time

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
Ververica GmbH | Registered at Amtsgericht Charlottenburg: HRB 158244 B | Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng
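A minimal sketch of the kind of 1.9 assigner Arvid suggests, assuming the event timestamp lives in f0 of the tuple. It applies the usual bounded-out-of-orderness watermark, but once no event has arrived for idleTimeoutMs it advances the watermark from the wall clock instead, so an idle source or partition cannot stall downstream event-time operators. This assumes live event timestamps roughly track processing time, which is exactly the assumption that breaks during a backfill (see below):

import javax.annotation.Nullable;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

// Bounded-out-of-orderness assigner for Flink 1.9 that also advances the
// watermark from processing time once the source has been idle for a while.
public class IdleAwareWatermarkAssigner
        implements AssignerWithPeriodicWatermarks<Tuple2<Long, String>> {

    private final long maxOutOfOrdernessMs;
    private final long idleTimeoutMs;

    private long maxTimestamp = Long.MIN_VALUE;
    private long lastEventProcTime = System.currentTimeMillis();

    public IdleAwareWatermarkAssigner(long maxOutOfOrdernessMs, long idleTimeoutMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    @Override
    public long extractTimestamp(Tuple2<Long, String> element, long previousTs) {
        lastEventProcTime = System.currentTimeMillis();
        maxTimestamp = Math.max(maxTimestamp, element.f0); // event time assumed in f0
        return element.f0;
    }

    @Nullable
    @Override
    public Watermark getCurrentWatermark() {
        long now = System.currentTimeMillis();
        if (now - lastEventProcTime > idleTimeoutMs) {
            // No events for a while: fall back to the wall clock so this
            // idle source/partition stops holding back the pipeline.
            return new Watermark(now - maxOutOfOrdernessMs);
        }
        if (maxTimestamp == Long.MIN_VALUE) {
            return null; // no events seen yet and not idle long enough
        }
        // Flink only forwards watermarks larger than the previous one,
        // so occasional regressions returned here are filtered out.
        return new Watermark(maxTimestamp - maxOutOfOrdernessMs);
    }
}

Register it on the Kafka consumer with consumer.assignTimestampsAndWatermarks(new IdleAwareWatermarkAssigner(...)), and make sure the auto-watermark interval is set (env.getConfig().setAutoWatermarkInterval(...)), since getCurrentWatermark() is only invoked on that processing-time schedule.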
Hello Arvid, Thanks for the suggestion/reference, and my apologies for the late reply. With this I am able to process the data even when some topics don't have regular data. Late data is handled via a side output, which has its own downstream process. One challenge is handling backfill: when I run the job over old data, the watermark (with maxOutOfOrderness set to 10 minutes) causes the older data to be filtered out as late. To handle this I am thinking of running the side-input/backfill job with maxOutOfOrderness set to cover the oldest data, while the regular job keeps the normal setting. Thanks, Hemant On Thu, Jul 30, 2020 at 2:41 PM Arvid Heise <[hidden email]> wrote:
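That split could look something like the following sketch, reusing the IdleAwareWatermarkAssigner from the earlier message (the flag name and the 30-day bound are illustrative, not from the thread). During a backfill the wall clock is far ahead of event time, so the wall-clock idleness fallback would mark every historical record late; the sketch therefore disables it and widens maxOutOfOrderness instead:

import java.time.Duration;
import org.apache.flink.api.java.utils.ParameterTool;

ParameterTool params = ParameterTool.fromArgs(args);
boolean backfill = params.getBoolean("backfill", false);

// Backfill: widen the bound to cover the oldest replayed records and
// effectively disable the idleness shortcut; normal run: 10-minute bound
// and 10-minute idle timeout, as discussed above.
long maxOutOfOrdernessMs = backfill
        ? Duration.ofDays(30).toMillis()
        : Duration.ofMinutes(10).toMillis();
long idleTimeoutMs = backfill
        ? Long.MAX_VALUE
        : Duration.ofMinutes(10).toMillis();

consumer.assignTimestampsAndWatermarks(
        new IdleAwareWatermarkAssigner(maxOutOfOrdernessMs, idleTimeoutMs));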