Purging Late stream data

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Purging Late stream data

G.S.Vijay Raajaa
Hi,

I am having 3 streams which is being merged from a union of kafka topics on a given timestamp. The problem I am facing is that, if there is a delay  in one of the stream and when the data in that particular stream arrives at a later point in time, the merge happens in a delayed fashion. The way I want to solve is that, I want to drop such data streams which comes after a delay ( say 5sec ). Kindly direct me on how to go about it?

Will watermarking (to process in even time) + the allowed lateness help solve the same?

Regards,
Vijay Raajaa G S
Reply | Threaded
Open this post in threaded view
|

Re: Purging Late stream data

Kien Truong
Hi,

One method you can use is using a ProcessFunction.

In the process function, you get the timer service through the function
context,

which can then be used to schedule a task to clean up late data.

Check out the docs for ProcessFunction

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html

Regards,

Kien


On 7/26/2017 9:37 AM, G.S.Vijay Raajaa wrote:

> Hi,
>
> I am having 3 streams which is being merged from a union of kafka
> topics on a given timestamp. The problem I am facing is that, if there
> is a delay  in one of the stream and when the data in that particular
> stream arrives at a later point in time, the merge happens in a
> delayed fashion. The way I want to solve is that, I want to drop such
> data streams which comes after a delay ( say 5sec ). Kindly direct me
> on how to go about it?
>
> Will watermarking (to process in even time) + the allowed lateness
> help solve the same?
>
> Regards,
> Vijay Raajaa G S

Reply | Threaded
Open this post in threaded view
|

Re: Purging Late stream data

G.S.Vijay Raajaa
Sure, Let me try that out. On the same note, does BoundedOutOfOrdernessTimestampExtractor Serve the purpose too?


Regards,
Vijay Raajaa GS 

On Wed, Jul 26, 2017 at 9:22 AM, Kien Truong <[hidden email]> wrote:
Hi,

One method you can use is using a ProcessFunction.

In the process function, you get the timer service through the function context,

which can then be used to schedule a task to clean up late data.

Check out the docs for ProcessFunction

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html

Regards,

Kien



On 7/26/2017 9:37 AM, G.S.Vijay Raajaa wrote:
Hi,

I am having 3 streams which is being merged from a union of kafka topics on a given timestamp. The problem I am facing is that, if there is a delay  in one of the stream and when the data in that particular stream arrives at a later point in time, the merge happens in a delayed fashion. The way I want to solve is that, I want to drop such data streams which comes after a delay ( say 5sec ). Kindly direct me on how to go about it?

Will watermarking (to process in even time) + the allowed lateness help solve the same?

Regards,
Vijay Raajaa G S