Can Flink help us solve the following use case

Yoandy Rodríguez
Hello everybody,

We have the following situation:

1) A data stream which collects all system events (roughly half a million per day).

2) A database storing some aggregation of the data.

We want to split the data into different "time slices" and be able to
"tag it" accordingly.

Example:

The events in the first hour would be tagged as follows:

Time of arrival (slice)    Tag
0:00:00 - 0:59:59          Last Hour
0:30:00 - 0:59:59          Last 1/2 Hour
0:50:00 - 0:59:59          Last 10 minutes

Now, when we reach 1:09:59, the "Last 10 minutes" tag moves to the slice
ending at that time (1:00:00 - 1:09:59), and the other tags shift in the
same way.
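
To make the slice arithmetic concrete, here is a minimal plain-Java sketch (not Flink API) that computes the inclusive bounds of a slice ending at a given time. The names `TrailingSlice` and `endingAt` are made up for illustration, and times are treated as times of day, as in the example.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper: inclusive [start, end] bounds of a slice that ends "now". */
final class TrailingSlice {
    final LocalTime start;
    final LocalTime end;

    TrailingSlice(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    // Both bounds are inclusive, so a slice of length `span` ending at `now`
    // starts one second after (now - span).
    static TrailingSlice endingAt(LocalTime now, Duration span) {
        return new TrailingSlice(now.minus(span).plusSeconds(1), now);
    }
}
```

With `now` at 0:59:59 and a span of 10 minutes the slice starts at 0:50:00; with `now` at 1:09:59 it starts at 1:00:00, which is the shift described above.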

My initial idea was to have multiple windows operating over the same
stream, but in that case I would have to keep a longer window just to
remove the tag from events once they fall outside the one-hour period.
Is there any way to avoid this?


P.S.

This is part of my first Flink project, so alternative solutions and
literature are very welcome.


Re: Can Flink help us solve the following use case

Sameer Wadkar
You could do this using custom triggers and evictors in Flink. That way you can control when the windows fire and which elements are emitted with each firing. The custom evictor, in turn, decides when elements are removed from the window.

Yes, Flink can support it.

> On Aug 7, 2019, at 4:19 PM, Yoandy Rodríguez <[hidden email]> wrote:
> [...]

Re: Can Flink help us solve the following use case

Biao Liu
Hi Yoandy,

Could you explain your requirements in more detail?
Why do you want to split the data into "time slices"? Do you want to do some aggregations, or just give each record one or more tags?

Thanks,
Biao /'bɪ.aʊ/



On Thu, Aug 8, 2019 at 4:52 AM Sameer Wadkar <[hidden email]> wrote:
> [...]

Re: Can Flink help us solve the following use case

Yoandy Rodríguez

Hello Biao,

There's a legacy component that expects these "time slices" and tags to be set in our operational data store.

Right now I would just like to have the tags set properly on each record. After some reading, I came up with the idea of using multiple sliding windows, but there's still an issue with the overlapping "time slices": some elements belong to more than one tag, and in that case the tag representing the shortest time span should be used.
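
The "shortest span wins" rule can be sketched as a pure function of an event's age, checking the tags from shortest to longest. This is plain Java rather than Flink code; `SliceTagger`, `tagFor`, and the choice to leave events older than one hour (or timestamped in the future) untagged are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper: picks the tag with the shortest span that still covers the event. */
final class SliceTagger {
    private static final Duration TEN_MINUTES = Duration.ofMinutes(10);
    private static final Duration HALF_HOUR = Duration.ofMinutes(30);
    private static final Duration ONE_HOUR = Duration.ofHours(1);

    static Optional<String> tagFor(Instant eventTime, Instant now) {
        Duration age = Duration.between(eventTime, now);
        if (age.isNegative()) {
            return Optional.empty(); // assumption: future-dated events get no tag
        }
        // Check the shortest span first so overlapping slices resolve to it.
        if (age.compareTo(TEN_MINUTES) < 0) return Optional.of("Last 10 minutes");
        if (age.compareTo(HALF_HOUR) < 0) return Optional.of("Last 1/2 Hour");
        if (age.compareTo(ONE_HOUR) < 0) return Optional.of("Last Hour");
        return Optional.empty(); // assumption: older than one hour means untagged
    }
}
```

Note that the result depends on `now`, so a tag written to the store goes stale as time advances and would need to be refreshed or computed at read time.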

On 07/08/2019 23:02, Biao Liu wrote:
> [...]