Custom Barrier?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Custom Barrier?

Paul Wilson
Hi,

I've been evaluating Flink and wondering if it was possible to define a window that is based on characteristics of the data (data driven) but not contained in the data stream directly. 

Consider 'nested events' where lower level events belong to a wider event where the wider event serves only to define a boundary (or window) over the lower level events. I was wondering if there was some way to communicate this super-structure in the stream somehow? 

I know that Flink users 'barriers' to define snapshot boundaries, but it might it be possible to communicate a 'window end' in a similar fashion? 

I guess I could attach an additional value to each event using a stateful map function and then define the window on that?

e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End

Regards,
Paul
Reply | Threaded
Open this post in threaded view
|

Re: Custom Barrier?

Aljoscha Krettek
Hi,
would these super-structure events occur per key? If yes, then I think you can process this using the currently available windowing mechanism by writing a custom WindowAssigner and Trigger. This, of course, assumes that the events arrive in-order, i.e. if A-End arrives before A-Start or if elements that should fall inside the A window arrive after A-End then I don't see an easy way to do it.

Let me know if you need to know more about assigners/triggers.

Cheers,
Aljoscha 

On Mon, 13 Jun 2016 at 16:29 Paul Wilson <[hidden email]> wrote:
Hi,

I've been evaluating Flink and wondering if it was possible to define a window that is based on characteristics of the data (data driven) but not contained in the data stream directly. 

Consider 'nested events' where lower level events belong to a wider event where the wider event serves only to define a boundary (or window) over the lower level events. I was wondering if there was some way to communicate this super-structure in the stream somehow? 

I know that Flink users 'barriers' to define snapshot boundaries, but it might it be possible to communicate a 'window end' in a similar fashion? 

I guess I could attach an additional value to each event using a stateful map function and then define the window on that?

e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End

Regards,
Paul
Reply | Threaded
Open this post in threaded view
|

Re: Custom Barrier?

Paul Wilson
Hi, 

No these super-structure events only serve the purpose of defining the boundaries of a join, and do not relate to the keys of the sub-events.

Thanks,
Paul

On 14 June 2016 at 10:32, Aljoscha Krettek <[hidden email]> wrote:
Hi,
would these super-structure events occur per key? If yes, then I think you can process this using the currently available windowing mechanism by writing a custom WindowAssigner and Trigger. This, of course, assumes that the events arrive in-order, i.e. if A-End arrives before A-Start or if elements that should fall inside the A window arrive after A-End then I don't see an easy way to do it.

Let me know if you need to know more about assigners/triggers.

Cheers,
Aljoscha 

On Mon, 13 Jun 2016 at 16:29 Paul Wilson <[hidden email]> wrote:
Hi,

I've been evaluating Flink and wondering if it was possible to define a window that is based on characteristics of the data (data driven) but not contained in the data stream directly. 

Consider 'nested events' where lower level events belong to a wider event where the wider event serves only to define a boundary (or window) over the lower level events. I was wondering if there was some way to communicate this super-structure in the stream somehow? 

I know that Flink users 'barriers' to define snapshot boundaries, but it might it be possible to communicate a 'window end' in a similar fashion? 

I guess I could attach an additional value to each event using a stateful map function and then define the window on that?

e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End

Regards,
Paul

Reply | Threaded
Open this post in threaded view
|

Re: Custom Barrier?

Paul Wilson

... and those events are in order

On 14 Jun 2016 14:04, "Paul Wilson" <[hidden email]> wrote:
Hi, 

No these super-structure events only serve the purpose of defining the boundaries of a join, and do not relate to the keys of the sub-events.

Thanks,
Paul

On 14 June 2016 at 10:32, Aljoscha Krettek <[hidden email]> wrote:
Hi,
would these super-structure events occur per key? If yes, then I think you can process this using the currently available windowing mechanism by writing a custom WindowAssigner and Trigger. This, of course, assumes that the events arrive in-order, i.e. if A-End arrives before A-Start or if elements that should fall inside the A window arrive after A-End then I don't see an easy way to do it.

Let me know if you need to know more about assigners/triggers.

Cheers,
Aljoscha 

On Mon, 13 Jun 2016 at 16:29 Paul Wilson <[hidden email]> wrote:
Hi,

I've been evaluating Flink and wondering if it was possible to define a window that is based on characteristics of the data (data driven) but not contained in the data stream directly. 

Consider 'nested events' where lower level events belong to a wider event where the wider event serves only to define a boundary (or window) over the lower level events. I was wondering if there was some way to communicate this super-structure in the stream somehow? 

I know that Flink users 'barriers' to define snapshot boundaries, but it might it be possible to communicate a 'window end' in a similar fashion? 

I guess I could attach an additional value to each event using a stateful map function and then define the window on that?

e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End

Regards,
Paul

Reply | Threaded
Open this post in threaded view
|

Re: Custom Barrier?

Aljoscha Krettek
Hi,
when you have a parallel input stream (for example multiple kafka partitions that you read from) would you have the super events (A-Start, B-Start and so on) in all of the parallel streams? If the answer is yes, then you can probably abuse the watermarks mechanism to deal with it. If not, then I'm afraid it's impossible to track process of these super events across parallel partitions.

Depending on your answer to the above we might be able to figure something out together.

Cheers,
Aljoscha

On Tue, 14 Jun 2016 at 16:19 Paul Wilson <[hidden email]> wrote:

... and those events are in order

On 14 Jun 2016 14:04, "Paul Wilson" <[hidden email]> wrote:
Hi, 

No these super-structure events only serve the purpose of defining the boundaries of a join, and do not relate to the keys of the sub-events.

Thanks,
Paul

On 14 June 2016 at 10:32, Aljoscha Krettek <[hidden email]> wrote:
Hi,
would these super-structure events occur per key? If yes, then I think you can process this using the currently available windowing mechanism by writing a custom WindowAssigner and Trigger. This, of course, assumes that the events arrive in-order, i.e. if A-End arrives before A-Start or if elements that should fall inside the A window arrive after A-End then I don't see an easy way to do it.

Let me know if you need to know more about assigners/triggers.

Cheers,
Aljoscha 

On Mon, 13 Jun 2016 at 16:29 Paul Wilson <[hidden email]> wrote:
Hi,

I've been evaluating Flink and wondering if it was possible to define a window that is based on characteristics of the data (data driven) but not contained in the data stream directly. 

Consider 'nested events' where lower level events belong to a wider event where the wider event serves only to define a boundary (or window) over the lower level events. I was wondering if there was some way to communicate this super-structure in the stream somehow? 

I know that Flink users 'barriers' to define snapshot boundaries, but it might it be possible to communicate a 'window end' in a similar fashion? 

I guess I could attach an additional value to each event using a stateful map function and then define the window on that?

e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End

Regards,
Paul