Updating a Tumbling Window every second?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Updating a Tumbling Window every second?

Matt
Hello,

I have a rather simple problem with a difficult explanation...

I have 3 streams, one of objects of class A (stream A), one of class B (stream B) and one of class C (stream C). The elements of A are generated at a rate of about 3 times every second. Elements of type B encapsulates some key features of the stream A (like the number of elements of A in the window) during the last 30 seconds (tumbling window 30s). Finally, the elements of type C contains statistics (for simplicity let's say the average of elements processed by each element in B) of the last 3 elements in B and are produced on every new element of B (count window 3, 1).

Illustrative example, () and [] denotes windows:

... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
... (b4 [b3 b2) b1]
... [c2] [c1]

This works fine, except for a dashboard that depends on the elements of C to be updated, and 30s is way too big of a delay. I thought I could change the tumbling window for a sliding window of size 30s and a slide of 1s, but this doesn't work.

If I use a sliding window to create elements of B as mentioned, each count window would contain 3 elements of B, and I would get one element of C every second as intended, but those elements in B encapsulates almost the same elements of A. This results in stats that are wrong.

For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and b3 share most of the elements from stream A.

Question: is there any way to create a count window with the last 3 elements of B that would have gone into the same tumbling window, not with the last 3 consecutive elements?

I hope the problem is clear, don't hesitate to ask for further clarification!

Regards,
Matt

Reply | Threaded
Open this post in threaded view
|

Re: Updating a Tumbling Window every second?

Matt
I have reduced the problem to a simple image [1].

Those shown on the image are the streams I have, and the problem now is how to create a custom window assigner such that objects in B that don't share elements in A, are put together in the same window.

Why? Because in order to create elements in C (triangles), I have to process n independent elements of B (n=2 in the example).

Maybe there's a better or simpler way to do this. Any idea is appreciated!

Regards,
Matt


On Thu, Dec 15, 2016 at 3:22 AM, Matt <[hidden email]> wrote:
Hello,

I have a rather simple problem with a difficult explanation...

I have 3 streams, one of objects of class A (stream A), one of class B (stream B) and one of class C (stream C). The elements of A are generated at a rate of about 3 times every second. Elements of type B encapsulates some key features of the stream A (like the number of elements of A in the window) during the last 30 seconds (tumbling window 30s). Finally, the elements of type C contains statistics (for simplicity let's say the average of elements processed by each element in B) of the last 3 elements in B and are produced on every new element of B (count window 3, 1).

Illustrative example, () and [] denotes windows:

... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
... (b4 [b3 b2) b1]
... [c2] [c1]

This works fine, except for a dashboard that depends on the elements of C to be updated, and 30s is way too big of a delay. I thought I could change the tumbling window for a sliding window of size 30s and a slide of 1s, but this doesn't work.

If I use a sliding window to create elements of B as mentioned, each count window would contain 3 elements of B, and I would get one element of C every second as intended, but those elements in B encapsulates almost the same elements of A. This results in stats that are wrong.

For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and b3 share most of the elements from stream A.

Question: is there any way to create a count window with the last 3 elements of B that would have gone into the same tumbling window, not with the last 3 consecutive elements?

I hope the problem is clear, don't hesitate to ask for further clarification!

Regards,
Matt


Reply | Threaded
Open this post in threaded view
|

Re: Updating a Tumbling Window every second?

Fabian Hueske-2
Hi Matt,

the combination of a tumbling time window and a count window is one way to define a sliding window.
In your example of a 30 secs tumbling window and a (3,1) count window results in a time sliding window of 90 secs width and 30 secs slide.

You could define a time sliding window of 90 secs width and 1 secs slide on stream A to get a stream C with faster updates.
If you still need stream B with the 30 secs tumbling window, you can have both windows defined on stream A.

Hope this helps,
Fabian

2016-12-16 12:58 GMT+01:00 Matt <[hidden email]>:
I have reduced the problem to a simple image [1].

Those shown on the image are the streams I have, and the problem now is how to create a custom window assigner such that objects in B that don't share elements in A, are put together in the same window.

Why? Because in order to create elements in C (triangles), I have to process n independent elements of B (n=2 in the example).

Maybe there's a better or simpler way to do this. Any idea is appreciated!

Regards,
Matt


On Thu, Dec 15, 2016 at 3:22 AM, Matt <[hidden email]> wrote:
Hello,

I have a rather simple problem with a difficult explanation...

I have 3 streams, one of objects of class A (stream A), one of class B (stream B) and one of class C (stream C). The elements of A are generated at a rate of about 3 times every second. Elements of type B encapsulates some key features of the stream A (like the number of elements of A in the window) during the last 30 seconds (tumbling window 30s). Finally, the elements of type C contains statistics (for simplicity let's say the average of elements processed by each element in B) of the last 3 elements in B and are produced on every new element of B (count window 3, 1).

Illustrative example, () and [] denotes windows:

... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
... (b4 [b3 b2) b1]
... [c2] [c1]

This works fine, except for a dashboard that depends on the elements of C to be updated, and 30s is way too big of a delay. I thought I could change the tumbling window for a sliding window of size 30s and a slide of 1s, but this doesn't work.

If I use a sliding window to create elements of B as mentioned, each count window would contain 3 elements of B, and I would get one element of C every second as intended, but those elements in B encapsulates almost the same elements of A. This results in stats that are wrong.

For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and b3 share most of the elements from stream A.

Question: is there any way to create a count window with the last 3 elements of B that would have gone into the same tumbling window, not with the last 3 consecutive elements?

I hope the problem is clear, don't hesitate to ask for further clarification!

Regards,
Matt



Reply | Threaded
Open this post in threaded view
|

Re: Updating a Tumbling Window every second?

Matt
Fabian,

Thanks for your answer. Since elements in B are expensive to create, I wanted to reuse them. I understand I can plug two consumers into stream A, but in that case -if I'm not wrong- I would have to create repeated elements of B: one to save them into stream B and one to create C objects for stream C.

Anyway, I've already solved this problem a few days back.

Regards,
Matt

On Mon, Dec 19, 2016 at 5:57 AM, Fabian Hueske <[hidden email]> wrote:
Hi Matt,

the combination of a tumbling time window and a count window is one way to define a sliding window.
In your example of a 30 secs tumbling window and a (3,1) count window results in a time sliding window of 90 secs width and 30 secs slide.

You could define a time sliding window of 90 secs width and 1 secs slide on stream A to get a stream C with faster updates.
If you still need stream B with the 30 secs tumbling window, you can have both windows defined on stream A.

Hope this helps,
Fabian

2016-12-16 12:58 GMT+01:00 Matt <[hidden email]>:
I have reduced the problem to a simple image [1].

Those shown on the image are the streams I have, and the problem now is how to create a custom window assigner such that objects in B that don't share elements in A, are put together in the same window.

Why? Because in order to create elements in C (triangles), I have to process n independent elements of B (n=2 in the example).

Maybe there's a better or simpler way to do this. Any idea is appreciated!

Regards,
Matt


On Thu, Dec 15, 2016 at 3:22 AM, Matt <[hidden email]> wrote:
Hello,

I have a rather simple problem with a difficult explanation...

I have 3 streams, one of objects of class A (stream A), one of class B (stream B) and one of class C (stream C). The elements of A are generated at a rate of about 3 times every second. Elements of type B encapsulates some key features of the stream A (like the number of elements of A in the window) during the last 30 seconds (tumbling window 30s). Finally, the elements of type C contains statistics (for simplicity let's say the average of elements processed by each element in B) of the last 3 elements in B and are produced on every new element of B (count window 3, 1).

Illustrative example, () and [] denotes windows:

... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
... (b4 [b3 b2) b1]
... [c2] [c1]

This works fine, except for a dashboard that depends on the elements of C to be updated, and 30s is way too big of a delay. I thought I could change the tumbling window for a sliding window of size 30s and a slide of 1s, but this doesn't work.

If I use a sliding window to create elements of B as mentioned, each count window would contain 3 elements of B, and I would get one element of C every second as intended, but those elements in B encapsulates almost the same elements of A. This results in stats that are wrong.

For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and b3 share most of the elements from stream A.

Question: is there any way to create a count window with the last 3 elements of B that would have gone into the same tumbling window, not with the last 3 consecutive elements?

I hope the problem is clear, don't hesitate to ask for further clarification!

Regards,
Matt