Cartesian product over windows

Cartesian product over windows

Sonex
Hello everyone,

Given a stream of events (each event has a timestamp and a key), I want to create all possible combinations of the keys in a window (sliding, event time) and then process those combinations in parallel.

For example, if a given window contains events with keys 1, 2, 3 and 4, the possible combinations are:

1-2
1-3
1-4
2-3
2-4
3-4

and if the parallelism is set to 2, I want the combinations to be split across the two parallel tasks like this:

1-2    2-3
1-3    2-4
1-4    3-4

As you can see, some keys are replicated across the two tasks. When I then use the apply method on a window, each task should see the keys separated as in the example above.

Is there a way to do that?
Re: Cartesian product over windows

Till Rohrmann
Hi Ioannis,

with a flatMap operation that replicates elements and assigns each replica a suitable key, followed by a keyBy operation, you can generate practically any kind of partitioning.

So if you first collect the data in parallel windows, you can then replicate half of the data of each window for each other window, assigning the replicas destined for each other window a distinct key. Next, you group on this key and calculate the Cartesian product for each resulting group. This should give you a parallel Cartesian product.
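The replicate-then-group idea can be sketched in plain Java, without any Flink classes. This is only one possible reading of the suggestion above, under stated assumptions: the class `BlockedPairs`, its methods, and the `key % blocks` block assignment are hypothetical names and choices made for illustration, and the sketch copies every element into every group that involves its block rather than replicating only half of the data. In a Flink job, `replicate` would roughly correspond to the flatMap step and the grouping on `groupKey()` to the keyBy step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the replicate-then-group idea; no Flink classes are used.
final class BlockedPairs {

    // One copy of an element, tagged with the group (lo-hi) it was sent to.
    static final class Replica {
        final int lo, hi, block, key;
        Replica(int lo, int hi, int block, int key) {
            this.lo = lo; this.hi = hi; this.block = block; this.key = key;
        }
        String groupKey() { return lo + "-" + hi; }
    }

    // flatMap step: an element of block b is copied once into every group that involves b.
    static List<Replica> replicate(int key, int blocks) {
        int block = Math.floorMod(key, blocks); // assumed block assignment
        List<Replica> out = new ArrayList<>();
        for (int other = 0; other < blocks; other++) {
            out.add(new Replica(Math.min(block, other), Math.max(block, other), block, key));
        }
        return out;
    }

    // keyBy step plus per-group product: returns the pairs produced by each group.
    static Map<String, List<String>> pairsPerGroup(List<Integer> keys, int blocks) {
        Map<String, List<Replica>> groups = new TreeMap<>();
        for (int key : keys) {
            for (Replica r : replicate(key, blocks)) {
                groups.computeIfAbsent(r.groupKey(), g -> new ArrayList<>()).add(r);
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Replica>> e : groups.entrySet()) {
            List<Replica> members = e.getValue();
            List<String> pairs = new ArrayList<>();
            for (int i = 0; i < members.size(); i++) {
                for (int j = i + 1; j < members.size(); j++) {
                    Replica a = members.get(i), b = members.get(j);
                    // Diagonal groups (lo == hi) pair elements of one block;
                    // mixed groups emit only cross-block pairs, so no pair repeats.
                    if (a.lo == a.hi || a.block != b.block) {
                        pairs.add(Math.min(a.key, b.key) + "-" + Math.max(a.key, b.key));
                    }
                }
            }
            result.put(e.getKey(), pairs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys 1..4 from the question, split into 2 blocks.
        System.out.println(pairsPerGroup(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

With keys 1 to 4 and two blocks, the three groups together produce each of the six pairs exactly once, although the mixed group does more work than the two diagonal ones, so the load is not perfectly balanced.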

Cheers,
Till

On Thu, Feb 16, 2017 at 2:09 PM, Ioannis Kontopoulos <[hidden email]> wrote:

Re: Cartesian product over windows

Sonex
Hi Till,

when you say parallel windows, what do you mean? Do you mean using timeWindowAll, which puts all the elements of a window in a single task?
Re: Cartesian product over windows

rmetzger0
I think Till is referring to regular windows, i.e. windows on a keyed stream, which are evaluated in parallel. The *All variants (such as timeWindowAll) combine all the data into one task.

On Fri, Feb 17, 2017 at 4:14 PM, Sonex <[hidden email]> wrote:
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Cartesian-product-over-windows-tp11676p11716.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.