Does Flink DataStreams using combiners?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Does Flink DataStreams using combiners?

Elias Levy
I am wondering if Flink makes use of combiners to pre-reduce a keyed and windowed stream before shuffling the data among workers.

I.e. will it use a combiner in something like:

stream.flatMap {...}
      .assignTimestampsAndWatermarks(...)
      .keyBy(...)
      .timeWindow(...)
      .trigger(...)
      .sum("cnt")

or will it shuffle the keyed input before the sum reduction?

If it does make use of combiners, it would be useful to point this out in the documentation, particularly if it only applies to certain types of reducers, folds, etc.
Reply | Threaded
Open this post in threaded view
|

Re: Does Flink DataStreams using combiners?

Sameer Wadkar
Streaming cannot use windows. The aggregations happen on the trigger.

The elements being aggregated are only known after the trigger delivers the elements to the evaluation function.

Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum operator in your case, combiner cannot know what to pre aggregate even if were available.



> On Aug 11, 2016, at 8:51 PM, Elias Levy <[hidden email]> wrote:
>
> I am wondering if Flink makes use of combiners to pre-reduce a keyed and windowed stream before shuffling the data among workers.
>
> I.e. will it use a combiner in something like:
>
> stream.flatMap {...}
>       .assignTimestampsAndWatermarks(...)
>       .keyBy(...)
>       .timeWindow(...)
>       .trigger(...)
>       .sum("cnt")
>
> or will it shuffle the keyed input before the sum reduction?
>
> If it does make use of combiners, it would be useful to point this out in the documentation, particularly if it only applies to certain types of reducers, folds, etc.
Reply | Threaded
Open this post in threaded view
|

Re: Does Flink DataStreams using combiners?

Sameer Wadkar
Sorry I mean streaming cannot use combiners (repeated below)
-------
Streaming cannot use combiners. The aggregations happen on the trigger.

The elements being aggregated are only known after the trigger delivers the elements to the evaluation function.

Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum operator in your case, combiner cannot know what to pre aggregate even if were available.

On Thu, Aug 11, 2016 at 9:22 PM, Sameer Wadkar <[hidden email]> wrote:
Streaming cannot use windows. The aggregations happen on the trigger.

The elements being aggregated are only known after the trigger delivers the elements to the evaluation function.

Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum operator in your case, combiner cannot know what to pre aggregate even if were available.



> On Aug 11, 2016, at 8:51 PM, Elias Levy <[hidden email]> wrote:
>
> I am wondering if Flink makes use of combiners to pre-reduce a keyed and windowed stream before shuffling the data among workers.
>
> I.e. will it use a combiner in something like:
>
> stream.flatMap {...}
>       .assignTimestampsAndWatermarks(...)
>       .keyBy(...)
>       .timeWindow(...)
>       .trigger(...)
>       .sum("cnt")
>
> or will it shuffle the keyed input before the sum reduction?
>
> If it does make use of combiners, it would be useful to point this out in the documentation, particularly if it only applies to certain types of reducers, folds, etc.

Reply | Threaded
Open this post in threaded view
|

Re: Does Flink DataStreams using combiners?

Aljoscha Krettek
Hi,
Sameer is right that Flink currently does not combine for any combination of assigner, trigger and window function.

Technically, it would be possible to use a combiner for Triggers that don't observe individual elements but only fire on time. With triggers that observe elements, such as CountTrigger it becomes impossible to figure out when to fire.

Cheers,
Aljoscha

On Fri, 12 Aug 2016 at 03:36 Sameer W <[hidden email]> wrote:
Sorry I mean streaming cannot use combiners (repeated below)
-------
Streaming cannot use combiners. The aggregations happen on the trigger.

The elements being aggregated are only known after the trigger delivers the elements to the evaluation function.

Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum operator in your case, combiner cannot know what to pre aggregate even if were available.

On Thu, Aug 11, 2016 at 9:22 PM, Sameer Wadkar <[hidden email]> wrote:
Streaming cannot use windows. The aggregations happen on the trigger.

The elements being aggregated are only known after the trigger delivers the elements to the evaluation function.

Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum operator in your case, combiner cannot know what to pre aggregate even if were available.




> On Aug 11, 2016, at 8:51 PM, Elias Levy <[hidden email]> wrote:
>
> I am wondering if Flink makes use of combiners to pre-reduce a keyed and windowed stream before shuffling the data among workers.
>
> I.e. will it use a combiner in something like:
>
> stream.flatMap {...}
>       .assignTimestampsAndWatermarks(...)
>       .keyBy(...)
>       .timeWindow(...)
>       .trigger(...)
>       .sum("cnt")
>
> or will it shuffle the keyed input before the sum reduction?
>
> If it does make use of combiners, it would be useful to point this out in the documentation, particularly if it only applies to certain types of reducers, folds, etc.
Reply | Threaded
Open this post in threaded view
|

Re: Does Flink DataStreams using combiners?

Stephan Ewen
I think combiners would be a great addition to "aligned windows".

On Fri, Aug 12, 2016 at 11:11 AM, Aljoscha Krettek <[hidden email]> wrote:
Hi,
Sameer is right that Flink currently does not combine for any combination of assigner, trigger and window function.

Technically, it would be possible to use a combiner for Triggers that don't observe individual elements but only fire on time. With triggers that observe elements, such as CountTrigger it becomes impossible to figure out when to fire.

Cheers,
Aljoscha

On Fri, 12 Aug 2016 at 03:36 Sameer W <[hidden email]> wrote:
Sorry I mean streaming cannot use combiners (repeated below)
-------
Streaming cannot use combiners. The aggregations happen on the trigger.

The elements being aggregated are only known after the trigger delivers the elements to the evaluation function.

Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum operator in your case, combiner cannot know what to pre aggregate even if were available.

On Thu, Aug 11, 2016 at 9:22 PM, Sameer Wadkar <[hidden email]> wrote:
Streaming cannot use windows. The aggregations happen on the trigger.

The elements being aggregated are only known after the trigger delivers the elements to the evaluation function.

Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum operator in your case, combiner cannot know what to pre aggregate even if were available.




> On Aug 11, 2016, at 8:51 PM, Elias Levy <[hidden email]> wrote:
>
> I am wondering if Flink makes use of combiners to pre-reduce a keyed and windowed stream before shuffling the data among workers.
>
> I.e. will it use a combiner in something like:
>
> stream.flatMap {...}
>       .assignTimestampsAndWatermarks(...)
>       .keyBy(...)
>       .timeWindow(...)
>       .trigger(...)
>       .sum("cnt")
>
> or will it shuffle the keyed input before the sum reduction?
>
> If it does make use of combiners, it would be useful to point this out in the documentation, particularly if it only applies to certain types of reducers, folds, etc.