Efficiently splitting a stream 3 ways

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Efficiently splitting a stream 3 ways

ljwagerfield
Hi,

I'd like to know which is more efficient: splitting a stream 3 ways via `split` or via `filter`?

--- FILTER ------
val greater = stream.filter(_.n > 0)
val less = stream.filter(_.n < 0)
val equal = stream.filter(_.n == 0)
-----------------

- VS -

--- SPLIT -------
val split = stream.split(row =>
  if (row.n > 0)
  List("greater")
  else if (row.n < 0)
List("less")
  else
  List("equal")
)

val greater = split select "greater" 
val less = split select "less"
val equal = split select "equal"
-----------------

Thanks!
Lawrence
Reply | Threaded
Open this post in threaded view
|

Re: Efficiently splitting a stream 3 ways

Aljoscha Krettek
I think the split/select variant should be a bit faster because it creates less object copies internally. It should also be more future proof because it will benefit from improvements (if any) in the way split/select works.

There is also some ongoing work in adding support for side outputs which allow outputting to several streams from one user function: https://github.com/apache/flink/pull/2982

Cheers,
Aljoscha



On Thu, 22 Dec 2016 at 16:34 Lawrence Wagerfield <[hidden email]> wrote:
Hi,

I'd like to know which is more efficient: splitting a stream 3 ways via `split` or via `filter`?

--- FILTER ------
val greater = stream.filter(_.n > 0)
val less = stream.filter(_.n < 0)
val equal = stream.filter(_.n == 0)
-----------------

- VS -

--- SPLIT -------
val split = stream.split(row =>
  if (row.n > 0)
  List("greater")
  else if (row.n < 0)
List("less")
  else
  List("equal")
)

val greater = split select "greater" 
val less = split select "less"
val equal = split select "equal"
-----------------

Thanks!
Lawrence
Reply | Threaded
Open this post in threaded view
|

Re: Efficiently splitting a stream 3 ways

Chen Bekor

On Jan 9, 2017 3:41 PM, "Aljoscha Krettek" <[hidden email]> wrote:
I think the split/select variant should be a bit faster because it creates less object copies internally. It should also be more future proof because it will benefit from improvements (if any) in the way split/select works.

There is also some ongoing work in adding support for side outputs which allow outputting to several streams from one user function: https://github.com/apache/flink/pull/2982

Cheers,
Aljoscha



On Thu, 22 Dec 2016 at 16:34 Lawrence Wagerfield <[hidden email]> wrote:
Hi,

I'd like to know which is more efficient: splitting a stream 3 ways via `split` or via `filter`?

--- FILTER ------
val greater = stream.filter(_.n > 0)
val less = stream.filter(_.n < 0)
val equal = stream.filter(_.n == 0)
-----------------

- VS -

--- SPLIT -------
val split = stream.split(row =>
  if (row.n > 0)
  List("greater")
  else if (row.n < 0)
List("less")
  else
  List("equal")
)

val greater = split select "greater" 
val less = split select "less"
val equal = split select "equal"
-----------------

Thanks!
Lawrence