What is output from DataSet.print()?

Jon Yeargers
Topology snip:

datastream = some_stream.keyBy(keySelector).timeWindow(Time.seconds(60)).reduce(new some_KeyReduce());

If I have a KeySelector that's pretty 'loose' (i.e. lots of matches), the 'some_KeyReduce' function gets hit frequently and some set of values is printed out via 'datastream.print()'.

If I have a more stringent KeySelector, the 'some_KeyReduce' function never gets called, but 'datastream.print()' still outputs numerous values.

So how are the KeySelector and the output of the datastream.print() related? Or are they?

Re: What is output from DataSet.print()?

Stephan Ewen
Hi!

The print() output is usually partitioned in the same way as the previous operation.
Because your previous operation is the keyBy/window operator, it should be partitioned following the key selected by the key selector.
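
A rough way to picture this partitioning is the plain-Java sketch below (not Flink code). `KeyPartitionModel`, `subtaskFor` and `printLines` are hypothetical names, and the hash-modulo partitioner is a simplification assumed for illustration; Flink's actual key-to-subtask assignment differs in detail. The only property it models is that every record with the same key lands on the same parallel subtask, and that a print() sink running with parallelism greater than 1 typically prefixes each line with that subtask's number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified model of keyed partitioning. Flink's real key-to-subtask
// assignment differs in detail; the property modelled is only that a given key
// always maps to the same parallel subtask.
class KeyPartitionModel {

    // Simplified partitioner (assumption): key hash modulo parallelism, kept non-negative.
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Routes each record by its key and formats it the way a parallel print() sink
    // typically does: "<subtask index + 1>> <record>".
    static <T, K> List<String> printLines(List<T> records, Function<T, K> keySelector,
                                          int parallelism) {
        List<String> lines = new ArrayList<>();
        for (T record : records) {
            int subtask = subtaskFor(keySelector.apply(record), parallelism);
            lines.add((subtask + 1) + "> " + record);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Window results keyed by their first character, "printed" with parallelism 2.
        List<String> results = List.of("a1+a2", "b1", "c1");
        printLines(results, s -> s.charAt(0), 2).forEach(System.out::println);
    }
}
```

Running `main` prints `2> a1+a2`, `1> b1` and `2> c1`: both results keyed by 'a' and 'c' happen to share subtask 2 here, while 'b' goes to subtask 1, and rerunning never moves a key to a different subtask.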

The reduce() function is only called if a window has at least two elements for the same key. If the window has only one element, that single element is the result of the window and gets printed.
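
To make this concrete, here is a stdlib-only Java sketch of what happens inside one keyed window when it fires. It is not Flink code: `WindowReduceModel`, `fireWindow` and the `reduceCalls` counter are hypothetical, and the model assumes elements are reduced incrementally per key as they arrive. It reproduces the behaviour from the question: with a loose key selector several elements share a key and the reducer runs, while with a stringent one every key holds a single element, so the reducer is never invoked and every element is still emitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical, stdlib-only model of keyBy(...).timeWindow(...).reduce(...) for a
// single window firing. Not Flink code; it only mimics the semantics described above.
class WindowReduceModel {

    // How many times the user reducer ran during the most recent fireWindow call.
    static final AtomicInteger reduceCalls = new AtomicInteger();

    // Groups one window's elements by key and reduces each group incrementally.
    // Returns one result per distinct key: what would be handed on to print().
    static <T, K> List<T> fireWindow(List<T> window, Function<T, K> keySelector,
                                     BinaryOperator<T> reducer) {
        reduceCalls.set(0);
        Map<K, T> perKey = new LinkedHashMap<>(); // keeps first-seen key order
        for (T element : window) {
            // Map.merge only invokes the remapping function when the key already
            // has a value, so the first element per key is stored as-is and the
            // reducer is skipped for it.
            perKey.merge(keySelector.apply(element), element, (acc, next) -> {
                reduceCalls.incrementAndGet();
                return reducer.apply(acc, next);
            });
        }
        return new ArrayList<>(perKey.values());
    }

    public static void main(String[] args) {
        List<String> window = List.of("a1", "a2", "b1", "c1");
        BinaryOperator<String> concat = (x, y) -> x + "+" + y;

        // 'Loose' selector (first character): "a1" and "a2" share key 'a'.
        List<String> loose = fireWindow(window, s -> s.charAt(0), concat);
        System.out.println(loose + " reduceCalls=" + reduceCalls.get());

        // 'Stringent' selector (whole string): every key has exactly one element.
        List<String> strict = fireWindow(window, s -> s, concat);
        System.out.println(strict + " reduceCalls=" + reduceCalls.get());
    }
}
```

Running `main` prints `[a1+a2, b1, c1] reduceCalls=1` and then `[a1, a2, b1, c1] reduceCalls=0`: the stringent selector produces more output lines, not fewer, even though the reducer never runs.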

Greetings,
Stephan


On Wed, Aug 3, 2016 at 2:30 AM, Jon Yeargers wrote the question above.