WindowedStream operation questions

WindowedStream operation questions

Elias Levy
An observation and a couple of questions from a novice.

The observation: The Flink web site makes available ScalaDocs for org.apache.flink.api.scala but not for org.apache.flink.streaming.api.scala.

Now for the questions: 

Why can't you use map to transform a data stream, say convert all the elements to integers (e.g. .map { x => 1 }), then create a tumbling processing time window (e.g. .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(2))))?

Then the inverse: Why do AllWindowedStream and WindowedStream not have a map method?

Re: WindowedStream operation questions

Aljoscha Krettek
Hi,
I saw the issue you opened. :-) I'll try and figure out how to get all the Scaladocs on there.

Regarding the other questions: a WindowedStream is not a stream in itself but a stepping stone towards specifying a windowed operation that results in a new DataStream. So the pattern always has to be like this:

in
  .keyBy(...)
  .window(...)
  .reduce() // or fold() or apply()

Only with the final specification of the operation do you get a new DataStream of elements. You can do a map() before the window operation but not in between specifying a window and an operation that works on windows.
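
As a rough illustration of that pattern, here is a minimal sketch against the Flink Scala streaming API of that era. The socket source on localhost:9999, the object name and the job name are assumptions made for the example, not part of the original discussion. The map() runs on a plain DataStream before the window is specified, and only the reduce() turns the WindowedStream back into a DataStream.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical job: counts occurrences of each input line per 2-second window.
object WindowedCountSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Assumed source: lines of text read from a local socket.
    val in: DataStream[String] = env.socketTextStream("localhost", 9999)

    val counts: DataStream[(String, Int)] = in
      // map() before the window: DataStream[String] => DataStream[(String, Int)]
      .map { line => (line, 1) }
      // keyBy() gives a KeyedStream, window() gives a WindowedStream;
      // neither intermediate step offers map().
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(2)))
      // The windowed operation produces a DataStream again.
      .reduce((a, b) => (a._1, a._2 + b._2))

    // From here on, map(), filter() and so on are available again.
    counts.print()
    env.execute("windowed-count-sketch")
  }
}
```

Any per-element transformation that should apply to the windowed results can be chained after reduce(), since the result is an ordinary DataStream.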

This differs from the Apache Beam (formerly Google Dataflow) model where the window is a property of stream elements. There you can do:

in
  .map()
  .window()
  .map()
  .reduce()

And the reduce operation will work on the windows that were assigned in the window operation.
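
To make the "window as a property of stream elements" idea concrete, here is a toy model in plain Scala. It does not use the Beam API; Windowed, WindowAsElementProperty and the sample events are all hypothetical names and data chosen for illustration, and timestamps are assumed to be non-negative milliseconds. Each element carries the window it was assigned to, map() leaves that tag untouched, and reduce() groups by it later.

```scala
// Toy model only: an element tagged with the start of its tumbling window.
final case class Windowed[A](windowStart: Long, value: A)

object WindowAsElementProperty {
  // Assign each (timestamp, value) pair to a tumbling window of sizeMs milliseconds.
  def window[A](in: Seq[(Long, A)], sizeMs: Long): Seq[Windowed[A]] =
    in.map { case (ts, v) => Windowed(ts - ts % sizeMs, v) }

  // map() transforms the value and keeps the window tag as it is.
  def map[A, B](in: Seq[Windowed[A]])(f: A => B): Seq[Windowed[B]] =
    in.map(w => Windowed(w.windowStart, f(w.value)))

  // reduce() combines values per window, using the tag assigned earlier.
  def reduce[A](in: Seq[Windowed[A]])(f: (A, A) => A): Map[Long, A] =
    in.groupBy(_.windowStart).map { case (start, ws) =>
      start -> ws.map(_.value).reduce(f)
    }

  // window -> map -> reduce, mirroring the order of operations shown above.
  def demo: Map[Long, Int] = {
    val events = Seq(0L -> "a", 500L -> "bb", 2100L -> "ccc")
    reduce(map(window(events, 2000L))(_.length))(_ + _)
  }
}
```

In this sketch the first two events fall into the window starting at 0 and the third into the window starting at 2000, so demo sums the string lengths separately for each of those two windows even though a map() sits between window() and reduce().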

I hope this helps somewhat but please let me know if I should go into details.

Cheers,
Aljoscha

(In reply to Elias Levy's message of Thu, 7 Apr 2016 at 17:22.)