Apache Flink - Question about rolling window function on KeyedStream

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Apache Flink - Question about rolling window function on KeyedStream

M Singh
Hi:

Apache Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html) indicates that a reduce function on a KeyedStream  as follows:

A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});

The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).

Thanks

Mans
Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about rolling window function on KeyedStream

Stefan Richter
Hi,

I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation.

Best,
Stefan


Am 31.12.2017 um 22:28 schrieb M Singh <[hidden email]>:

Hi:

Apache Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html) indicates that a reduce function on a KeyedStream  as follows:

A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});

The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).

Thanks

Mans

Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about rolling window function on KeyedStream

M Singh
Hi Stefan:

Thanks for your response.

A follow up question - In a streaming environment, we invoke the operation reduce and then output results to the sink. Does this mean reduce will be executed once on every trigger per partition with all the items in each partition ?

Thanks


On Wednesday, January 3, 2018 2:46 AM, Stefan Richter <[hidden email]> wrote:


Hi,

I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation.

Best,
Stefan


Am 31.12.2017 um 22:28 schrieb M Singh <[hidden email]>:

Hi:

Apache Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html) indicates that a reduce function on a KeyedStream  as follows:

A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});

The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).

Thanks

Mans



Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about rolling window function on KeyedStream

Fabian Hueske-2
Hi,

the ReduceFunction holds the last emitted record as state. When a new record arrives, it reduces the new record and last emitted record, updates its state, and emits the new result.
Therefore, a ReduceFunction emits one output record for each input record, i.e., it is triggered for each input record. The output of the ReduceFunction should be treated as a stream of updates not of final results.

Best, Fabian

2018-01-03 18:46 GMT+01:00 M Singh <[hidden email]>:
Hi Stefan:

Thanks for your response.

A follow up question - In a streaming environment, we invoke the operation reduce and then output results to the sink. Does this mean reduce will be executed once on every trigger per partition with all the items in each partition ?

Thanks


On Wednesday, January 3, 2018 2:46 AM, Stefan Richter <[hidden email]> wrote:


Hi,

I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation.

Best,
Stefan


Am 31.12.2017 um 22:28 schrieb M Singh <[hidden email]>:

Hi:

Apache Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html) indicates that a reduce function on a KeyedStream  as follows:

A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});

The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).

Thanks

Mans




Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about rolling window function on KeyedStream

M Singh
Hi Fabian:

Thanks for your answer - it is starting to make sense to me now.


On Thursday, January 4, 2018 12:58 AM, Fabian Hueske <[hidden email]> wrote:


Hi,

the ReduceFunction holds the last emitted record as state. When a new record arrives, it reduces the new record and last emitted record, updates its state, and emits the new result.
Therefore, a ReduceFunction emits one output record for each input record, i.e., it is triggered for each input record. The output of the ReduceFunction should be treated as a stream of updates not of final results.

Best, Fabian

2018-01-03 18:46 GMT+01:00 M Singh <[hidden email]>:
Hi Stefan:

Thanks for your response.

A follow up question - In a streaming environment, we invoke the operation reduce and then output results to the sink. Does this mean reduce will be executed once on every trigger per partition with all the items in each partition ?

Thanks


On Wednesday, January 3, 2018 2:46 AM, Stefan Richter <[hidden email]> wrote:


Hi,

I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation.

Best,
Stefan


Am 31.12.2017 um 22:28 schrieb M Singh <[hidden email]>:

Hi:

Apache Flink documentation (https://ci.apache.org/ projects/flink/flink-docs- release-1.4/dev/stream/ operators/index.html) indicates that a reduce function on a KeyedStream  as follows:

A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});

The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).

Thanks

Mans