Can flink aggregate in local TM,then aggregate in global TM?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Can flink aggregate in local TM,then aggregate in global TM?

zhaifengwei
I have a cluster environment, I need aggregate dataStream on it.
I`m wonder whether I can aggregate in local server first, then aggregate in
global.
When I aggregate dataStream in global, the Network IO will increase fast.
I just want decrease the Network IO, So I need aggregate in local server
first.
How can I do it.

DataStream<String> dataIn....
dataIn.map().filter().assignTimestampsAndWatermarks().keyBy().window().Fold()



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Can flink aggregate in local TM,then aggregate in global TM?

Fabian Hueske-2
Hi,

(copying my answer from Stack Overflow)

The current release of Flink (Flink 1.4.0, Dec 2017) does not feature built-in support for pre-aggregations.
However, there are efforts on the way to add this for the next release (1.5.0), see FLINK-7561 [4]

You can implement a pre-aggregation operation based on a ProcessFunction [1]. The ProcessFunction could keep the pre-aggregates in a HashMap (of fixed size) in memory and register timers event-time and processing-time) to periodically emit the pre-aggregates. The state (i.e., content of the `HashMap`) should be persisted in managed operator state [2] to prevent data loss in case of a failure. When setting the timers, you need to respect the window boundaries.

Please note that FoldFunction has been deprecated and should be replaced by AggregateFunction [3].

Best, Fabian

2017-12-15 8:04 GMT+01:00 zhaifengwei <[hidden email]>:
I have a cluster environment, I need aggregate dataStream on it.
I`m wonder whether I can aggregate in local server first, then aggregate in
global.
When I aggregate dataStream in global, the Network IO will increase fast.
I just want decrease the Network IO, So I need aggregate in local server
first.
How can I do it.

DataStream<String> dataIn....
dataIn.map().filter().assignTimestampsAndWatermarks().keyBy().window().Fold()



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reply | Threaded
Open this post in threaded view
|

Re: Can flink aggregate in local TM,then aggregate in global TM?

zhaifengwei
Hi Fabian Hueske-2,
    Thanks for your reply.This is exactly what I was looking for.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/