(DEPRECATED) Apache Flink User Mailing List archive.

User-defined aggregation function and parallelism

Classic

List

Threaded

2 messages Options

杨力

User-defined aggregation function and parallelism

I am running flink SQL in streaming mode and implemented a UDAGG, which is used in keyed HOP windows. But I found that the throughput decreases dramatically when the function is used. Does UDAGG run in parallell? Or does it run only in one thread?

Regards,

Bill

Fabian Hueske-2

Re: User-defined aggregation function and parallelism

Hi Bill,

Flink's built-in aggregation functions are implemented against the same interface as UDAGGs and are applied in parallel.

The performance depends of course on the implementation of the UDAGG. For example, you should try to keep the size of the accumulator as small as possible because it will be stored in the state backend.

If you are using the RocksDBStatebackend, this means that the accumulator is de/serialized for every records.

Best, Fabian

2018-04-16 5:21 GMT+02:00 杨力 <[hidden email]>:

I am running flink SQL in streaming mode and implemented a UDAGG, which is used in keyed HOP windows. But I found that the throughput decreases dramatically when the function is used. Does UDAGG run in parallell? Or does it run only in one thread?

Regards,
Bill