User-defined aggregation function and parallelism

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

User-defined aggregation function and parallelism

杨力
I am running flink SQL in streaming mode and implemented a UDAGG, which is used in keyed HOP windows. But I found that the throughput decreases dramatically when the function is used. Does UDAGG run in parallell? Or does it run only in one thread?

Regards,
Bill
Reply | Threaded
Open this post in threaded view
|

Re: User-defined aggregation function and parallelism

Fabian Hueske-2
Hi Bill,

Flink's built-in aggregation functions are implemented against the same interface as UDAGGs and are applied in parallel.
The performance depends of course on the implementation of the UDAGG. For example, you should try to keep the size of the accumulator as small as possible because it will be stored in the state backend.
If you are using the RocksDBStatebackend, this means that the accumulator is de/serialized for every records.

Best, Fabian

2018-04-16 5:21 GMT+02:00 杨力 <[hidden email]>:
I am running flink SQL in streaming mode and implemented a UDAGG, which is used in keyed HOP windows. But I found that the throughput decreases dramatically when the function is used. Does UDAGG run in parallell? Or does it run only in one thread?

Regards,
Bill