Hi,
Flink will have to maintain state of the defined aggregations per each window and key (the more names you have, the bigger the state). Flinkās state backend will be used for that (for example memory or rocksdb).
However in most cases state will be small and not dependent on the length of the window, but only on number of keys. In your case per each key (name) only one counter will be maintained. Same applies to sums and averages (averages will use counter and sum).
There is no magic way to deal with too large state. Either add more RAM to the cluster, fallback to using disks or rewrite your query/application so it will not need that large state.
Piotrek
Hi All,
I have a small question regarding where does Flink stores data for doing window aggregations. Lets say I am running following query on Flink table:
SELECT name, count(*)
FROM testTable
GROUP BY TUMBLE(rowtime, INTERVAL '1' MINUTE), name
So, If I understand above query properly so it must be saving data for 1 minute somewhere to find aggregations. If Flink is persisting this in memory then my concern is if I increase interval to a DAY or more then it will store the complete data for interval which can cross memory. If persistence is disk then latency will be there.
Basically how do we solve such kind of use-cases using FLINK where aggregation interval are quite high.
Thanks in advance
--