Hello,
I have to compute results based on a lot of historical data: parameters such as the total number of transactions in the last 1 month, the last 1 day, the last 1 hour, etc., grouped by email ID, IP, mobile number, name, address, zip code, etc. So my question is: is it the right approach to create keyed state by email, mobile, zip code, etc., or should I create one big map state (BS) and then process that BS, maybe in a process function or by applying some loop-and-filter logic in a window or process function?
My main worry is that I will end up with millions of states, because there can be millions of unique emails, phone numbers or zip codes if I create keyed state by email, phone, etc. Am I right? Does this impact performance, or is it the wrong approach? Which approach would you suggest for this use case?
Thanks Regards SHASHANK AGARWAL --- Trying to mobilize the things....
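A minimal sketch of one way to handle the "last 1 hour / last 1 day / last 1 month" part of the question, assuming events are counted per key in hourly buckets. The class `HourlyBucketCounter` and its methods are hypothetical names for illustration only. In a Flink job, one such map would be kept in keyed state for each email, IP, zip code, etc. (for example as a `MapState<Long, Long>`); here it is a plain `TreeMap` so the sketch runs on its own.

```java
import java.util.TreeMap;

// Hypothetical per-key counter: events are counted in hourly buckets so that
// "last 1 hour", "last 1 day" and "last 30 days" can all be read from one map.
// Assumption: in Flink this map would sit in keyed state for a single key
// (one email, one IP, ...); a plain TreeMap is used so the sketch runs alone.
class HourlyBucketCounter {
    static final long HOUR_MS = 3_600_000L;

    // hour bucket index (epoch millis / HOUR_MS) -> number of events in that hour
    private final TreeMap<Long, Long> countsByHour = new TreeMap<>();

    // Records one event at the given timestamp.
    void add(long timestampMs) {
        countsByHour.merge(timestampMs / HOUR_MS, 1L, Long::sum);
    }

    // Sums the last `hours` buckets, including the bucket that contains nowMs.
    // Granularity is one hour, so "last 1 hour" means "the current hour bucket".
    // Assumes hours >= 1.
    long countLastHours(long nowMs, int hours) {
        long current = nowMs / HOUR_MS;
        long from = current - hours + 1;
        long total = 0L;
        for (long c : countsByHour.subMap(from, true, current, true).values()) {
            total += c;
        }
        return total;
    }

    // Drops buckets older than the largest window of interest,
    // which keeps the per-key state bounded.
    void evictOutside(long nowMs, int hours) {
        long from = nowMs / HOUR_MS - hours + 1;
        countsByHour.headMap(from, false).clear();
    }

    int bucketCount() {
        return countsByHour.size();
    }
}
```

Bucketing trades precision for bounded state: a 30-day window needs at most 720 hourly buckets per key, regardless of how many events that key sees.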
Each keyed state you register in Flink is a hash table on the heap or a column family in RocksDB. Having too many of those is not memory efficient, so having fewer states is better, if you can adapt your schema that way. I would also look into "MapState", which is an efficient way to have "sub keys" under a keyed state. Stephan On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <[hidden email]> wrote:
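To illustrate the point about fewer states and MapState "sub keys", here is a toy model in plain Java. It is not Flink's API: `ToyStateBackend` and its methods are made up for this sketch. It assumes a single registered state, keyed by project ID, whose map entries are keyed by email, all stored as rows of one table. In real Flink code the handle would instead come from `getRuntimeContext().getMapState(...)` with a `MapStateDescriptor`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Flink's API: every registered state name gets exactly one table
// (think: one RocksDB column family), no matter how many keys flow through it.
class ToyStateBackend {
    // state name -> (row key -> value)
    final Map<String, Map<String, Long>> tables = new HashMap<>();

    // MapState-style update: a user key ("sub key") nested under the stream key.
    // Both are folded into one composite row key inside the same table.
    // The "|" separator is an arbitrary choice for this sketch.
    void addToMapState(String stateName, String streamKey, String userKey, long delta) {
        tables.computeIfAbsent(stateName, n -> new HashMap<>())
              .merge(streamKey + "|" + userKey, delta, Long::sum);
    }

    // MapState-style read; returns 0 for a missing state or missing row.
    long getFromMapState(String stateName, String streamKey, String userKey) {
        Map<String, Long> table = tables.get(stateName);
        return table == null ? 0L : table.getOrDefault(streamKey + "|" + userKey, 0L);
    }
}
```

Usage: counting emails under two project keys still touches only one table; each (project, email) pair is just another row in it.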
OK, let me check that I understand this correctly with an example: if I create a keyed state named "total count by email" for the key (project id + email), then it will create a single hash table or column family, "total count by email", and all the unique email IDs will be rows of that single hash table or column family, so I can store millions of unique email IDs in it. Does that mean it will create only a single state object for all unique email IDs? On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <[hidden email]> wrote:
If I create a KeyedState ("count by email id") and the keyed stream has 10 unique email IDs, will it create 1 column family or hash table, or will it create 10 column families or hash tables? Can I have millions of unique email IDs in that keyed state? On Tue, Aug 1, 2017 at 2:59 AM, shashank agarwal <[hidden email]> wrote:
Hi,
If you have one keyed state, say "count by email id", and many different keys, you will only have one column family in RocksDB (or one hash table). In fact, many users have hundreds of millions of different keys for some states. Best, Aljoscha
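The answer can be checked against a toy model of the "count by email id" example. This is plain Java with a hypothetical class `CountByEmailOperator`, not Flink's API: ten distinct email IDs produce one table with ten rows, not ten tables. In Flink the per-key value would be a `ValueState<Long>` obtained via `getRuntimeContext().getState(...)` with a `ValueStateDescriptor`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a keyed operator with a single state named "count by email id".
// Not Flink's API: it only mimics how one registered state holds one row per key.
class CountByEmailOperator {
    private static final String STATE_NAME = "count by email id";

    // state name -> (key -> value); one entry per registered state, not per key
    final Map<String, Map<String, Long>> stateTables = new HashMap<>();

    // Similar in spirit to keyBy(email) followed by value()/update() on a ValueState:
    // reads the count for the current key, increments it and writes it back.
    long processElement(String email) {
        Map<String, Long> table = stateTables.computeIfAbsent(STATE_NAME, n -> new HashMap<>());
        long updated = table.getOrDefault(email, 0L) + 1;
        table.put(email, updated);
        return updated;
    }
}
```

Adding more distinct keys only adds rows to the existing table, which is why millions of email IDs in one state are not a problem in themselves.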
Thanks, Aljoscha and Stephan, for clearing up my doubt. On Wed, Aug 9, 2017 at 7:37 PM, Aljoscha Krettek <[hidden email]> wrote: