Hi everyone,
I'm writing streaming job which needs to query Cassandra for each event multiple times, around 20. I would like to use Async IO for that but not sure which option to choose: 1. Implement One AsyncFunction with 20 queries inside 2. Implement 20 AsyncFunctions, each with 1 query inside Taking into account that each event needs all queries. Reduce amount of queries for each record is not an option. In this case I would like to minimise processing time of event, even if throughput will suffer. Any advice or consideration is greatly appreciated. Thanks, Maxim. |
Hi Maxim,
If reducing latency is the goal, then option #1 seems better. Though you’d need additional logic inside of your AsyncFunction to run all 20 queries in parallel. I’d also consider a third option... Use a FlatMapFunction to create 20 copies of the event (assuming it’s not large), with an additional field indicating which query should be made. Follow that with a rebalance(), and a single AsyncFunction that makes the appropriate query for the event, based on this new field. Then make sure you’ve got sufficient parallelism for your AsyncFunction to handle this fan-out. This should let you run the queries for a single event in parallel. — Ken > On Apr 3, 2018, at 9:59 AM, Maxim Parkachov <[hidden email]> wrote: > > Hi everyone, > > I'm writing streaming job which needs to query Cassandra for each event multiple times, around 20. I would like to use Async IO for that but not sure which option to choose: > > 1. Implement One AsyncFunction with 20 queries inside > 2. Implement 20 AsyncFunctions, each with 1 query inside > > Taking into account that each event needs all queries. Reduce amount of queries for each record is not an option. > > In this case I would like to minimise processing time of event, even if throughput will suffer. Any advice or consideration is greatly appreciated. > > Thanks, > Maxim. > -------------------------------------------- http://about.me/kkrugler +1 530-210-6378 |
Hi Maxim, I think Ken's approach is a good idea. However, you would need to a add a stateful operator to join the results of the individual queries if that is needed. 2018-04-03 19:39 GMT+02:00 Ken Krugler <[hidden email]>: Hi Maxim, |
Free forum by Nabble | Edit this page |