|
Hi Eric,
This sounds like a use case for BroadcastProcessFunction You’d use the Cassandra dataset as the source for the broadcast stream, which is distributed to every parallel instance of your custom BroadcastProcessFunction. The input vectors are a partitioned stream that’s the other input to this function (via its processElement() method). The two streams get connected as a BroadcastConnectedStream.
— Ken
Hi. I need to compute an euclidian distance between an input Vector and a full dataset stored in Cassandra and keep the n lowest value. The Cassandra dataset is evolving (mutable). I could do this on a batch job, but i will have to triger it each time and the input are more like a slow stream, but the computing need to be fast can i do this on a stream way? is there any better solution ? Thx
--------------------------
|