|
Hi. I am struggling the past few days to find a solution on the following problem, using Apache Flink:
I have a stream of vectors, represented by files in a local folder. After a new text file is located using DataStream<String> text = env.readFileStream(...), I transform (flatMap), the Input into a SingleOutputStreamOperator<Tuple2<String, Integer>, ?>, with the Integer being the score coming from a scoring function.
I want to persist a global HashMap containing the top-k vectors, using their scores. I approached the problem using a statefull transformation.
1. The first problem I have is that the HashMap retains per-sink data (so for each thread of workers, one HashMap of data). How can I make that a Global collection
2. Using Apache Spark, I made that possible by
JavaPairDStream<String, Integer> stateDstream = tuples.updateStateByKey(updateFunction);
and then making transformations on the stateDstream. Is there a way I can get the same functionality using FLink?
Thanks in advance!
|