Hi all, I have a stream of incoming object versions (objects change over time) and a requirement to fetch the last known version of an object from a datastore in order to link it with the id of the new version, so that I end up with a linked list of object versions. All versions of an object share the same guid, so I was thinking about using Flink streaming to ensure ordering and avoid concurrency / race conditions in the linking process (object versions might arrive out of order or in spikes). If I use the object guid as the key for a keyed stream, I am concerned that I will end up with millions of windowed streams, causing an OOM. What do you think is the right approach? Do you think Flink is the right technology for this task?
|
This sounds like you have some per-key state to keep track of, so the 'correct' way to do it would be to keyBy the guid. I believe that if you run your environment using the RocksDB state backend, you will not OOM regardless of the number of GUIDs that are eventually tracked. Whether Flink/stream processing is the most effective way to achieve your goal, I can't say, but I am fairly confident that this particular aspect is not a problem.
On Sat, Apr 23, 2016 at 1:13 AM, Chen Bekor <[hidden email]> wrote:
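To make the per-key state idea concrete, here is a minimal sketch in plain Python. It is not Flink code: the dictionary only stands in for the keyed state that keyBy on the guid would provide, and the names `ObjectVersion`, `version_id` and `link_versions` are hypothetical, chosen for illustration. It also assumes versions are linked in arrival order; it makes no attempt to reorder late versions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class ObjectVersion:
    guid: str        # shared by all versions of one object
    version_id: str  # unique id of this particular version (hypothetical field)


def link_versions(
    stream: Iterable[ObjectVersion],
) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (guid, version_id, previous_version_id) for each incoming version."""
    # Stand-in for keyed state: one entry per guid, holding only the id of
    # the most recent version seen so far for that guid.
    last_seen: Dict[str, str] = {}
    for version in stream:
        previous_id = last_seen.get(version.guid)  # None for the first version
        yield (version.guid, version.version_id, previous_id)
        last_seen[version.guid] = version.version_id


# Versions of objects "a" and "b" arrive interleaved.
events = [
    ObjectVersion("a", "a1"),
    ObjectVersion("b", "b1"),
    ObjectVersion("a", "a2"),
    ObjectVersion("a", "a3"),
]
links = list(link_versions(events))
```

The state kept per guid is a single id, so its size grows with the number of distinct guids and not with the number of versions, which is the part a disk-backed state backend would be expected to absorb.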
|
Cool - can you point me to some docs about how to configure RocksDB? I searched the online docs and found nothing substantial. Also, if I'm using an HDFS (S3-backed) cluster, how would that affect RocksDB? Can I configure it to run on optimized SSDs, etc.? Any help is appreciated.
On Sun, Apr 24, 2016 at 7:57 AM, John Sherwood <[hidden email]> wrote:
|
Hi, in the Flink docs there is this: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/state_backends.html#the-rocksdbstatebackend and this: RocksDBStateBackend
Cheers, Aljoscha
On Sun, 24 Apr 2016 at 21:58 Chen Bekor <[hidden email]> wrote: