Loading Rules from compacted Kafka Topic - open() vs Connected Streams

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Loading Rules from compacted Kafka Topic - open() vs Connected Streams

vijayakumarpl
Hello All,
I can think of two options of implementing below requirement and request some guidance on choosing the option with pros and cons.


Requirements:
- A in memory rules cache to be loaded from log compacted kafka topic. This cache has to be loaded prior to arrival of events.
- Updates to the log compacted kafka topic has to be tracked to keep the in memory rule cache up to date

Additional properties of data:
- On Job start/restart, this rule cache is always loaded from earliest available offset in the log. - No kafka offset store and restore required.
- No checkpointing needed for the rule cache, as it is loaded afresh in event of crash and restore
- No eventTime semantics required as we always want the latest rules to be loaded to cache

Implementation Options:

1. Using a KafkaConsumer in open() doing a initial load, and continuously fetching rule updates and keeping the in memory cache up to date. This option is not using a DataStream for rules as we don't use any goodies of stream like state,checkpoint, event time etc.
2. Connected Stream approach. Using a KafkaConsumer in open() doing a initial load. Have a FlinkKafkaSource Stream connected with events. In this case have to take care of out of order updates to caches, since the rules updates are from open() and Rule DataStream.

--
Thanks,
-Vijay
Reply | Threaded
Open this post in threaded view
|

Re: Loading Rules from compacted Kafka Topic - open() vs Connected Streams

vijayakumarpl
What i was trying to achieve from above was similar to GlobalKTable in Kafka Streams.  https://cwiki.apache.org/confluence/display/KAFKA/KIP-99%3A+Add+Global+Tables+to+Kafka+Streams
Also current flink version i am using is 1.4

Are there any other suggestions/guidance to achieve GlobalKTable functionality in flink

Thanks.

On Thu, Jul 12, 2018 at 1:00 PM vijayakumar palaniappan <[hidden email]> wrote:
Hello All,
I can think of two options of implementing below requirement and request some guidance on choosing the option with pros and cons.


Requirements:
- A in memory rules cache to be loaded from log compacted kafka topic. This cache has to be loaded prior to arrival of events.
- Updates to the log compacted kafka topic has to be tracked to keep the in memory rule cache up to date

Additional properties of data:
- On Job start/restart, this rule cache is always loaded from earliest available offset in the log. - No kafka offset store and restore required.
- No checkpointing needed for the rule cache, as it is loaded afresh in event of crash and restore
- No eventTime semantics required as we always want the latest rules to be loaded to cache

Implementation Options:

1. Using a KafkaConsumer in open() doing a initial load, and continuously fetching rule updates and keeping the in memory cache up to date. This option is not using a DataStream for rules as we don't use any goodies of stream like state,checkpoint, event time etc.
2. Connected Stream approach. Using a KafkaConsumer in open() doing a initial load. Have a FlinkKafkaSource Stream connected with events. In this case have to take care of out of order updates to caches, since the rules updates are from open() and Rule DataStream.

--
Thanks,
-Vijay


--
Thanks,
-Vijay