How to rewind Kafka cursors into a Flink job ?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

How to rewind Kafka cursors into a Flink job ?

Dominique De Vito
Hi,

Is there a way to rewind Kafka cursors (while using here Kafka as a consumer) from (inside) a Flink job ?

Use case [plan A]
* The Flink job would listen 1 main "data" topic + 1 secondary "event" topic
* In case of a given event, the Flink job would rewind all Kafka cursors of the "data" topic, to go back to the latest cursors and retreat data from there.

Use case-bis [plan A-bis] :
* The Flink job would listen 1 main "data" topic, dealing with data according to some params
* This Flink job would listen a WS and in case of a given event, the Flink job would rewind all Kafka cursors of the "data" topic, to go back from the latest cursors and retreat data from there, according to some new params.

Plan B ;-)
* Listen the events from outside Flink, and in case of an event, stop the Flink and relaunch it.

So, if someone has any hint about how to rewind for [plan A] and/or [plan A-bis] => thank you !
 
Regards,
Dominique

Reply | Threaded
Open this post in threaded view
|

Re: How to rewind Kafka cursors into a Flink job ?

Tzu-Li (Gordon) Tai
Hi Dominique,

What your plan A is suggesting is that a downstream operator can provide signals to upstream operators and alter their behaviour.
In general, this isn’t possible, as in a distributed streaming environment it’s hard to guarantee what records exactly will be altered by the behaviour.

I would say plan B would be the approach to go for this.
Also, in Flink 1.3.0, the FlinkKafkaConsumer will allow users to define if they want to start from the earliest, latest, or some specific offset, completely independent of the committed consumer group offsets in Kafka.
This should also come in handy for what you have in mind. Have a look at https://issues.apache.org/jira/browse/FLINK-4280 for more details on this :)

Cheers,
Gordon

On March 28, 2017 at 12:35:38 AM, Dominique De Vito ([hidden email]) wrote:

Hi,

Is there a way to rewind Kafka cursors (while using here Kafka as a consumer) from (inside) a Flink job ?

Use case [plan A]
* The Flink job would listen 1 main "data" topic + 1 secondary "event" topic
* In case of a given event, the Flink job would rewind all Kafka cursors of the "data" topic, to go back to the latest cursors and retreat data from there.

Use case-bis [plan A-bis] :
* The Flink job would listen 1 main "data" topic, dealing with data according to some params
* This Flink job would listen a WS and in case of a given event, the Flink job would rewind all Kafka cursors of the "data" topic, to go back from the latest cursors and retreat data from there, according to some new params.

Plan B ;-)
* Listen the events from outside Flink, and in case of an event, stop the Flink and relaunch it.

So, if someone has any hint about how to rewind for [plan A] and/or [plan A-bis] => thank you !
 
Regards,
Dominique