Reading from latest offset in kafka consumer on restart

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Reading from latest offset in kafka consumer on restart

Janardhan Reddy
Hi,

Is there a way to read from latest offset in kafka consumer on restart. 
Or can we somehow start flink ignoring previous checkpointed data.

Thanks
Reply | Threaded
Open this post in threaded view
|

Re: Reading from latest offset in kafka consumer on restart

Stephan Ewen
When you cancel and restart a Flink job (without a savepoint), it does not use the checkpoint data, and uses the behavior you defined in the Kafka consumer to decide where to start from (consumer group, latest, or earliest).

On Wed, Aug 3, 2016 at 11:26 AM, Janardhan Reddy <[hidden email]> wrote:
Hi,

Is there a way to read from latest offset in kafka consumer on restart. 
Or can we somehow start flink ignoring previous checkpointed data.

Thanks

Reply | Threaded
Open this post in threaded view
|

Re: Reading from latest offset in kafka consumer on restart

Janardhan Reddy
How would checkpointing affect the offset.

On Wed, Aug 3, 2016 at 3:03 PM, Stephan Ewen <[hidden email]> wrote:
When you cancel and restart a Flink job (without a savepoint), it does not use the checkpoint data, and uses the behavior you defined in the Kafka consumer to decide where to start from (consumer group, latest, or earliest).

On Wed, Aug 3, 2016 at 11:26 AM, Janardhan Reddy <[hidden email]> wrote:
Hi,

Is there a way to read from latest offset in kafka consumer on restart. 
Or can we somehow start flink ignoring previous checkpointed data.

Thanks


Reply | Threaded
Open this post in threaded view
|

Re: Reading from latest offset in kafka consumer on restart

Janardhan Reddy
I mean in case of chekpointing, won't the consumer start from where it previously left ?

On Wed, Aug 3, 2016 at 3:06 PM, Janardhan Reddy <[hidden email]> wrote:
How would checkpointing affect the offset.

On Wed, Aug 3, 2016 at 3:03 PM, Stephan Ewen <[hidden email]> wrote:
When you cancel and restart a Flink job (without a savepoint), it does not use the checkpoint data, and uses the behavior you defined in the Kafka consumer to decide where to start from (consumer group, latest, or earliest).

On Wed, Aug 3, 2016 at 11:26 AM, Janardhan Reddy <[hidden email]> wrote:
Hi,

Is there a way to read from latest offset in kafka consumer on restart. 
Or can we somehow start flink ignoring previous checkpointed data.

Thanks



Reply | Threaded
Open this post in threaded view
|

Re: Reading from latest offset in kafka consumer on restart

Stephan Ewen
Checkpointing starts the consumer where it left off in case the job fails and recovers.
If you explicitly cancel a job and start a new job (same jar), the new job will not start from a checkpoint, but from blank state.


On Wed, Aug 3, 2016 at 11:37 AM, Janardhan Reddy <[hidden email]> wrote:
I mean in case of chekpointing, won't the consumer start from where it previously left ?

On Wed, Aug 3, 2016 at 3:06 PM, Janardhan Reddy <[hidden email]> wrote:
How would checkpointing affect the offset.

On Wed, Aug 3, 2016 at 3:03 PM, Stephan Ewen <[hidden email]> wrote:
When you cancel and restart a Flink job (without a savepoint), it does not use the checkpoint data, and uses the behavior you defined in the Kafka consumer to decide where to start from (consumer group, latest, or earliest).

On Wed, Aug 3, 2016 at 11:26 AM, Janardhan Reddy <[hidden email]> wrote:
Hi,

Is there a way to read from latest offset in kafka consumer on restart. 
Or can we somehow start flink ignoring previous checkpointed data.

Thanks




Reply | Threaded
Open this post in threaded view
|

Re: Reading from latest offset in kafka consumer on restart

Janardhan Reddy
thanks.
We are using kafka flink consumer 0.8.2_11 ,I have set "auto.offset.reset" to "largest"
On cancel and restart the consumer is reading from where it left off instead of current offset, i tried both largest and latest in auto.offset.reset



On Wed, Aug 3, 2016 at 3:12 PM, Stephan Ewen <[hidden email]> wrote:
Checkpointing starts the consumer where it left off in case the job fails and recovers.
If you explicitly cancel a job and start a new job (same jar), the new job will not start from a checkpoint, but from blank state.


On Wed, Aug 3, 2016 at 11:37 AM, Janardhan Reddy <[hidden email]> wrote:
I mean in case of chekpointing, won't the consumer start from where it previously left ?

On Wed, Aug 3, 2016 at 3:06 PM, Janardhan Reddy <[hidden email]> wrote:
How would checkpointing affect the offset.

On Wed, Aug 3, 2016 at 3:03 PM, Stephan Ewen <[hidden email]> wrote:
When you cancel and restart a Flink job (without a savepoint), it does not use the checkpoint data, and uses the behavior you defined in the Kafka consumer to decide where to start from (consumer group, latest, or earliest).

On Wed, Aug 3, 2016 at 11:26 AM, Janardhan Reddy <[hidden email]> wrote:
Hi,

Is there a way to read from latest offset in kafka consumer on restart. 
Or can we somehow start flink ignoring previous checkpointed data.

Thanks





Reply | Threaded
Open this post in threaded view
|

Re: Reading from latest offset in kafka consumer on restart

Stephan Ewen
Ah, you probably use the same consumer group ID.

Flink participates in Kafka's consumer groups (writing offsets for that group to ZooKeeper/Kafka). If you resume a job, it initially looks for the current offsets of that consumer group.
If you want to restart without such an offset, you need to set a random "group.id" in the properties of the FlinkKafkaConsumer.

We are thinking about changing the configuration a bit to make that more easy. In the next versions, it should be explicit if the FlinkKafkaConsumer would participate in the consumer group.

On Wed, Aug 3, 2016 at 11:48 AM, Janardhan Reddy <[hidden email]> wrote:
thanks.
We are using kafka flink consumer 0.8.2_11 ,I have set "auto.offset.reset" to "largest"
On cancel and restart the consumer is reading from where it left off instead of current offset, i tried both largest and latest in auto.offset.reset



On Wed, Aug 3, 2016 at 3:12 PM, Stephan Ewen <[hidden email]> wrote:
Checkpointing starts the consumer where it left off in case the job fails and recovers.
If you explicitly cancel a job and start a new job (same jar), the new job will not start from a checkpoint, but from blank state.


On Wed, Aug 3, 2016 at 11:37 AM, Janardhan Reddy <[hidden email]> wrote:
I mean in case of chekpointing, won't the consumer start from where it previously left ?

On Wed, Aug 3, 2016 at 3:06 PM, Janardhan Reddy <[hidden email]> wrote:
How would checkpointing affect the offset.

On Wed, Aug 3, 2016 at 3:03 PM, Stephan Ewen <[hidden email]> wrote:
When you cancel and restart a Flink job (without a savepoint), it does not use the checkpoint data, and uses the behavior you defined in the Kafka consumer to decide where to start from (consumer group, latest, or earliest).

On Wed, Aug 3, 2016 at 11:26 AM, Janardhan Reddy <[hidden email]> wrote:
Hi,

Is there a way to read from latest offset in kafka consumer on restart. 
Or can we somehow start flink ignoring previous checkpointed data.

Thanks