Get consumed Kafka offsets from Flink kafka source

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Get consumed Kafka offsets from Flink kafka source

bat man
Hi All,

Is there any way I can inspect/query the checkpointed data. Scenario is like this -

We have a high volume of data coming in the data stream pipeline for which kafka is source, in case if fails bcoz of bad data I want to analyse the data which caused the issue. It could be that some data source starts sending bad data so I want to go in kafka to that particular offset and do some analysis before I start the job with checkpointed data.

Can anyone suggest how this can be achieved.

Thanks,
Hemant


Reply | Threaded
Open this post in threaded view
|

Re: Get consumed Kafka offsets from Flink kafka source

Piotr Nowojski-4
Hi,

Depending how you configured your FlinkKafkaSource, but you can make the source to commit consumed offsets back to Kafka. So one way to examine them, would be to check those offsets in Kafka (I don't know how, but I'm pretty sure there is a way to do it).

Secondly, if you want to examine Flink's checkpoint state you can use State Processor API to do that [1]. As far as I know you could hook up your checkpointed data to Table API/SQL and use SQL to query/analyse the state.

Best 
Piotrek


śr., 14 kwi 2021 o 11:25 bat man <[hidden email]> napisał(a):
Hi All,

Is there any way I can inspect/query the checkpointed data. Scenario is like this -

We have a high volume of data coming in the data stream pipeline for which kafka is source, in case if fails bcoz of bad data I want to analyse the data which caused the issue. It could be that some data source starts sending bad data so I want to go in kafka to that particular offset and do some analysis before I start the job with checkpointed data.

Can anyone suggest how this can be achieved.

Thanks,
Hemant


Reply | Threaded
Open this post in threaded view
|

Re: Get consumed Kafka offsets from Flink kafka source

bat man
Thanks Piotrek for the references.

Cheers.
Hemant

On Wed, Apr 14, 2021 at 7:18 PM Piotr Nowojski <[hidden email]> wrote:
Hi,

Depending how you configured your FlinkKafkaSource, but you can make the source to commit consumed offsets back to Kafka. So one way to examine them, would be to check those offsets in Kafka (I don't know how, but I'm pretty sure there is a way to do it).

Secondly, if you want to examine Flink's checkpoint state you can use State Processor API to do that [1]. As far as I know you could hook up your checkpointed data to Table API/SQL and use SQL to query/analyse the state.

Best 
Piotrek


śr., 14 kwi 2021 o 11:25 bat man <[hidden email]> napisał(a):
Hi All,

Is there any way I can inspect/query the checkpointed data. Scenario is like this -

We have a high volume of data coming in the data stream pipeline for which kafka is source, in case if fails bcoz of bad data I want to analyse the data which caused the issue. It could be that some data source starts sending bad data so I want to go in kafka to that particular offset and do some analysis before I start the job with checkpointed data.

Can anyone suggest how this can be achieved.

Thanks,
Hemant