Flink 0.9.1 Kafka 0.8.1

Flink 0.9.1 Kafka 0.8.1

Gwenhael Pasquiers

Hi everyone,

We're trying to consume from a Kafka 0.8.1 broker with Flink 0.9.1 and we've run into the following issue:

My offset went out of range, and now when I start my job it loops on the OffsetOutOfRangeException, no matter what value auto.offset.reset is set to (earliest, latest, largest, smallest).

It looks like the consumer doesn't reset the invalid offset and fails immediately. Flink then restarts the job, which fails again, and so on.

Do you have an idea of what is wrong, or could it be an issue in Flink?

B.R.

Gwenhaël PASQUIERS
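For context, a consumer reading from a Kafka 0.8 cluster through Flink's connector is configured via a `Properties` object along these lines. This is a minimal sketch, not the actual configuration from this job: the broker, ZooKeeper, and group-id values are placeholders.

```java
import java.util.Properties;

public class KafkaConsumerConfig {

    // Sketch of the properties a Flink Kafka-0.8 source is typically given.
    // All addresses and the group id below are placeholders, not values
    // taken from this thread.
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "zk-host:2181");    // placeholder
        props.setProperty("bootstrap.servers", "kafka-host:9092"); // placeholder
        props.setProperty("group.id", "my-flink-job");             // placeholder
        // For the Kafka 0.8 high-level consumer the documented values are
        // "largest" and "smallest"; "earliest"/"latest" were introduced later.
        props.setProperty("auto.offset.reset", "smallest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("auto.offset.reset"));
    }
}
```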


Re: Flink 0.9.1 Kafka 0.8.1

rmetzger0
Hi Gwen,

sorry that you ran into this issue. The implementation of the Kafka Consumer has been changed completely in 0.9.1 because there were some corner-case issues with the exactly-once guarantees in 0.9.0.

I'll look into the issue immediately.


On Thu, Sep 10, 2015 at 4:26 PM, Gwenhael Pasquiers <[hidden email]> wrote:

RE: Flink 0.9.1 Kafka 0.8.1

Gwenhael Pasquiers

Thanks,

In the meantime we'll go back to 0.9.0 :)

From: Robert Metzger [mailto:[hidden email]]
Sent: Thursday, 10 September 2015 16:49
To: [hidden email]
Subject: Re: Flink 0.9.1 Kafka 0.8.1

Re: Flink 0.9.1 Kafka 0.8.1

Ufuk Celebi
Thanks for reporting the issue. I think this warrants a 0.9.2 release after the fix is in.

– Ufuk

> On 10 Sep 2015, at 16:52, Gwenhael Pasquiers <[hidden email]> wrote:

Re: Flink 0.9.1 Kafka 0.8.1

rmetzger0
Good news: I was able to reproduce the issue and there is already a fix: https://github.com/apache/flink/pull/1117


I'm wondering how the offset became out of range in the first place. Was it caused by Flink, or did you change the offset in ZooKeeper with an external tool?
I'm just asking to make sure that Flink is not committing wrong offsets to ZooKeeper :)
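The behaviour being discussed can be sketched in plain Java. This is an illustration of the reset policy only, not the actual code in the pull request: when the requested offset falls outside the broker's valid range, the consumer should fall back to the position selected by `auto.offset.reset` instead of failing the job in a restart loop.

```java
public class OffsetReset {

    // Illustrative helper, not Flink's or Kafka's actual code: clamp a
    // requested offset into the broker's valid range [smallest, largest]
    // according to the auto.offset.reset policy.
    public static long resetOffset(long requested, long smallest, long largest,
                                   String autoOffsetReset) {
        if (requested >= smallest && requested <= largest) {
            return requested; // offset is still valid, keep it
        }
        // Out of range: pick a new starting point according to the policy.
        if ("smallest".equals(autoOffsetReset) || "earliest".equals(autoOffsetReset)) {
            return smallest;
        }
        return largest; // "largest" / "latest"
    }

    public static void main(String[] args) {
        // Committed offset 100 expired; the valid range is now [500, 900].
        System.out.println(resetOffset(100, 500, 900, "smallest")); // 500
        System.out.println(resetOffset(100, 500, 900, "largest"));  // 900
    }
}
```

Without this fallback, every fetch with the stale offset throws again, the job fails, Flink restarts it from the same committed offset, and the loop repeats, which matches the restart loop described in the original report.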


On Thu, Sep 10, 2015 at 5:49 PM, Ufuk Celebi <[hidden email]> wrote:

RE: Flink 0.9.1 Kafka 0.8.1

Gwenhael Pasquiers

Nice! Just need the next maintenance release now :)

It (probably) went out of range because the job did not run for a couple of hours and the offsets expired on the Kafka side; we handle a lot of data, so the retention isn't very long. It happens often.
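The retention scenario can be illustrated with a small sketch (hypothetical offset numbers; `isOffsetStillValid` is an illustrative helper, not a Kafka API): retention-based deletion removes old log segments, the partition's smallest available offset moves forward, and a consumer that was stopped for longer than the retention window resumes from an offset that no longer exists.

```java
public class RetentionExpiry {

    // A committed offset is only usable while it still lies inside the
    // broker's current log range [logStart, logEnd].
    public static boolean isOffsetStillValid(long committed, long logStart, long logEnd) {
        return committed >= logStart && committed <= logEnd;
    }

    public static void main(String[] args) {
        long committed = 1_000;              // offset stored when the job last ran
        long logStartAfterRetention = 5_000; // old segments deleted in the meantime
        long logEnd = 9_000;
        // The stored offset now points into deleted data, so the next fetch
        // fails with an OffsetOutOfRangeException.
        System.out.println(isOffsetStillValid(committed, logStartAfterRetention, logEnd));
    }
}
```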

 

B.R.

From: Robert Metzger [mailto:[hidden email]]
Sent: Thursday, 10 September 2015 18:57
To: [hidden email]
Subject: Re: Flink 0.9.1 Kafka 0.8.1