Flink DynamoDB stream connector losing records

Jiawei Wu
Hi,

I'm using an AWS Kinesis Analytics application with Flink 1.8, and I am using the FlinkDynamoDBStreamsConsumer to consume DynamoDB stream records. Recently I found that my internal state is wrong.

After printing some logs I found that some DynamoDB stream records are skipped and never consumed by Flink. Has anyone encountered the same issue before? Or is it a known issue in Flink 1.8?
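
For reference, the job wires the consumer up roughly like this (a simplified sketch, not my actual code; the region, stream ARN, and schema are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkDynamoDBStreamsConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class DdbStreamJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties config = new Properties();
        config.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");
        config.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "TRIM_HORIZON");

        // The source is identified by the DynamoDB stream ARN (placeholder here).
        DataStream<String> records = env.addSource(
                new FlinkDynamoDBStreamsConsumer<>(
                        "arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/...",
                        new SimpleStringSchema(),
                        config));

        records.print();
        env.execute("ddb-stream-consumer");
    }
}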

Thanks,
Jiawei

Re: Flink DynamoDB stream connector losing records

Andrey Zagrebin-5
Hi Jiawei,

Could you try the latest Flink release, 1.11?
1.8 will probably not get bugfix releases.
I will cc Ying Xu, who might have a better idea about the DynamoDB source.

Best,
Andrey


Re: Flink DynamoDB stream connector losing records

Jiawei Wu
Hi Andrey,

Thanks for your suggestion, but I'm using a Kinesis Analytics application, which supports only Flink 1.8.

Regards,
Jiawei


Re: Flink DynamoDB stream connector losing records

Jiawei Wu
And I suspect I am being throttled by the DynamoDB stream. I contacted AWS support but got no response other than a suggestion to increase WCU and RCU.

Is it possible that Flink loses exactly-once semantics when throttled?


Re: Flink DynamoDB stream connector losing records

Andrey Zagrebin-5
Generally speaking, this should not be a problem for exactly-once, but I am not familiar with DynamoDB and its Flink connector.
Did you observe any failover in the Flink logs?


Re: Flink DynamoDB stream connector losing records

Ying Xu
Hi Jiawei,

Sorry for the delayed reply. When you mention certain records getting skipped, is it within the same run or across different runs? Do you have any more specific details on how/when records are lost?
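
If it helps to narrow this down, one option (a rough sketch on my side; the wrapper class name is mine, and I'm assuming UTF-8 payloads) is to log every record's shard id and sequence number through a custom KinesisDeserializationSchema and then look for gaps:

import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.connectors.kinesis.serialization.KinesisDeserializationSchema;

public class SequenceLoggingSchema implements KinesisDeserializationSchema<String> {

    @Override
    public String deserialize(byte[] recordValue, String partitionKey, String seqNum,
                              long approxArrivalTimestamp, String stream, String shardId) {
        // Comparing these logs against the table's actual writes should show
        // whether a range of sequence numbers was never delivered to Flink.
        System.out.printf("stream=%s shard=%s seq=%s%n", stream, shardId, seqNum);
        return new String(recordValue, StandardCharsets.UTF_8);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return TypeInformation.of(String.class);
    }
}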

FlinkDynamoDBStreamsConsumer is built on top of FlinkKinesisConsumer, with a similar offset management mechanism. In theory it shouldn't lose exactly-once semantics when throttled. We haven't run it in an AWS Kinesis Analytics environment, though.

Thanks. 



Re: Flink DynamoDB stream connector losing records

Cranmer, Danny

Hi Jiawei,

I agree that the offset management mechanism uses the same code as the Kinesis stream consumer, and in theory it should not lose exactly-once semantics. As Ying alludes to, if your application is restarted and you have snapshotting disabled in AWS, there is a chance that records are lost between runs. However, if you have snapshotting enabled, the application should continue consuming from the last processed sequence number.
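
For a self-managed Flink job the equivalent safety net is enabling checkpointing; in Kinesis Data Analytics this is driven by the service's snapshot/checkpoint settings rather than job code, so the sketch below is for illustration only:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // With checkpointing on, the consumer snapshots its per-shard sequence
        // numbers and restores them after a failure, so a restart resumes from
        // the last processed position instead of skipping ahead.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE); // every 60s
    }
}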

I am happy to take a deeper look if you can provide more information/logs/code.

Thanks,

Re: Flink DynamoDB stream connector losing records

Jiawei Wu
Hi Ying and Danny,

Sorry for the late reply, I just got back from vacation.

Yes, I'm running Flink 1.8 in Kinesis Data Analytics, and checkpointing is enabled. This fully managed solution limits my access to the Flink logs; so far I haven't found any logs related to throttling or failover. The reason I suspect throttling as the root cause is that an AWS Lambda connected to the same DynamoDB stream shows higher throttling right after Flink starts consuming the stream, so I believe the throttling also happens on the Flink side. I'm actively working with AWS support to try to find logs for this.

At the same time, when you say it "in theory should not lose exactly-once semantics", does that mean Flink will retry when throttled? I noticed there is a parameter "flink.shard.getrecords.maxretries" whose default value is 3. Will Flink skip the record when all retry attempts have failed?
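
In case it's relevant, while we investigate I'm considering raising the retry budget and backoff on the consumer (a sketch; the property keys are the ConsumerConfigConstants ones, the values are just illustrative):

import java.util.Properties;

import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class RetryTuning {
    // Consumer properties with a larger GetRecords retry/backoff budget.
    static Properties consumerConfig() {
        Properties config = new Properties();
        config.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_RETRIES, "10");        // default is 3
        config.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_BACKOFF_BASE, "1000"); // ms
        config.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_BACKOFF_MAX, "10000"); // ms
        return config;
    }
}

My (unverified) reading of the Kinesis connector is that exhausting these retries fails the source task, so the job restarts from the last checkpoint rather than silently skipping the record, but I'd appreciate confirmation.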

Thanks,
Jiawei


