Hi Jiawei,
I agree that the offset management mechanism uses the same code as Kinesis Stream Consumer and in theory should not lose exactly-once semantics. As Ying is alluding to, if your application is restarted and you have snapshotting disabled in AWS there is a chance that records can be lost between runs. However, if you have snapshotting enabled then the application should continue consuming records from the last processed sequence number.
I am happy to take a deeper look if you can provide more information/logs/code.
Thanks,
From: Ying Xu <[hidden email]>
Date: Monday, 14 September 2020 at 19:48
To: Andrey Zagrebin <[hidden email]>
Cc: Jiawei Wu <[hidden email]>, user <[hidden email]>
Subject: RE: [EXTERNAL] Flink DynamoDB stream connector losing records
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
Hi Jiawei:
Sorry for the delayed reply. When you mention certain records getting skipped, is it from the same run or across different runs. Any more specific details on how/when records are lost?
FlinkDynamoDBStreamsConsumer is built on top of FlinkKinesisConsumer , with similar offset management mechanism. In theory it shouldn't lose exactly-once semantics in the case of getting throttled. We haven't run it in any AWS kinesis analytics environment though.
Thanks.
On Thu, Sep 10, 2020 at 7:51 AM Andrey Zagrebin <[hidden email]> wrote:
Generally speaking this should not be a problem for exactly-once but I am not familiar with the DynamoDB and its Flink connector.
Did you observe any failover in Flink logs?
On Thu, Sep 10, 2020 at 4:34 PM Jiawei Wu <[hidden email]> wrote:
And I suspect I have throttled by DynamoDB stream, I contacted AWS support but got no response except for increasing WCU and RCU.
Is it possible that Flink will lose exactly-once semantics when throttled?
On Thu, Sep 10, 2020 at 10:31 PM Jiawei Wu <[hidden email]> wrote:
Hi Andrey,
Thanks for your suggestion, but I'm using Kinesis analytics application which supports only Flink 1.8....
Regards,
Jiawei
On Thu, Sep 10, 2020 at 10:13 PM Andrey Zagrebin <[hidden email]> wrote:
Hi Jiawei,
Could you try Flink latest release 1.11?
1.8 will probably not get bugfix releases.I will cc Ying Xu who might have a better idea about the DinamoDB source.
Best,
Andrey
On Thu, Sep 10, 2020 at 3:10 PM Jiawei Wu <[hidden email]> wrote:
Hi,
I'm using AWS kinesis analytics application with Flink 1.8. I am using the FlinkDynamoDBStreamsConsumer to consume DynamoDB stream records. But recently I found my internal state is wrong.
After I printed some logs I found some DynamoDB stream record are skipped and not consumed by Flink. May I know if someone encountered the same issue before? Or is it a known issue in Flink 1.8?
Thanks,
Jiawei
Free forum by Nabble | Edit this page |