(DEPRECATED) Apache Flink User Mailing List archive.

Re: Flink DynamoDB stream connector losing records

Posted by Ying Xu on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-DynamoDB-stream-connector-losing-records-tp38020p38096.html

Hi Jiawei:

Sorry for the delayed reply. When you mention certain records getting skipped, is it from the same run or across different runs. Any more specific details on how/when records are lost?

FlinkDynamoDBStreamsConsumer is built on top of FlinkKinesisConsumer , with similar offset management mechanism. In theory it shouldn't lose exactly-once semantics in the case of getting throttled. We haven't run it in any AWS kinesis analytics environment though.

Thanks.

On Thu, Sep 10, 2020 at 7:51 AM Andrey Zagrebin <[hidden email]> wrote:

Generally speaking this should not be a problem for exactly-once but I am not familiar with the DynamoDB and its Flink connector.
Did you observe any failover in Flink logs?

On Thu, Sep 10, 2020 at 4:34 PM Jiawei Wu <[hidden email]> wrote:
And I suspect I have throttled by DynamoDB stream, I contacted AWS support but got no response except for increasing WCU and RCU.

Is it possible that Flink will lose exactly-once semantics when throttled?

On Thu, Sep 10, 2020 at 10:31 PM Jiawei Wu <[hidden email]> wrote:
Hi Andrey,

Thanks for your suggestion, but I'm using Kinesis analytics application which supports only Flink 1.8....

Regards,
Jiawei

On Thu, Sep 10, 2020 at 10:13 PM Andrey Zagrebin <[hidden email]> wrote:
Hi Jiawei,

Could you try Flink latest release 1.11?
1.8 will probably not get bugfix releases.
I will cc Ying Xu who might have a better idea about the DinamoDB source.

Best,
Andrey

On Thu, Sep 10, 2020 at 3:10 PM Jiawei Wu <[hidden email]> wrote:
Hi,

I'm using AWS kinesis analytics application with Flink 1.8. I am using the FlinkDynamoDBStreamsConsumer to consume DynamoDB stream records. But recently I found my internal state is wrong.

After I printed some logs I found some DynamoDB stream record are skipped and not consumed by Flink. May I know if someone encountered the same issue before? Or is it a known issue in Flink 1.8?

Thanks,
Jiawei