Re: Kinesis connector SHARD_GETRECORDS_MAX default value

Posted by Tzu-Li (Gordon) Tai on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-tp12332p12761.html

Thanks for filing the JIRA!

Would you also be up for opening a PR for the change? That would be very helpful :)

Cheers,
Gordon

On 24 April 2017 at 3:27:48 AM, Steffen Hausmann ([hidden email]) wrote:

Hi Gordon,

thanks for looking into this and sorry it took me so long to file the
issue: https://issues.apache.org/jira/browse/FLINK-6365.

Really appreciate your contributions to the Kinesis connector!

Cheers,
Steffen

On 22/03/2017 20:21, Tzu-Li (Gordon) Tai wrote:

> Hi Steffen,
>
> I have to admit that I didn’t put too much thought into the default
> values for the Kinesis consumer.
>
> I’d say it would be reasonable to change the default values to follow
> KCL’s settings. Could you file a JIRA for this?
>
> In general, we might want to reconsider all the default values for
> configs related to the getRecords call, i.e.
> - SHARD_GETRECORDS_MAX
> - SHARD_GETRECORDS_INTERVAL_MILLIS
> - SHARD_GETRECORDS_BACKOFF_*
>
> Cheers,
> Gordon
>
> On March 23, 2017 at 2:12:32 AM, Steffen Hausmann
> ([hidden email]) wrote:
>
>> Hi there,
>>
>> I recently ran into problems with a Flink job running on an EMR cluster
>> consuming events from a Kinesis stream receiving roughly 15k
>> events/second. Although the EMR cluster was substantially scaled and CPU
>> utilization and system load were well below any alarming threshold, the
>> processing of events of the stream increasingly fell behind.
>>
>> Eventually, it turned out that SHARD_GETRECORDS_MAX defaults to 100,
>> which apparently causes too much overhead when consuming events from
>> the stream. Increasing the value to 5000 (a single GetRecords call to
>> Kinesis can retrieve up to 10k records) made the problem go away.
>>
>> I wonder why the default value for SHARD_GETRECORDS_MAX is chosen so low
>> (100x less than it could be). The Kinesis Client Library defaults to
>> 5000 and it's recommended to use this default value:
>> http://docs.aws.amazon.com/streams/latest/dev/troubleshooting-consumers.html#consumer-app-reading-slower.
>>
>>
>> Thanks for the clarification!
>>
>> Cheers,
>> Steffen
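
A rough back-of-the-envelope sketch of why a low SHARD_GETRECORDS_MAX can leave a consumer falling behind even when the cluster is idle. It assumes, and this is not stated anywhere in the thread, that Kinesis throttles GetRecords to about 5 calls per second per shard; the class and method names are hypothetical, and per-shard byte limits and network latency are ignored.

```java
public class GetRecordsCeiling {
    // Assumption (not from the thread): at most 5 GetRecords calls
    // per second are allowed against a single shard.
    static final int CALLS_PER_SECOND_PER_SHARD = 5;

    // Upper bound on records one shard consumer can read per second
    // when each call returns at most maxRecordsPerCall records.
    static long maxRecordsPerSecondPerShard(int maxRecordsPerCall) {
        return (long) maxRecordsPerCall * CALLS_PER_SECOND_PER_SHARD;
    }

    // Minimum number of shards the stream must be spread over for the
    // consumer to keep up with eventsPerSecond (ceiling division).
    static long shardsNeeded(long eventsPerSecond, int maxRecordsPerCall) {
        long perShard = maxRecordsPerSecondPerShard(maxRecordsPerCall);
        return (eventsPerSecond + perShard - 1) / perShard;
    }

    public static void main(String[] args) {
        long eventsPerSecond = 15_000; // the rate mentioned in the thread
        for (int max : new int[] {100, 5_000, 10_000}) {
            System.out.printf(
                "SHARD_GETRECORDS_MAX=%d: up to %d records/s per shard, "
                    + "at least %d shard(s) needed for %d events/s%n",
                max,
                maxRecordsPerSecondPerShard(max),
                shardsNeeded(eventsPerSecond, max),
                eventsPerSecond);
        }
    }
}
```

Under these assumptions, a limit of 100 records per call caps each shard at a few hundred records per second regardless of available CPU, which would match the symptom described above. In an actual Flink job the value would be raised through the Properties passed to the Kinesis consumer, using the SHARD_GETRECORDS_MAX key discussed in this thread.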