FlinkKinesisConsumer not getting data from Kinesis at a constant speed -lag of about 30-55 secs

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

FlinkKinesisConsumer not getting data from Kinesis at a constant speed -lag of about 30-55 secs

Vijay Balakrishnan
Hi,
In using FlinkKinesisConsumer, I am seeing a lag of about 30-55 secs in fetching data from Kinesis after it has done 1 or 2 fetches even though data is getting put in the Kinesis data stream at a high clip.
I used ConsumerConfigConstants.SHARD_GETRECORDS_MAX of 10000 (tried with 5000, 200 etc) and ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS of 200ms(default is great here becaise of the 5 transaction limit per sec from AWS).Have also tried reducing the interval but I run into readThroughput Exception. How can I reduce this lag to make it pretty much real-time. I am also using Flink Processing time. Have gone from 1-3 shards for Kinesis Data stream. Is there some other tuning parm I need to add for FlinkKinesisConsumer or is it just that it doesn't have any data to pull from Kinesis.
I do 5 sec Tumbling time windows and use the window end timestamp to put into my InfluxDB timestamp column. I see that there is a constant 35 sec- 55 sec lag in the timestamps and that corresponds to the time lag I see in the logs where FlinkKinesisConsumer is waiting to fetch data from Kinesis.
I am seeing these log statements and not sure what to make of it to reduce the time lag of fetching data from Kinesis.
Logs:

23:23:40,286 [shardConsumers-Source: Custom Source -> (Map -> Sink: Unnamed, Filter) (2/8)-thread-0] DEBUG org.apache.flink.kinesis.shaded.com.amazonaws.requestId      [] - x-amzn-RequestId: f06409aa-d996-fb3f-a53c-5c066d509c9b
23:23:40,335 [Source: Custom Source -> (Map -> Sink: Unnamed, Filter) (2/8)] DEBUG org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher [] - Subtask 1 is trying to discover new shards that were created due to resharding ...


TIA,