Flink job is not reading from certain kafka topic partitions


Flink job is not reading from certain kafka topic partitions

Vivek Yadav
We have a Flink streaming job that uses FlinkKafkaConsumer010. We keep the
Kafka consumer operator's parallelism the same as the number of partitions in
the Kafka topic, so that each partition is attached to exactly one subtask.
Quite frequently the job stops reading from certain partitions. Investigating
under the job metrics tab (committed-offsets and current-offset metrics),
some partitions are not associated with any subtask, while others are
associated with more than one subtask (which seems like wrong behavior). I
have debugged it locally and seen that the consumer initially starts with a
uniform partition-to-subtask assignment, but over time some partitions become
disassociated from their subtasks.

Here is our Kafka consumer config:
- checkpointing: disabled
- enable.auto.commit: true
- auto.commit.interval.ms: 5 minutes
- request.timeout.ms: 3 minutes
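For reference, a minimal sketch of how these settings map onto the Properties object handed to FlinkKafkaConsumer010 (the broker address, group id, and topic name are placeholders; the Flink wiring is shown in comments since it needs the Flink runtime on the classpath):

```java
import java.util.Properties;

public class KafkaConsumerProps {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "flink-stream-job");        // placeholder group id
        // With Flink checkpointing disabled, offsets are committed
        // periodically by the Kafka client itself:
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "300000");   // 5 minutes
        props.setProperty("request.timeout.ms", "180000");        // 3 minutes
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With the Flink dependencies available, the consumer would be wired as:
        //   FlinkKafkaConsumer010<String> consumer =
        //       new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);
        //   env.addSource(consumer).setParallelism(numPartitions);
        System.out.println(props.getProperty("auto.commit.interval.ms"));
    }
}
```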

Kafka version: 0.10.1.0
Flink version: 1.3.1

Has someone else faced this issue? Any help or pointers would be much
appreciated.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink job is not reading from certain kafka topic partitions

Tzu-Li (Gordon) Tai
Hi,

The case you described looks a lot like this issue with the Flink Kafka Consumer in 1.3.0 / 1.3.1:
https://issues.apache.org/jira/browse/FLINK-7143

If this is the case, you would have to upgrade to 1.3.2 or above to overcome this.
The issue ticket description contains some things to keep in mind when upgrading from 1.3.0 / 1.3.1 to 1.3.2+, especially if your 1.3.0 / 1.3.1 savepoint was taken when the Kafka consumer had erroneous partition assignments.

Cheers,
Gordon


On Mon, Jul 30, 2018 at 6:23 PM vivekyadav01 <[hidden email]> wrote: