Clarifications on FLINK-KAFKA consumer

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Clarifications on FLINK-KAFKA consumer

Rahul Raj
Hi,

I have just started working with FLINK and I am working on a project which involves reading KAFKA data and processing it. Following questions came to my mind:

1. Will FLink-Kafka consumer 0.8x run on multiple task slots or a single task slot? Basically I mean if its going to be a parallel operation or a non parallel operation?

2. If its a parallel operation, then do multiple task slots read data from single kafka partition or multiple kafka partition?

3. If data is read from multiple Kafka partition, then how duplication is avoided? Is it done from KAFKA or by FLINK?

Rahul Raj
Reply | Threaded
Open this post in threaded view
|

Re: Clarifications on FLINK-KAFKA consumer

Tzu-Li (Gordon) Tai
Hi Rahul!

1. Will FLink-Kafka consumer 0.8x run on multiple task slots or a single task slot? Basically I mean if its going to be a parallel operation or a non parallel operation?
Yes, the FlinkKafkaConsumer is a parallel consumer.

2. If its a parallel operation, then do multiple task slots read data from single kafka partition or multiple kafka partition?

Each single parallel instance of a FlinkKafkaConsumer source can subscribe to multiple Kafka partitions. Each Kafka partition is handled by exactly one FlinkKafkaConsumer parallel instance.

3. If data is read from multiple Kafka partition, then how duplication is avoided? Is it done from KAFKA or by FLINK?
Yes, the FlinkKafkaConsumer is a parallel consumer.I’m not sure exactly what you are referring to by “duplication” here. Do you mean duplication in the data itself in the Kafka topics, or duplicated consumption by Flink?
If it is the former: prior to Kafka 0.11, Kafka writes did not support transactions and therefore can only have at-least-once writes.
If you mean the latter: the FlinkKafkaConsumer achieves exactly-once guarantees when consuming from Kafka topics using Flink’s checkpointing mechanism. You can read about that here [1][2].

Hope the pointers help!

- Gordon

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html#kafka-consumers-and-fault-tolerance

On 22 September 2017 at 10:46:55 AM, Rahul Raj ([hidden email]) wrote:

Hi,

I have just started working with FLINK and I am working on a project which involves reading KAFKA data and processing it. Following questions came to my mind:

1. Will FLink-Kafka consumer 0.8x run on multiple task slots or a single task slot? Basically I mean if its going to be a parallel operation or a non parallel operation?

2. If its a parallel operation, then do multiple task slots read data from single kafka partition or multiple kafka partition?

3. If data is read from multiple Kafka partition, then how duplication is avoided? Is it done from KAFKA or by FLINK?

Rahul Raj