Question about Kafka Flink consumer parallelism


xwang355
Hi Flink users,

I am trying to figure out how to leverage parallelism to improve the throughput of a Kafka consumer. From my research, I understand the cases where the number of Kafka partitions is equal to, less than, or greater than the number of consumers, and that rebalance() can be used to spread messages evenly across workers.
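As a rough illustration of those three cases, here is a plain-Java sketch. It assumes a simple modulo assignment of partitions to consumers, which is only an assumption for the example and not necessarily how Flink assigns them; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Flink or Kafka API.
// Assumption: partition p is read by consumer (p % consumers).
class PartitionAssignmentSketch {

    // Returns, for each consumer index, the list of partitions it would read.
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(p % consumers).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Fewer partitions than consumers: some consumers get nothing.
        System.out.println(assign(2, 4)); // [[0], [1], [], []]
        // Equal counts: one partition per consumer.
        System.out.println(assign(4, 4)); // [[0], [1], [2], [3]]
        // More partitions than consumers: some consumers read several.
        System.out.println(assign(6, 4)); // [[0, 4], [1, 5], [2], [3]]
    }
}
```

In the first case the idle consumers add no read throughput, which is why I expect rebalance() after the source to matter for the downstream workers.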

I also understand that setParallelism(#) achieves an effect similar to adding more bolts, in Storm's terms. In Storm, there is an offsetManager to handle multiple outstanding offsets due to parallelism.

Does Flink also have a mechanism to manage multiple outstanding offsets when setParallelism is used, and to make sure the offsets are committed 'in order'?
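To make the 'in order' part concrete, this is roughly the behaviour I have in mind, written as a plain-Java sketch for a single partition. The class and method names are hypothetical; this is not Flink's or Storm's actual code, only an illustration of committing up to the first unfinished offset.

```java
import java.util.TreeSet;

// Hypothetical tracker for one partition, not a Flink or Storm class.
class InOrderOffsetTracker {
    // Offsets that finished while an earlier offset was still outstanding.
    private final TreeSet<Long> finishedOutOfOrder = new TreeSet<>();
    // Lowest offset not yet known to be finished.
    private long nextToCommit;

    InOrderOffsetTracker(long startOffset) {
        this.nextToCommit = startOffset;
    }

    // Record that some worker finished the record at this offset.
    void markDone(long offset) {
        if (offset < nextToCommit) {
            return; // already covered by an earlier advance
        }
        finishedOutOfOrder.add(offset);
        // Advance past every contiguous finished offset.
        while (finishedOutOfOrder.remove(nextToCommit)) {
            nextToCommit++;
        }
    }

    // Offset that is safe to commit, using the Kafka convention that the
    // committed value is the next offset to read.
    long committableOffset() {
        return nextToCommit;
    }

    // Finished offsets that cannot be committed yet because of a gap.
    int blockedCount() {
        return finishedOutOfOrder.size();
    }

    public static void main(String[] args) {
        InOrderOffsetTracker tracker = new InOrderOffsetTracker(0);
        tracker.markDone(1);
        tracker.markDone(2);
        // Offset 0 is still outstanding, so nothing can be committed.
        System.out.println(tracker.committableOffset()); // 0
        tracker.markDone(0);
        // The gap is closed, so the commit point jumps past 0, 1 and 2.
        System.out.println(tracker.committableOffset()); // 3
    }
}
```

With this kind of bookkeeping, one stuck worker holds back the commit point while other offsets pile up behind the gap.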

From my own experiments, it looks like this has something to do with whether checkpointing is enabled and, if it is enabled, the checkpoint interval.

When setParallelism is used and one thread is stuck, how does Flink determine the number of uncommitted offsets?

Thanks in advance.
Ben