Hi Flink users,
I am trying to figure out how to leverage parallelism to improve the throughput of a Kafka consumer. From my research, I understand the scenarios where the number of Kafka partitions is less than, equal to, or greater than the number of consumers, and that rebalance can be used to spread messages evenly across workers.
I can also use setParallelism(#) to achieve an effect similar to adding more bolts, in Storm's terms. In Storm, there is an offsetManager to handle multiple outstanding offsets due to parallelism.
Does Flink also have a mechanism to manage multiple outstanding offsets when setParallelism is used, and to make sure the offsets are committed 'in order'?
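To make 'in order' concrete, here is a minimal sketch in plain Java of the kind of bookkeeping I have in mind for a single partition. It is not Flink's or Storm's actual code; the class OffsetTracker and its method names are hypothetical, made up only for illustration. The idea is that the committed position never moves past the smallest offset that is still being processed, even if later offsets have already finished.

```java
import java.util.TreeSet;

// Hypothetical helper, for illustration only: tracks in-flight offsets of one partition.
final class OffsetTracker {
    // Offsets handed to workers but not yet acknowledged, kept sorted.
    private final TreeSet<Long> inFlight = new TreeSet<>();
    // Highest offset handed out so far (startOffset - 1 means "nothing yet").
    private long highestEmitted;

    OffsetTracker(long startOffset) {
        this.highestEmitted = startOffset - 1;
    }

    // Called when a record is handed to a parallel worker.
    void emitted(long offset) {
        inFlight.add(offset);
        highestEmitted = Math.max(highestEmitted, offset);
    }

    // Called when a worker finishes a record, possibly out of order.
    void acked(long offset) {
        inFlight.remove(offset);
    }

    // Position that is safe to commit, using Kafka's "next offset to read" convention:
    // the smallest offset still in flight, or one past the highest emitted if none is.
    long committable() {
        return inFlight.isEmpty() ? highestEmitted + 1 : inFlight.first();
    }

    // Number of records still being processed.
    int outstanding() {
        return inFlight.size();
    }
}
```

For example, if offsets 0, 1 and 2 are handed out and only 2 has been acknowledged, committable() stays at 0. Once 0 is acknowledged it moves to 1, and it stays there for as long as the worker holding offset 1 is stuck.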
From my own experiments, it looks like this has something to do with whether checkpointing is enabled and, if it is, with the checkpoint interval.
When setParallelism is used and one thread is stuck, how does Flink decide how many offsets remain uncommitted?
Thanks in advance.
Ben