Hi All,
Let's say a topic in kafka has 5 partitions. If I spawn 10 Task Managers with 1 slot each and parallelism is 10 then how will records be read from the kafka topic if I use the FlinkKafkaConsumer to read. Will 5 TM's read and the rest be ideal in that case? Is over subscribing the number of TM's than the number of partitions in the Kafka topic guarantee high throughput? Regards, Taher Koitawala GS Lab Pune+91 8407979163 |
Hi,
Yes, in your case half of the Kafka source tasks wouldn’t read/process any records (you can check that in web UI). This shouldn’t harm you, unless your records will be redistributed after the source. For example: source.keyBy(..).process(new MyVeryHeavyOperator()).print() Should be fine, because `keyBy(…)` will redistribute records. However source.map(new MyVeryHeavyOperator()).print() Will mean that half of `MyVeryHeavyOperator`s will be idling as well. To solve that, you might want to consider using dataStream.rebalance(); Piotrek
|
Thanks a lot for the explanation. That was exactly what I thought should happen. However, it is always good to a clear confirmation. Regards, Taher Koitawala GS Lab Pune+91 8407979163 On Fri, Sep 21, 2018 at 6:26 PM Piotr Nowojski <[hidden email]> wrote:
|
No problem :)
Piotrek
|
Free forum by Nabble | Edit this page |