Kinesis Shards and Parallelism

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Kinesis Shards and Parallelism

shkob1
Looking at the doc
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kinesis.html
"When the number of shards is larger than the parallelism of the consumer,
then each consumer subtask can subscribe to multiple shards; otherwise if
the number of shards is smaller than the parallelism of the consumer, then
some consumer subtasks will simply be idle and wait until it gets assigned
new shards"

I have the *same number of shards as the configured parallelism*. Seems
though a task is grabbing multiple shards while others are idle. is that an
expected behavior?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Kinesis Shards and Parallelism

Tzu-Li (Gordon) Tai
Hi,

Another detail not that apparent in the description is that the assignment would only be evenly distributed assuming that the open Kinesis shards have consecutive shard ids, and are of the same Kinesis stream.
Once you reshard a Kinesis stream, it could be that the shard ids are no longer consecutive.

To overcome the skew in the distribution after several reshard operations, the Flink Kinesis Consumer allows providing a custom `KinesisShardAssigner`, which allows the user to decide how shards should be partitioned.
Please let me know if this helps!

Cheers,
Gordon

On Tue, Nov 13, 2018 at 2:28 AM shkob1 <[hidden email]> wrote:
Looking at the doc
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kinesis.html
"When the number of shards is larger than the parallelism of the consumer,
then each consumer subtask can subscribe to multiple shards; otherwise if
the number of shards is smaller than the parallelism of the consumer, then
some consumer subtasks will simply be idle and wait until it gets assigned
new shards"

I have the *same number of shards as the configured parallelism*. Seems
though a task is grabbing multiple shards while others are idle. is that an
expected behavior?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Kinesis Shards and Parallelism

shkob1
Actually was looking at the task manager level, i did have more slots than
shards, so it does make sense i had an idle task manager while other task
managers split the shards between their slots

Thanks!



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/