What would the differences be between these scenarios? 1) one task manager with numberOfTaskSlots=1 and one job with parallelism=1 2) one task manager with numberOfTaskSlots=10 and one job with parallelism=10 In both cases all of the job's tasks get executed within the one task manager's jvm. Are there any downsides to doing #2 instead of #1? I ask this question because of current issues related to watermarks with Kafka sources [1] [2] and changing parallelism with savepoints [3]. I'm writing a Flink job that processes events from Kafka topics that have 12 partitions. I'm wondering if I should just set the job parallelism=12 and make numberOfTaskSlots sum to 12 across however many task managers I set up. It seems like watermarks would work properly then, and I could effectively change job parallelism using the number of task managers (e.g. 1 TM with slots=12, or 2 TMs with slots=6, or 12 TMs with slots=1, etc). Am I missing any important details that would make this a bad idea? It seems like a bit of abuse of numberOfTaskSlots, but also seems like a fairly simple solution to a few current issues. Thanks, Zach |
IMHO the only change for 2) is that you possibly get better machine utilization because it will use more parallel threads. So I think it’s a valid approach.
@Ufuk, could there be problems with the number of network buffers? I think not, because the connections are multiplexed in one channel, is this correct? I’ll also talk with the others so see if we can resolve the watermark/kafka partition issues before the 1.0 release. > On 20 Feb 2016, at 02:14, Zach Cox <[hidden email]> wrote: > > What would the differences be between these scenarios? > > 1) one task manager with numberOfTaskSlots=1 and one job with parallelism=1 > > 2) one task manager with numberOfTaskSlots=10 and one job with parallelism=10 > > In both cases all of the job's tasks get executed within the one task manager's jvm. Are there any downsides to doing #2 instead of #1? > > I ask this question because of current issues related to watermarks with Kafka sources [1] [2] and changing parallelism with savepoints [3]. I'm writing a Flink job that processes events from Kafka topics that have 12 partitions. I'm wondering if I should just set the job parallelism=12 and make numberOfTaskSlots sum to 12 across however many task managers I set up. It seems like watermarks would work properly then, and I could effectively change job parallelism using the number of task managers (e.g. 1 TM with slots=12, or 2 TMs with slots=6, or 12 TMs with slots=1, etc). > > Am I missing any important details that would make this a bad idea? It seems like a bit of abuse of numberOfTaskSlots, but also seems like a fairly simple solution to a few current issues. > > Thanks, > Zach > > [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tt4782.html > [2] https://issues.apache.org/jira/browse/FLINK-3375 > [3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Changing-parallelism-tt4967.html > |
On Sat, Feb 20, 2016 at 10:12 AM, Aljoscha Krettek <[hidden email]> wrote:
> IMHO the only change for 2) is that you possibly get better machine utilization because it will use more parallel threads. So I think it’s a valid approach. > > @Ufuk, could there be problems with the number of network buffers? I think not, because the connections are multiplexed in one channel, is this correct? I would not expect it to become a problem. If it does, it's easy to resolve by throwing a little more memory at the problem. [1] – Ufuk [1] https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers |
Thanks for the input Aljoscha and Ufuk! I will try out the #2 approach and report back. Thanks, Zach On Sat, Feb 20, 2016 at 7:26 AM Ufuk Celebi <[hidden email]> wrote: On Sat, Feb 20, 2016 at 10:12 AM, Aljoscha Krettek <[hidden email]> wrote: |
Free forum by Nabble | Edit this page |