parallelism and slots allocated

5 messages

parallelism and slots allocated

bwong247
When I start my Flink application with a -p parallelism value of 24, 29 slots are used for the application. Is that expected behavior in some scenarios?

My application reads an event stream from Kafka. It does some filtering and a keyBy on the stream. Then it processes the same keyed stream in two different ways: the first does some data extraction and writes to a sink (it uses RocksDB to manage state); the second applies a window to the stream and writes to a different sink.
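To reason about the job described above, it can help to sketch it as an operator graph and check where chains can form. The Java model below is a simplified, hypothetical approximation of the usual chaining conditions in Flink (forward connection, equal parallelism, a single input, chaining not disabled); it is not Flink's own code, and the operator names are made up. Under those assumptions the keyBy splits the job into three tasks: source plus filter, extraction plus its sink, and window plus its sink.

```java
import java.util.List;

// Hypothetical, simplified model of Flink operator chaining. It is NOT Flink's
// implementation; it only encodes common conditions for chaining two operators.
public class ChainingSketch {

    // How records travel between two operators.
    enum Partitioning { FORWARD, HASH } // HASH stands for the repartitioning done by keyBy

    record Op(String name, int parallelism) {}

    record Edge(Op from, Op to, Partitioning partitioning) {}

    // Number of incoming edges of an operator.
    static long inputsOf(Op op, List<Edge> edges) {
        return edges.stream().filter(e -> e.to().equals(op)).count();
    }

    // Simplified rule: chaining enabled, forward connection, equal parallelism,
    // and the downstream operator has exactly one input.
    static boolean chainable(Edge e, List<Edge> edges, boolean chainingEnabled) {
        return chainingEnabled
                && e.partitioning() == Partitioning.FORWARD
                && e.from().parallelism() == e.to().parallelism()
                && inputsOf(e.to(), edges) == 1;
    }

    // Every chainable edge folds its target into the upstream task,
    // so tasks = operators - chained edges.
    static int countTasks(List<Op> ops, List<Edge> edges, boolean chainingEnabled) {
        long chained = edges.stream().filter(e -> chainable(e, edges, chainingEnabled)).count();
        return ops.size() - (int) chained;
    }

    // The job described above, with hypothetical operator names, all at parallelism p.
    static int exampleTasks(int p, boolean chainingEnabled) {
        Op source = new Op("kafka-source", p);
        Op filter = new Op("filter", p);
        Op extract = new Op("extract", p);
        Op sink1 = new Op("sink-1", p);
        Op window = new Op("window", p);
        Op sink2 = new Op("sink-2", p);
        List<Op> ops = List.of(source, filter, extract, sink1, window, sink2);
        List<Edge> edges = List.of(
                new Edge(source, filter, Partitioning.FORWARD),
                new Edge(filter, extract, Partitioning.HASH),   // keyBy breaks the chain
                new Edge(extract, sink1, Partitioning.FORWARD),
                new Edge(filter, window, Partitioning.HASH),    // keyBy breaks the chain
                new Edge(window, sink2, Partitioning.FORWARD));
        return countTasks(ops, edges, chainingEnabled);
    }

    public static void main(String[] args) {
        int p = 24;
        int tasks = exampleTasks(p, true);
        System.out.println(tasks + " tasks, " + tasks * p + " parallel subtasks");
    }
}
```

Running main prints "3 tasks, 72 parallel subtasks". How many slots those subtasks occupy is a separate question that also depends on slot sharing.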


Re: parallelism and slots allocated

Kurt Young
Hi, 

Parallelism is actually set at the operator level, and each parallel instance of an operator occupies one slot. In some cases Flink uses chaining to combine multiple operators so that they share a single slot, but sometimes this cannot be done. If your job contains multiple operators and some of them cannot be chained, the job may use more slots than the parallelism you configured.

Best,
Kurt
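Flink's documentation on task slots also describes slot sharing: by default, subtasks of different tasks from the same job may share a slot, so a job needs as many slots as the highest parallelism among its operators. Operators can be assigned to separate slot sharing groups (for example via slotSharingGroup(String) in the DataStream API), and different groups do not share slots. The Java sketch below models that rule, assuming the slot demand is the sum, over groups, of the maximum parallelism in each group. It is not Flink's scheduler code, and the second configuration (a group named "other" at parallelism 5) is purely hypothetical, chosen only to show one way a job started with -p 24 could occupy 29 slots.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of slot demand under slot sharing (not Flink's code):
// within one slot sharing group a slot can hold one subtask of every operator,
// so the group needs max(parallelism) slots; separate groups never share slots.
public class SlotSharingSketch {

    record Op(String name, int parallelism, String slotSharingGroup) {}

    static int requiredSlots(List<Op> ops) {
        Map<String, Integer> maxPerGroup = new HashMap<>();
        for (Op op : ops) {
            // Keep the highest parallelism seen in each group.
            maxPerGroup.merge(op.slotSharingGroup(), op.parallelism(), Integer::max);
        }
        // Groups do not share slots, so their demands add up.
        return maxPerGroup.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        // Every operator in the default group at parallelism 24.
        List<Op> allDefault = List.of(
                new Op("kafka-source", 24, "default"),
                new Op("window", 24, "default"),
                new Op("sink-2", 24, "default"));
        // Hypothetical: one operator moved to its own group with parallelism 5.
        List<Op> twoGroups = List.of(
                new Op("kafka-source", 24, "default"),
                new Op("window", 24, "default"),
                new Op("sink-2", 5, "other"));
        System.out.println(requiredSlots(allDefault)); // 24
        System.out.println(requiredSlots(twoGroups));  // 29
    }
}
```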

On Thu, Feb 9, 2017 at 2:19 AM, bwong247 <[hidden email]> wrote:
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/parallelism-and-slots-allocated-tp11534.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.


Re: parallelism and slots allocated

bwong247
Hi Kurt,

Thanks for the reply.

Does this mean that if my job has 3 operators (not chained), it will use at least 3 slots? I thought parallelism was task-based. You can define it at the operator level, but that only means the tasks for that operator are distributed across that many slots. Shouldn't I be able to start the 3-operator job with a parallelism of 1, with all the operators running in the same single slot?

Regards,
Bernard

Re: parallelism and slots allocated

Kurt Young
Hi,

The answer to your first question is "yes": 3 unchained operators will use at least 3 slots, except when these 3 operators are blocking operators in a batch job, in which case they will use the same slot one after another.

Regarding your second question: if you start 3 operators with a parallelism of 1, Flink will chain these operators and execute them in the same slot. But if you disable chaining, they do need 3 slots.
 


Best,
Kurt
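The parallelism-1 case described above can be sketched as follows, assuming a linear pipeline whose operators are all forward-connected with equal parallelism, so that chaining collapses them into a single task. The chainingEnabled flag stands in for calling disableOperatorChaining() on the StreamExecutionEnvironment; everything else is a hypothetical model. It counts tasks and parallel subtasks only; how those subtasks are placed into slots also depends on slot sharing, which it does not model.

```java
// Hypothetical model: tasks produced by a linear pipeline whose operators are all
// forward-connected and share one parallelism (the case where Flink can chain them).
public class LinearPipelineSketch {

    // Chaining enabled: the whole pipeline collapses into a single task.
    // Chaining disabled: every operator becomes its own task.
    static int tasks(int operators, boolean chainingEnabled) {
        if (operators <= 0) {
            throw new IllegalArgumentException("need at least one operator");
        }
        return chainingEnabled ? 1 : operators;
    }

    // Each task runs as `parallelism` parallel subtasks.
    static int subtasks(int operators, int parallelism, boolean chainingEnabled) {
        return tasks(operators, chainingEnabled) * parallelism;
    }

    public static void main(String[] args) {
        System.out.println(tasks(3, true));        // 1
        System.out.println(tasks(3, false));       // 3
        System.out.println(subtasks(3, 1, false)); // 3
    }
}
```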

On Sat, Feb 11, 2017 at 2:35 AM, bwong247 <[hidden email]> wrote:
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/parallelism-and-slots-allocated-tp11534p11578.html


Re: parallelism and slots allocated

Greg Hogan

On Sat, Feb 11, 2017 at 3:25 AM, Kurt Young <[hidden email]> wrote: