Load distribution through the cluster


AndreaKinn
Hi,
I'm experimenting a bit with the cluster.
I didn't set any options for slot sharing or operator chaining, hoping that
Flink would decide on its own how to balance the load across the nodes of the
cluster. My cluster consists of one node running both the job manager and a
task manager, plus two nodes running a task manager each.

I noticed that every time I start the program, just one node is busy (> 95%
on each CPU core) while the other nodes are almost idle (< 3%). The same
holds for memory.

So Flink doesn't balance the work across the nodes?
I expected the CPU usage to be distributed over all nodes.

P.S.: This also means the data is not processed in real time, probably because the single working node is overloaded and can't keep up.


--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Load distribution through the cluster

Fabian Hueske-2
Hi,

Flink's scheduling aims to co-locate tasks in order to reduce network communication and to make resource/slot consumption easier to reason about.
A slot can execute one subtask of each operator of a program, i.e., one parallel slice of the program.

You can control the scheduling of tasks by specifying resource groups. [1] [2]

Best, Fabian

Re: Load distribution through the cluster

AndreaKinn
So Flink uses the other nodes only if one is completely "full"?



Re: Load distribution through the cluster

Fabian Hueske-2
There is no notion of "full" in Flink except that one slot will run at most one subtask of each operator.

The scheduling depends on the structure of the job, the parallelism of the operators, and the number of slots per TM.
It's hard to tell without knowing the details.
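
As a rough illustration of how parallelism and slot sharing groups drive slot usage, here is a minimal sketch in plain Java (it does not use the Flink API). It assumes a simplified model: operators in the same slot sharing group can share slots, so each group needs as many slots as its highest operator parallelism, and the job needs the sum over all groups. The class `SlotEstimate`, the helper `Op`, and the operator setups are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SlotEstimate {

    /** Hypothetical operator description: slot sharing group and parallelism. */
    static final class Op {
        final String group;
        final int parallelism;

        Op(String group, int parallelism) {
            this.group = group;
            this.parallelism = parallelism;
        }
    }

    /**
     * Simplified model: a group needs as many slots as its highest
     * operator parallelism; the job needs the sum over all groups.
     */
    static int requiredSlots(List<Op> ops) {
        Map<String, Integer> maxPerGroup = new HashMap<>();
        for (Op op : ops) {
            maxPerGroup.merge(op.group, op.parallelism, Integer::max);
        }
        int total = 0;
        for (int slots : maxPerGroup.values()) {
            total += slots;
        }
        return total;
    }

    public static void main(String[] args) {
        // source + timestamp assigner + map, all in the default group, parallelism 1
        List<Op> allDefault = Arrays.asList(
                new Op("default", 1), new Op("default", 1), new Op("default", 1));
        // same pipeline, but the map is moved into its own group
        List<Op> mapIsolated = Arrays.asList(
                new Op("default", 1), new Op("default", 1), new Op("group1", 1));
        // same pipeline in the default group, parallelism 3
        List<Op> parallel = Arrays.asList(
                new Op("default", 3), new Op("default", 3), new Op("default", 3));

        System.out.println("all default, p=1: " + requiredSlots(allDefault));
        System.out.println("map isolated, p=1: " + requiredSlots(mapIsolated));
        System.out.println("all default, p=3: " + requiredSlots(parallel));
    }
}
```

Under this model, a job whose operators all run with parallelism 1 in the default group occupies a single slot, which would match the observation of one busy node. Whether additional slots end up on different TMs still depends on the number of slots per TM and on the scheduler.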


Re: Load distribution through the cluster

AndreaKinn
If I apply a slot sharing group as in this example:

DataStream<Event> LTzAccStream = env
        .addSource(new FlinkKafkaConsumer010<>("topic", new CustomDeserializer(), properties))
        .assignTimestampsAndWatermarks(new CustomTimestampExtractor())
        .map(new MapFunction<Tuple2<String, String>, Event>() {
            @Override
            public Event map(Tuple2<String, String> value) throws Exception {
                return new Event(value.f0, value.f1);
            }
        }).slotSharingGroup("group1");

is only the map operator assigned to the slot sharing group, or does it apply to
the entire block (addSource + assignTimestampsAndWatermarks + map)?



Re: Load distribution through the cluster

Chesnay Schepler
It should only apply to the map operator.
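
To make the rule concrete, here is a small plain-Java model (not Flink code) of how an operator's slot sharing group is resolved, as I understand the documented behavior: an explicitly set group wins; otherwise the operator inherits the group of its inputs if they all agree; otherwise it lands in the default group. The class and method names are hypothetical, and the fallback for disagreeing inputs is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SlotGroupResolution {

    static final String DEFAULT_GROUP = "default";

    /**
     * Resolves the group of one operator.
     * explicitGroup is null when slotSharingGroup(...) was not called on it.
     */
    static String resolveGroup(String explicitGroup, List<String> inputGroups) {
        if (explicitGroup != null) {
            return explicitGroup;               // explicit setting wins
        }
        Set<String> distinct = new HashSet<>(inputGroups);
        if (distinct.size() == 1) {
            return distinct.iterator().next();  // inherit from agreeing inputs
        }
        return DEFAULT_GROUP;                   // no inputs, or inputs disagree
    }

    public static void main(String[] args) {
        // Mirrors the pipeline from the question: source -> timestamps -> map
        String source = resolveGroup(null, Collections.<String>emptyList());
        String timestamps = resolveGroup(null, Arrays.asList(source));
        String map = resolveGroup("group1", Arrays.asList(timestamps));
        // A hypothetical operator added after the map, with no explicit group
        String downstream = resolveGroup(null, Arrays.asList(map));

        System.out.println("source:     " + source);
        System.out.println("timestamps: " + timestamps);
        System.out.println("map:        " + map);
        System.out.println("downstream: " + downstream);
    }
}
```

In this model the source and the timestamp assigner stay in the default group, the map is in "group1", and anything chained after the map inherits "group1" unless it sets its own group.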
