operators

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

operators

Radu Tudoran

Hi,

 

Is there any way in which you can ensure that 2 distinct operators will be executed on the same machine?

More precisely what I am trying to do is to have a window that computes some metrics and will dump this locally (from the operator not from an output sink) and I would like to create independent of this (or event within the operator) a stream source to emit this data. I cannot

 

The schema would be something as below:

                  

Stream ->  operator   -> output

                    |                  

                  Local file      

                      |                  

                    Stream source -> new stream

                   

.=> the red items should go on the same machine

 

Dr. Radu Tudoran

Research Engineer - Big Data Expert

IT R&D Division

 

cid:image007.jpg@01CD52EB.AD060EE0

HUAWEI TECHNOLOGIES Duesseldorf GmbH

European Research Center

Riesstrasse 25, 80992 München

 

E-mail: [hidden email]

Mobile: +49 15209084330

Telephone: +49 891588344173

 

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany,
www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

 

Reply | Threaded
Open this post in threaded view
|

Re: operators

Stephan Ewen
Hi!

You cannot specify that on the higher API levels. The lower API levels have something called "CoLocationConstraint". At this point it is not exposed, because we thought that would lead to not very scalable and robust designs in many cases
.
The best thing usually is location transparency and local affinity (as a performance optimization).
Is the file large, i.e., would it hurt to do it on a DFS? Or actually use a Kafka Queue between the operators?

Stephan


On Wed, Mar 9, 2016 at 5:38 PM, Radu Tudoran <[hidden email]> wrote:

Hi,

 

Is there any way in which you can ensure that 2 distinct operators will be executed on the same machine?

More precisely what I am trying to do is to have a window that computes some metrics and will dump this locally (from the operator not from an output sink) and I would like to create independent of this (or event within the operator) a stream source to emit this data. I cannot

 

The schema would be something as below:

                  

Stream ->  operator   -> output

                    |                  

                  Local file      

                      |                  

                    Stream source -> new stream

                   

.=> the red items should go on the same machine

 

Dr. Radu Tudoran

Research Engineer - Big Data Expert

IT R&D Division

 

cid:image007.jpg@01CD52EB.AD060EE0

HUAWEI TECHNOLOGIES Duesseldorf GmbH

European Research Center

Riesstrasse 25, 80992 München

 

E-mail: [hidden email]

Mobile: <a href="tel:%2B49%2015209084330" value="+4915209084330" target="_blank">+49 15209084330

Telephone: <a href="tel:%2B49%20891588344173" value="+49891588344173" target="_blank">+49 891588344173

 

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany,
www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

 


Reply | Threaded
Open this post in threaded view
|

RE: operators

Radu Tudoran

Hi,

 

It would not be feasible actually to use kafka queues or the DFS. Could you point me at which level of API I could access the CoLocationConstraint? Is it accessible from the  DataSourceStream or from the operator directly?

 

I have also dig  through the documentation and API and I was curious to understand a bit what can the “slotSharingGroup” and “startNewResouceGroup()” can do.

 

I did not find though a good example..only this link https://issues.apache.org/jira/browse/FLINK-3315

 

Also, for the “slotSharingGroup” it doesn’t seem to be available (I am currently using flink 0.10) – so if it is something that came newer than I guess this is the explanation why I cannot find it in any of datastream api or source function

 

Thanks for the info.

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Stephan Ewen
Sent: Wednesday, March 09, 2016 6:30 PM
To: [hidden email]
Subject: Re: operators

 

Hi!

 

You cannot specify that on the higher API levels. The lower API levels have something called "CoLocationConstraint". At this point it is not exposed, because we thought that would lead to not very scalable and robust designs in many cases

.

The best thing usually is location transparency and local affinity (as a performance optimization).

Is the file large, i.e., would it hurt to do it on a DFS? Or actually use a Kafka Queue between the operators?

 

Stephan

 

 

On Wed, Mar 9, 2016 at 5:38 PM, Radu Tudoran <[hidden email]> wrote:

Hi,

 

Is there any way in which you can ensure that 2 distinct operators will be executed on the same machine?

More precisely what I am trying to do is to have a window that computes some metrics and will dump this locally (from the operator not from an output sink) and I would like to create independent of this (or event within the operator) a stream source to emit this data. I cannot

 

The schema would be something as below:

                  

Stream ->  operator   -> output

                    |                  

                  Local file      

                      |                  

                    Stream source -> new stream

                   

.=> the red items should go on the same machine

 

Dr. Radu Tudoran

Research Engineer - Big Data Expert

IT R&D Division

 

cid:image007.jpg@01CD52EB.AD060EE0

HUAWEI TECHNOLOGIES Duesseldorf GmbH

European Research Center

Riesstrasse 25, 80992 München

 

E-mail: [hidden email]

Mobile: <a href="tel:%2B49%2015209084330" target="_blank">+49 15209084330

Telephone: <a href="tel:%2B49%20891588344173" target="_blank">+49 891588344173

 

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany,
www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

 

 

Reply | Threaded
Open this post in threaded view
|

Re: operators

Till Rohrmann

Hi Radu,

the API call slotSharingGroup was introduced with version 1.0. In the version 0.10 there was something similar called startNewResourceGroup, but it was somewhat broken. Therefore, I would recommend you upgrading to version 1.0. You can find the description of the new method here [1]. The slot sharing group basically allows you to define which operators can share a slot. This means that a subtask O_i of operator O and another subtask P_j of operator P will be executed in the same slot. However, you don’t have control over which subtask will be assigned to which slot. If this guarantee is strong enough for your use case, then you can simply upgrade and use slotSharingGroup to define your sharing groups. Btw: Per default all operators will be placed in the same slot sharing group.

If your requirement is that O_i will be executed in the same slot as P_i, then you have to add the corresponding JobVertices to a CoLocationGroup. At the moment this is not really exposed but you could try to get the JobGraph from the StreamGraph.getJobGraph and then use JobGraph.getVertices to get the JobVertices. Then you have to find out which JobVertices accommodate your operators. Once this is done, you can colocate them via the JobVertex.setStrictlyCoLocatedWith method. This might solve your problem, but I haven’t tested it myself.

[1] https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#task-chaining-and-resource-groups

Cheers,
Till


On Thu, Mar 10, 2016 at 2:52 PM, Radu Tudoran <[hidden email]> wrote:

Hi,

 

It would not be feasible actually to use kafka queues or the DFS. Could you point me at which level of API I could access the CoLocationConstraint? Is it accessible from the  DataSourceStream or from the operator directly?

 

I have also dig  through the documentation and API and I was curious to understand a bit what can the “slotSharingGroup” and “startNewResouceGroup()” can do.

 

I did not find though a good example..only this link https://issues.apache.org/jira/browse/FLINK-3315

 

Also, for the “slotSharingGroup” it doesn’t seem to be available (I am currently using flink 0.10) – so if it is something that came newer than I guess this is the explanation why I cannot find it in any of datastream api or source function

 

Thanks for the info.

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Stephan Ewen
Sent: Wednesday, March 09, 2016 6:30 PM
To: [hidden email]
Subject: Re: operators

 

Hi!

 

You cannot specify that on the higher API levels. The lower API levels have something called "CoLocationConstraint". At this point it is not exposed, because we thought that would lead to not very scalable and robust designs in many cases

.

The best thing usually is location transparency and local affinity (as a performance optimization).

Is the file large, i.e., would it hurt to do it on a DFS? Or actually use a Kafka Queue between the operators?

 

Stephan

 

 

On Wed, Mar 9, 2016 at 5:38 PM, Radu Tudoran <[hidden email]> wrote:

Hi,

 

Is there any way in which you can ensure that 2 distinct operators will be executed on the same machine?

More precisely what I am trying to do is to have a window that computes some metrics and will dump this locally (from the operator not from an output sink) and I would like to create independent of this (or event within the operator) a stream source to emit this data. I cannot

 

The schema would be something as below:

                  

Stream ->  operator   -> output

                    |                  

                  Local file      

                      |                  

                    Stream source -> new stream

                   

.=> the red items should go on the same machine

 

Dr. Radu Tudoran

Research Engineer - Big Data Expert

IT R&D Division

 

cid:image007.jpg@01CD52EB.AD060EE0

HUAWEI TECHNOLOGIES Duesseldorf GmbH

European Research Center

Riesstrasse 25, 80992 München

 

E-mail: [hidden email]

Mobile: <a href="tel:%2B49%2015209084330" target="_blank">+49 15209084330

Telephone: <a href="tel:%2B49%20891588344173" target="_blank">+49 891588344173

 

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany,
www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!