Re: multiple consumer of intermediate data set

Re: multiple consumer of intermediate data set

Zhijiang(wangzhijiang999)
Hi,

     I think there is no difference between JobVertex(A) and JobVertex(B). JobVertex(C) is not shown in the right graph, and that may be what misleads you.
For each parallel instance there should be another intermediate result partition between JobVertex(B) and JobVertex(C), and the same holds for JobVertex(A).


Cheers,

Zhijiang
------------------------------------------------------------------
From: 윤형덕 <[hidden email]>
Sent: 2017-03-13 (Monday) 12:43
To: user <[hidden email]>
Subject: multiple consumer of intermediate data set

Hi All,

 

figure1
https://ci.apache.org/projects/flink/flink-docs-release-1.2/fig/job_and_execution_graph.svg

 

As we can see in figure1, JobVertex(B) has two consumers (JobVertex(C) and JobVertex(D)),

and accordingly the Intermediate Data Set of JobVertex(B) has two consumers (JobVertex(C) and JobVertex(D)).
JobVertex(A) also has two consumers (JobVertex(B) and JobVertex(D)), just like JobVertex(B),

yet it has two separate intermediate data sets, each with only one consumer.
I couldn't understand why. To me it looks like the same case, so why does one have a single Intermediate Data Set and the other two?
Could anyone explain the difference between JobVertex(A) and JobVertex(B)?



Re: multiple consumer of intermediate data set

lining jing
Hi,
   If the output is the same, why not use just one intermediate data set?

2017-03-14 14:36 GMT+08:00 Zhijiang(wangzhijiang999) <[hidden email]>:

Re: multiple consumer of intermediate data set

Zhijiang(wangzhijiang999)
In reply to this post by Zhijiang(wangzhijiang999)
Hi lining,

        At the JobGraph level, the graph is the logical topology. There is one IntermediateDataSet between each producer and each consumer, as in A-IntermediateDataSet-B and A-IntermediateDataSet-D in the left graph.
The same applies to B-IntermediateDataSet-C and B-IntermediateDataSet-D, but the IntermediateDataSet between B and D is not drawn separately in the left graph.

       At the ExecutionGraph level, the graph reflects the physical runtime. There is one IntermediateResultPartition between each pair of connected parallel ExecutionVertices, as in A1-IntermediateResultPartition-B1, A1-IntermediateResultPartition-B2, A2-IntermediateResultPartition-B1, and A2-IntermediateResultPartition-B2 in the right graph.
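
To make the one-data-set-per-edge rule concrete, here is a toy Python model of the figure1 topology. It is only a sketch under the assumption stated above (one logical data set per producer/consumer edge); `LogicalDataSet`, `build_logical_data_sets` and `data_sets_per_producer` are hypothetical names for illustration, not Flink's Java API. It shows that A and B come out symmetric, each producing two data sets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDataSet:
    """Hypothetical stand-in for a JobGraph-level data set on one edge."""
    producer: str
    consumer: str


def build_logical_data_sets(edges):
    """Assumed rule: create one data set per (producer, consumer) edge."""
    return [LogicalDataSet(p, c) for p, c in edges]


def data_sets_per_producer(data_sets):
    """Group data sets by producing JobVertex, listing each one's consumer."""
    grouped = defaultdict(list)
    for ds in data_sets:
        grouped[ds.producer].append(ds.consumer)
    return dict(grouped)


# Topology from figure1: A feeds B and D, B feeds C and D.
FIGURE1_EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D")]

if __name__ == "__main__":
    per_producer = data_sets_per_producer(build_logical_data_sets(FIGURE1_EDGES))
    for producer, consumers in sorted(per_producer.items()):
        print(producer, "->", consumers)
```

Under this rule A maps to two data sets (to B and D) and B maps to two (to C and D), which matches the point that the figure simply does not draw B's second data set separately.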
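
A small sketch of the expansion described above, assuming A and B are connected all-to-all and each runs with parallelism 2 as in the right graph. `expand_all_to_all` is a hypothetical helper that only enumerates producer/consumer subtask pairs; it does not model Flink's actual runtime classes.

```python
from itertools import product


def expand_all_to_all(producer, producer_parallelism, consumer, consumer_parallelism):
    """Return one (producer subtask, consumer subtask) link per connected pair.

    Hypothetical toy model: subtasks are named like "A1", "A2", and every
    producer subtask is assumed to connect to every consumer subtask.
    """
    producers = [f"{producer}{i}" for i in range(1, producer_parallelism + 1)]
    consumers = [f"{consumer}{j}" for j in range(1, consumer_parallelism + 1)]
    return list(product(producers, consumers))


if __name__ == "__main__":
    for src, dst in expand_all_to_all("A", 2, "B", 2):
        print(f"{src}-IntermediateResultPartition-{dst}")
```

Under the all-to-all assumption the number of links is producer parallelism times consumer parallelism, so the right graph grows with parallelism while the logical left graph stays the same.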

Cheers,

Zhijiang
-----------------------------------------------------------------
From: lining jing <[hidden email]>
Sent: 2017-03-15 (Wednesday) 10:54
To: user <[hidden email]>; Zhijiang(wangzhijiang999) <[hidden email]>
Subject: Re: multiple consumer of intermediate data set