(DEPRECATED) Apache Flink User Mailing List archive.

Re: multiple consumer of intermediate data set


Zhijiang(wangzhijiang999)

Mar 14, 2017; 6:29am

Re: multiple consumer of intermediate data set

Hi,

I think there is no difference between JobVertex(A) and JobVertex(B). JobVertex(C) is not shown in the right graph, which may have misled you.

There should be another intermediate result partition between JobVertex(B) and JobVertex(C) for each parallel instance, the same as for JobVertex(A).

Cheers,

Zhijiang

------------------------------------------------------------------
From: 윤형덕 <[hidden email]>
Sent: Monday, March 13, 2017, 12:43
To: user <[hidden email]>
Subject: multiple consumer of intermediate data set

Hi All,

figure1
https://ci.apache.org/projects/flink/flink-docs-release-1.2/fig/job_and_execution_graph.svg

As we can see in figure1, JobVertex(B) has two consumers (JobVertex(C) and JobVertex(D)),
and accordingly the Intermediate Data Set of JobVertex(B) has two consumers (JobVertex(C) and JobVertex(D)).
But JobVertex(A), although it also has two consumers (JobVertex(B) and JobVertex(D)), just like JobVertex(B),
has two separate Intermediate Data Sets, each with one consumer.
I can't understand why. To me the two cases look the same, so why does one have a single Intermediate Data Set and the other have two?
Could anyone explain the difference between JobVertex(A) and JobVertex(B)?


lining jing

Mar 15, 2017; 2:48am

Re: multiple consumer of intermediate data set

Hi,

If the output is the same, why isn't a single intermediate data set enough?


Zhijiang(wangzhijiang999)

Mar 15, 2017; 3:30am

Re: multiple consumer of intermediate data set

In reply to this post by Zhijiang(wangzhijiang999)

Hi lining,

At the JobGraph level, the topology is logical. There is one IntermediateDataSet between each producer and consumer, such as A-IntermediateDataSet-B and A-IntermediateDataSet-D in the left graph.

The same holds for B-IntermediateDataSet-C and B-IntermediateDataSet-D, but the IntermediateDataSet between B and D is not shown separately in the left graph.
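To make the JobGraph-level view concrete, here is a minimal Python sketch of the description above. It is a toy model, not Flink's real classes: the function name `intermediate_data_sets` and the edge list are assumptions for illustration, with the edges taken from the producer-consumer pairs named in this thread.

```python
from collections import defaultdict

# Logical producer -> consumer edges, as named in this thread
# (an assumption read off the discussion, not extracted from Flink).
edges = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D")]


def intermediate_data_sets(edges):
    """Group one hypothetical data-set label per logical edge by its producer."""
    data_sets = defaultdict(list)
    for producer, consumer in edges:
        # One label per producer-consumer pair, following the naming used above.
        data_sets[producer].append(f"{producer}-IntermediateDataSet-{consumer}")
    return dict(data_sets)


data_sets = intermediate_data_sets(edges)
for producer, names in data_sets.items():
    print(producer, names)
```

In this model A and B are treated identically: each ends up with two entries, one per consumer, regardless of how the figure happens to draw them.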

At the ExecutionGraph level, the topology reflects the physical runtime. There is one IntermediateResultPartition between each pair of connected parallel ExecutionVertex instances, such as A1-IntermediateResultPartition-B1, A1-IntermediateResultPartition-B2, A2-IntermediateResultPartition-B1, and A2-IntermediateResultPartition-B2 in the right graph.
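The ExecutionGraph-level expansion can be sketched the same way. This is again a toy model and not Flink's API: the helper `result_partitions` is hypothetical, and it assumes a parallelism of 2 on both sides with every producer subtask connected to every consumer subtask, which is what the four names listed above suggest.

```python
from itertools import product


def result_partitions(producer, consumer, producer_parallelism, consumer_parallelism):
    """List one hypothetical partition label per (producer subtask, consumer subtask) pair.

    Assumes an all-to-all connection between the parallel subtasks.
    """
    producer_subtasks = range(1, producer_parallelism + 1)
    consumer_subtasks = range(1, consumer_parallelism + 1)
    return [
        f"{producer}{i}-IntermediateResultPartition-{consumer}{j}"
        for i, j in product(producer_subtasks, consumer_subtasks)
    ]


# Parallelism 2 on both sides is an assumption matching the A1/A2 and B1/B2 names.
partitions = result_partitions("A", "B", 2, 2)
print(partitions)
```

One logical edge thus expands into several runtime entries, which is why the two graphs in the figure cannot be compared box for box.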

Cheers,

Zhijiang
