Questions about user doc.

Vishwas Siravara
Hey guys,
In this document: https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html , there is a line at the beginning of the Scheduling section that says: "A pipeline consists of multiple successive tasks, such as the n-th parallel instance of a MapFunction together with the n-th parallel instance of a ReduceFunction. Note that Flink often executes successive tasks concurrently:"

I am guessing this means that Flink executes successive tasks from different pipelines successively, right?

I also don't fully understand the difference between "Intermediate result partition" and "Intermediate dataset". Why are there two boxes in the diagram for the intermediate result after the first execution job vertex? Are there any more docs I can read to clearly understand these diagrams? Thanks for your help.

Thanks,
Vishwas 

Re: Questions about user doc.

Biao Liu
Hi Vishwas,

> I am guessing this means that Flink executes successive tasks from different pipelines successively, right?

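As the document says: "Note that Flink often executes successive tasks concurrently: For Streaming programs, that happens in any case, but also for batch programs, it happens frequently." So I think "successively" is not accurate, at least for streaming jobs. Successive tasks of a pipeline (for example, the n-th MapFunction instance and the n-th ReduceFunction instance) run at the same time, with records flowing downstream as they are produced.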
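To make "concurrently" concrete, here is a minimal Python sketch, not Flink code. Two threads stand in for the n-th map instance and the n-th reduce instance of one pipeline, and a small bounded queue (a hypothetical stand-in for the data exchange between tasks) connects them. The function names `map_task`, `reduce_task` and `run_pipeline` are made up for illustration.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the input stream


def map_task(records, out_q, log):
    """Hypothetical n-th parallel instance of a MapFunction (doubles each record)."""
    for r in records:
        out_q.put(r * 2)          # hand each record downstream as soon as it is mapped
        log.append(("map", r))
    out_q.put(_END)


def reduce_task(in_q, result, log):
    """Hypothetical n-th parallel instance of a ReduceFunction (a running sum)."""
    total = 0
    while True:
        item = in_q.get()
        if item is _END:
            break
        total += item
        log.append(("reduce", item))
    result.append(total)


def run_pipeline(records):
    # A tiny buffer: the map side blocks when it is full, so it cannot
    # finish before the reduce side has started consuming.
    q = queue.Queue(maxsize=2)
    log, result = [], []
    producer = threading.Thread(target=map_task, args=(records, q, log))
    consumer = threading.Thread(target=reduce_task, args=(q, result, log))
    # Both successive tasks are started up front and run at the same time.
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return result[0], log
```

Calling `run_pipeline(list(range(1, 11)))` returns a total of 110, and the event log contains "reduce" entries before the last "map" entry, i.e. the downstream task works while the upstream one is still running. A real Flink runtime exchanges serialized buffers between tasks rather than Python objects; the sketch only illustrates the scheduling idea.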

> I also don't fully understand the difference between "Intermediate result partition" and "Intermediate dataset". Why are there two boxes in the diagram for the intermediate result after the first execution job vertex? Are there any more docs I can read to clearly understand these diagrams? Thanks for your help.

1. The "Intermediate dataset" is a logical concept in the JobGraph, while the "Intermediate result partition" is a physical concept in the ExecutionGraph. The "Intermediate result partition" is the parallel version of the "Intermediate dataset": each parallel subtask of the producing vertex writes its own partition, so if the first vertex runs with parallelism 2, it produces two partitions, which would explain the two boxes in the diagram.
2. This document is in the "Internals" section and describes internal implementation details, so there may not be as much documentation as you would like. Some of the key concepts in the document link to the Flink GitHub repository. Sometimes code is the best documentation :)
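As a rough illustration of point 1, the Python sketch below models the two levels with hypothetical, heavily simplified classes. The class names mirror the terms used in the document, but they are not Flink's real classes or APIs: one logical `IntermediateDataSet` is expanded into one `IntermediateResultPartition` per parallel subtask of its producer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobVertex:
    """Logical operator in a simplified, hypothetical JobGraph."""
    name: str
    parallelism: int


@dataclass
class IntermediateDataSet:
    """Logical result of one JobVertex: a single box in the JobGraph."""
    producer: JobVertex


@dataclass
class IntermediateResultPartition:
    """Physical output of ONE parallel subtask of the producing vertex."""
    dataset: IntermediateDataSet
    subtask_index: int

    def label(self) -> str:
        return f"{self.dataset.producer.name}[{self.subtask_index}]"


def expand_to_partitions(dataset: IntermediateDataSet) -> List[IntermediateResultPartition]:
    """ExecutionGraph-style expansion (hypothetical): one partition per producer subtask."""
    return [
        IntermediateResultPartition(dataset, i)
        for i in range(dataset.producer.parallelism)
    ]
```

For a producer `JobVertex("A", parallelism=2)`, `expand_to_partitions` yields two partitions labelled `A[0]` and `A[1]`: one logical dataset, two physical boxes.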


Vishwas Siravara <[hidden email]> wrote on Wed, Jul 17, 2019 at 1:40 PM: