(DEPRECATED) Apache Flink User Mailing List archive.

Questions about user doc.


Vishwas Siravara

Questions about user doc.

Hey guys,

In this document: https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html , there is a line at the beginning of the scheduling section that says: "A pipeline consists of multiple successive tasks, such as the n-th parallel instance of a MapFunction together with the n-th parallel instance of a ReduceFunction. Note that Flink often executes successive tasks concurrently:"

I am guessing this means that Flink executes successive tasks from different pipelines successively, right?

I also don't fully understand "Intermediate result partition" and "Intermediate dataset". Why are there two boxes in the diagram for the intermediate result after the first execution job vertex? Are there any more docs I can read to understand these diagrams clearly? Thanks for your help.

Thanks,

Vishwas

Biao Liu

Re: Questions about user doc.

Hi Vishwas,

> I am guessing this means that Flink executes successive tasks from different pipelines successively, right?

As the document says, "Note that Flink often executes successive tasks concurrently: For Streaming programs, that happens in any case, but also for batch programs, it happens frequently." So I think "successively" is not accurate, at least for streaming jobs: the successive tasks of a pipeline run at the same time rather than one after another.
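To make "successive tasks run concurrently" concrete, here is a toy sketch in plain Python. It is not Flink code and not how Flink is implemented: `map_task`, `reduce_task`, `channel`, and `events` are hypothetical names, and two threads connected by a small bounded queue stand in for two successive tasks of one pipeline. The point is only that the downstream task starts consuming records while the upstream task is still producing them.

```python
import queue
import threading

events = []                        # shared log of what happened, in order
channel = queue.Queue(maxsize=1)   # small bounded buffer between the two tasks
DONE = object()                    # end-of-stream marker


def map_task(records):
    """Upstream task: doubles each record and hands it downstream."""
    for r in records:
        channel.put(r * 2)         # blocks while the buffer is full
        events.append(("map emitted", r * 2))
    channel.put(DONE)


def reduce_task(result):
    """Downstream task: sums records as they arrive."""
    total = 0
    while True:
        item = channel.get()
        if item is DONE:
            break
        total += item
        events.append(("reduce consumed", item))
    result.append(total)


result = []
producer = threading.Thread(target=map_task, args=([1, 2, 3, 4, 5],))
consumer = threading.Thread(target=reduce_task, args=(result,))
consumer.start()
producer.start()
producer.join()
consumer.join()

print(result[0])
for event in events:
    print(event)
```

In the printed log, the first "reduce consumed" entry appears before the last "map emitted" entry, because the bounded buffer forces the two tasks to make progress together. A strictly successive execution would instead show all "map emitted" entries first.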

> I also don't fully understand "Intermediate result partition" and "Intermediate dataset". Why are there two boxes in the diagram for the intermediate result after the first execution job vertex? Are there any more docs I can read to understand these diagrams clearly? Thanks for your help.

1. The "Intermediate dataset" is a logical concept that belongs to the JobGraph, while the "Intermediate result partition" is more of a physical concept that belongs to the ExecutionGraph. The "Intermediate result partition" is a parallel version of the "Intermediate dataset".

2. This document is in the "Internals" section, so it refers to internal implementation details. There might not be as much documentation as you would like. The critical concepts in the document link to the Flink GitHub repository. Sometimes the code is the best documentation :)
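The relationship described in point 1 can be sketched with a small toy model. This is plain Python, not Flink's actual classes: the names below only mirror the concepts, and the `expand` helper and the parallelism of 2 are assumptions made for illustration. One logical data set in the JobGraph becomes one partition per parallel subtask of the producer in the ExecutionGraph, which would explain two boxes if the producing vertex runs with parallelism 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntermediateDataSet:
    """Logical output of one job vertex (JobGraph level)."""
    producer: str      # name of the producing job vertex
    parallelism: int   # parallelism of the producing job vertex


@dataclass
class IntermediateResultPartition:
    """Output of one parallel subtask of the producer (ExecutionGraph level)."""
    producer: str
    subtask_index: int


def expand(dataset: IntermediateDataSet) -> List[IntermediateResultPartition]:
    """Hypothetical helper: one partition per parallel subtask of the producer."""
    return [
        IntermediateResultPartition(dataset.producer, index)
        for index in range(dataset.parallelism)
    ]


# Assumed example: a "Map" vertex running with parallelism 2.
map_output = IntermediateDataSet(producer="Map", parallelism=2)
partitions = expand(map_output)
for partition in partitions:
    print(partition)
```

With parallelism 2, the single logical data set expands into two partitions, one for each parallel instance of the producing vertex.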

In reply to the message from Vishwas Siravara <[hidden email]>, sent Wednesday, July 17, 2019, 1:40 PM.