Flink Execution Plan

lofifnc
Hi,

I'm trying to figure out which graph the execution plan represents when you call env.getExecutionPlan on the StreamExecutionEnvironment. From my understanding, the StreamGraph is what you call an APIGraph, which is used to create the JobGraph.
So is the ExecutionPlan a full representation of the StreamGraph?
And is there a way to get a human-interpretable representation of the JobGraph? :)

Best,
Alex

Re: Flink Execution Plan

Márton Balassi
Hey Alex,

Flink currently has 3 abstractions with a Graph suffix for streaming jobs:

  * StreamGraph: Represents the logical plan of a streaming job while it is under construction in the API. This is the only streaming-specific one in this list.
  * JobGraph: Represents the logical plan of a streaming job once construction is finished.
  * ExecutionGraph: The physical plan of the JobGraph; contains parallelism, estimated input sizes, etc.

env.getExecutionPlan gives you a JSON String representation of the ExecutionGraph, which should contain most of the info you need. To visualize it, go to your Flink binary distribution, open tools/planVisualizer.html in a browser, paste the JSON there and hit the button. :)
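
A JSON plan can also be inspected in code instead of in the visualizer. The following is only a minimal sketch in Python: the sample document is hand-written, and its field names (nodes, id, type, parallelism, predecessors, ship_strategy) are assumptions made for illustration, so they may not match what a given Flink version actually emits. The helper summarize_plan is hypothetical as well.

```python
import json

# Hand-written sample plan. The structure and field names are assumptions
# for illustration; this was not captured from a real Flink job.
plan_json = """
{"nodes": [
  {"id": 1, "type": "Source", "parallelism": 1},
  {"id": 2, "type": "Map", "parallelism": 4,
   "predecessors": [{"id": 1, "ship_strategy": "REBALANCE"}]},
  {"id": 3, "type": "Sink", "parallelism": 4,
   "predecessors": [{"id": 2, "ship_strategy": "FORWARD"}]}
]}
"""


def summarize_plan(text):
    """Return one readable line per node: id, type, parallelism, inputs."""
    plan = json.loads(text)
    lines = []
    for node in plan["nodes"]:
        # Nodes without a "predecessors" entry are treated as sources.
        inputs = [str(p["id"]) for p in node.get("predecessors", [])]
        source = ", ".join(inputs) if inputs else "-"
        lines.append(
            f'{node["id"]}: {node["type"]} '
            f'(parallelism {node["parallelism"]}) <- {source}'
        )
    return lines


summary = summarize_plan(plan_json)
print("\n".join(summary))
```

If the real output uses different keys, only the lookups inside summarize_plan would need to change.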

You might find it useful that the new Flink Dashboard also comes with this feature integrated, so you can visualize jobs that have been submitted to the cluster.

Hope that helps,

Marton

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Execution-Plan-tp4290.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Flink Execution Plan

mnxfst
Hi

Is there a way to map a JSON representation back to an executable Flink job? If there is no such API, what would be the best starting point for implementing such a feature?

Best
  Christian


Re: Flink Execution Plan

lofifnc
In reply to this post by Márton Balassi
Hi Márton,

Thanks for your answer. But now I'm even more confused, as it somehow conflicts with the documentation. ;)
According to the wiki and the Stratosphere paper, the JobGraph is submitted to the JobManager, and the JobManager then translates it into the ExecutionGraph.

> In order to track the status of the parallel vertex and channel
> instances individually, the Job Manager spans the Job Graph
> to the Execution Graph, as shown in Fig. 9. The Execution
> Graph contains a node for each parallel instance of a vertex,
> which we refer to as a task.

So the ExecutionGraph should only be available at the JobManager and contain a node for each parallel instance of an operator and the corresponding vertices.
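
To make the quoted expansion concrete, here is a toy model in Python, not Flink code: the JobVertex type and the expand_to_tasks helper are hypothetical names, and the vertex names and parallelism values are made up. It only shows the idea that each job vertex yields one task per parallel instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobVertex:
    """One vertex of a toy job graph (hypothetical type, not Flink's class)."""
    name: str
    parallelism: int


def expand_to_tasks(job_vertices):
    """Return one (vertex name, subtask index) pair per parallel instance."""
    return [
        (vertex.name, subtask)
        for vertex in job_vertices
        for subtask in range(vertex.parallelism)
    ]


# Made-up job graph: three vertices with different parallelism.
job_graph = [JobVertex("Source", 1), JobVertex("Map", 4), JobVertex("Sink", 2)]
tasks = expand_to_tasks(job_graph)

print(len(job_graph), "job vertices ->", len(tasks), "tasks")
for name, subtask in tasks:
    print(f"{name} ({subtask + 1}/{len([t for t in tasks if t[0] == name])})")
```

In this model the three job vertices expand into 1 + 4 + 2 = 7 tasks; the channels between parallel instances are left out to keep the sketch short.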

The question comes up in the context of my master's thesis, in which I'm trying to describe the deployment process of Flink. I want to use a visualization of the execution plan as a concrete example for one of these three graphs.

Best,
Alex
 
Re: Flink Execution Plan

Fabian Hueske-2
@Christian: I don't think that is possible.

There are quite a few things missing in the JSON including:
- User function objects (Flink ships objects, not class names)
- Function configuration objects
- Data types

Best, Fabian


Re: Flink Execution Plan

Stephan Ewen
Actually, the situation with the JSON plans is slightly different now:

There are two types of plans:

1) The plan that describes the user program as originally written. That is what you get from env.getExecutionPlan().
In the batch API, this is the result of the optimizer; in the streaming API, it is the StreamGraph.

2) A JSON plan for the JobGraph / ExecutionGraph. This is what the web dashboard uses.
The main difference from the other JSON plan is that in the JobGraph, not all operators are visible any more. Chained operations currently look like one operator to the JobGraph.
Hence this JSON plan usually has fewer operators, and the names indicate that an operator is actually a chain of operations.
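
The effect of chaining on the operator count can be illustrated with a toy model in Python. This is a sketch under stated assumptions and not Flink's chaining logic: the chain_operators helper, the operator names, and the set of chainable edges are all hypothetical, and the pipeline is assumed to be strictly linear.

```python
def chain_operators(operators, chainable_edges):
    """Group a linear pipeline of operators into chains.

    operators: operator names in pipeline order.
    chainable_edges: set of (upstream, downstream) pairs that are assumed
        to be chainable; which edges qualify is an input here, not derived.
    """
    chains = []
    for op in operators:
        # Extend the current chain if the edge from its last operator
        # to this one is marked chainable; otherwise start a new chain.
        if chains and (chains[-1][-1], op) in chainable_edges:
            chains[-1].append(op)
        else:
            chains.append([op])
    # Join the names so each chain shows up as a single vertex.
    return [" -> ".join(chain) for chain in chains]


# Made-up pipeline with five operators.
stream_graph = ["Source", "Map", "Filter", "KeyedReduce", "Sink"]
chainable = {("Source", "Map"), ("Map", "Filter"), ("KeyedReduce", "Sink")}

job_vertices = chain_operators(stream_graph, chainable)
print(job_vertices)
```

Here five operators collapse into two vertices, and each vertex name lists the operations it contains, which mirrors the point that the second plan has fewer nodes whose names reveal the chain.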

Greetings,
Stephan


