I am using the dashboard to inspect my multi-stage pipeline. I cannot seem to find a manual or any other description of the dashboard aside from the quickstart section (https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html).
I would like to know roughly what my physical layout looks like (in terms of task slots, tasks and task chaining). I am assuming that each block in the attached topology plan is an operation that will execute as multiple parallel tasks based on the assigned parallelism. Multilevel operator chaining has been applied in most cases. The plan then occupies 3 task slots, as dictated by the highest DOP (degree of parallelism). Is this correct?

What I am also unsure about is that I have keyBy transformations in my code, yet they are not shown as transformations in the plan. The corresponding section on workers and slots (https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#workers-slots-resources), however, explicitly shows keyBy in the layout.

Regards
Leon

Attachment: sample-topology-plan.png (153K)
Hi Leon,

yes, you're right. The plan visualization shows the actual tasks. Each task can contain one or more (if chained) operators. A task is split into sub-tasks which are executed on the TaskManagers. A TaskManager slot can accommodate one sub-task of each task (if the task has not been assigned to a different slot sharing group). Thus, by default, the number of required slots is defined by the operator with the highest parallelism. If you click on the different tasks, the list view at the bottom of the page shows where the individual sub-tasks have been deployed.

The keyBy API call is actually not realized as a distinct Flink operator at runtime. Instead, the keyBy transformation influences how a downstream operator is connected with the upstream operator (pointwise or all to all). Furthermore, the keyBy transformation sets a special StreamPartitioner (HashPartitioner) which is used by the StreamRecordWriters to select the output channels (receiving sub-tasks) for the current stream record. That is the reason why you don't see the keyBy transformation in the plan visualization. Consequently, the illustration on our website is not totally consistent with the actual implementation.

Cheers,
Till

On Wed, Jun 8, 2016 at 11:01 AM, <[hidden email]> wrote:
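As a side note on the slot arithmetic in the reply above, the rule can be sketched in a few lines. This is a simplified model and not Flink code: the function `required_slots`, the task names and the parallelism values are hypothetical, and the only rule it encodes is that tasks in the same slot sharing group share slots, so each group needs as many slots as its most parallel task.

```python
from collections import defaultdict


def required_slots(tasks):
    """Estimate the number of task slots a plan needs.

    tasks: iterable of (name, parallelism, slot_sharing_group) tuples.
    Simplified model (an assumption, not the Flink scheduler): one slot
    can hold one sub-task of every task in the same slot sharing group,
    so a group needs max(parallelism) slots and groups add up.
    """
    max_per_group = defaultdict(int)
    for _name, parallelism, group in tasks:
        max_per_group[group] = max(max_per_group[group], parallelism)
    return sum(max_per_group.values())


# Hypothetical plan: three chained tasks, all in the default group.
plan = [
    ("Source -> Map", 2, "default"),
    ("Window -> Filter", 3, "default"),
    ("Sink", 1, "default"),
]

print(required_slots(plan))  # 3, dictated by the highest parallelism
```

If one of these tasks were moved into its own slot sharing group, the model would add that group's maximum on top, which matches the caveat in parentheses in the reply.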
Hi Till,
thanks for the clarification. It all makes sense now. So the keyBy call is more a partitioning scheme and less of an operator, similar to Storm's field grouping and to Flink's other schemes such as forward and broadcast. The difference is that it produces KeyedStreams, which are a prerequisite for certain types of transformations.

Regards
Leon

8. Jun 2016 14:05 by [hidden email]:
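To make the comparison of partitioning schemes concrete, here is a minimal sketch of how each scheme could pick the receiving sub-tasks (output channels) for a record. It is an illustration only, not Flink's implementation: the function names are hypothetical, and `zlib.crc32` is used as a stand-in for whatever hash function the real partitioner applies to the key.

```python
import zlib


def select_forward(sender_index):
    # Pointwise connection: the record goes to the one downstream
    # sub-task paired with the sending sub-task (simplified model).
    return [sender_index]


def select_broadcast(num_channels):
    # All-to-all connection: every downstream sub-task gets the record.
    return list(range(num_channels))


def select_key_hash(record, key_selector, num_channels):
    # All-to-all connection, but only one channel is chosen per record:
    # the key's hash decides, so equal keys always meet at one sub-task.
    key = key_selector(record)
    key_hash = zlib.crc32(repr(key).encode("utf-8"))  # stand-in hash
    return [key_hash % num_channels]


# Hypothetical records of the form (user, value), keyed by user.
first = select_key_hash(("user-1", 10), lambda r: r[0], 3)
second = select_key_hash(("user-1", 99), lambda r: r[0], 3)
print(first == second)  # True: same key, same receiving sub-task
```

None of the three functions transforms the record, which is consistent with keyBy not appearing as a block of its own in the plan visualization.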