Pointers about internal threads and communication in Flink (streaming)

Vincenzo Gulisano
Hi, is there any document describing how streaming operators are run by the TaskManagers and how communication (intra-node and inter-node) is managed? The closest documentation I found is https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/general_arch.html, but it is still pretty high-level.

Thank you for your help



Re: Pointers about internal threads and communication in Flink (streaming)

Stephan Ewen
Hi!

We are working on more docs for that. Here is a start that has a section about the TaskManager task execution.

Until then, here is a bit from our wiki:






Some WIP documentation on the Task execution:


Greetings,
Stephan



Re: Pointers about internal threads and communication in Flink (streaming)

Aljoscha Krettek
Hi Vincenzo,
regarding TaskManagers and how they execute the operations:

The TaskManager gets a class that is derived from AbstractInvokable. It creates an object from that class and then calls methods on it to drive execution. The two main methods are registerInputOutput() and invoke(). The first lets the invokable set up its input/output channels and do initialization work. Then invoke() is called, which contains the actual loop that keeps reading from the inputs and forwards data to the operator implementation.
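To make the call order concrete, here is a minimal sketch in plain Java. It does not use Flink: InvokableSketch, RecordingInvokable and TaskRunnerSketch are hypothetical stand-ins for AbstractInvokable, a concrete task, and the part of the TaskManager that runs it. Only the two method names come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AbstractInvokable; not the real Flink class.
abstract class InvokableSketch {
    // Set up input/output channels and do initialization work.
    abstract void registerInputOutput();

    // Holds the main loop that reads inputs and drives the operator.
    abstract void invoke() throws Exception;
}

// A trivial invokable that only records the order in which it is called.
class RecordingInvokable extends InvokableSketch {
    static final List<String> CALLS = new ArrayList<>();

    @Override
    void registerInputOutput() {
        CALLS.add("registerInputOutput");
    }

    @Override
    void invoke() {
        CALLS.add("invoke");
    }
}

// Hypothetical stand-in for the TaskManager code that executes a task.
class TaskRunnerSketch {
    static void run(Class<? extends InvokableSketch> taskClass) {
        try {
            // Create an object from the class that was handed over...
            InvokableSketch task = taskClass.getDeclaredConstructor().newInstance();
            // ...then call the two lifecycle methods in order.
            task.registerInputOutput();
            task.invoke();
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        }
    }

    public static void main(String[] args) {
        run(RecordingInvokable.class);
        System.out.println(RecordingInvokable.CALLS); // [registerInputOutput, invoke]
    }
}
```

The runner receives a class rather than an instance, mirroring the point that the TaskManager itself creates the object.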

The base invokable for streaming is StreamTask. It has the concrete subclasses OneInputStreamTask and TwoInputStreamTask for the two basic types of operator. The actual logic for an operator such as Map or Reduce is implemented in a subclass of StreamOperator (with the OneInputStreamOperator and TwoInputStreamOperator variants). OneInputStreamOperator, for example, has a method processElement(StreamRecord) that must be called for each element that is received.

The StreamOperator, in turn, holds the user code function object and forwards received elements to it.
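As a rough illustration of an operator wrapping a user function, here is a sketch in plain Java. OneInputOperatorSketch and MapOperatorSketch are hypothetical names, not Flink classes, and the sketch passes bare elements instead of the StreamRecord wrapper mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical, simplified counterpart of OneInputStreamOperator.
interface OneInputOperatorSketch<IN> {
    // Called once for each received element.
    void processElement(IN element);
}

// A map-style operator: holds the user function and forwards elements to it.
class MapOperatorSketch<IN, OUT> implements OneInputOperatorSketch<IN> {
    private final Function<IN, OUT> userFunction; // the user code function object
    private final Consumer<OUT> output;           // where results are emitted

    MapOperatorSketch(Function<IN, OUT> userFunction, Consumer<OUT> output) {
        this.userFunction = userFunction;
        this.output = output;
    }

    @Override
    public void processElement(IN element) {
        // Map semantics: exactly one output per input.
        output.accept(userFunction.apply(element));
    }
}

class MapOperatorDemo {
    static List<Integer> doubleAll(List<Integer> input) {
        List<Integer> collected = new ArrayList<>();
        MapOperatorSketch<Integer, Integer> operator =
                new MapOperatorSketch<Integer, Integer>(x -> x * 2, collected::add);
        for (Integer element : input) {
            operator.processElement(element);
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(doubleAll(List.of(1, 2, 3))); // [2, 4, 6]
    }
}
```

The operator knows nothing about where elements come from. It only reacts to processElement calls, which is what lets the same operator logic sit behind different tasks.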

To conclude, the StreamTask does the raw reading from network inputs. The StreamOperator receives elements and forwards them to user functions based on the semantics of the operator.
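This division of labor can be sketched in plain Java under some loud simplifications: an in-memory queue stands in for the network input, the loop ends when the queue is empty (a real stream would not), and all class names are hypothetical rather than Flink's own. The operator here keeps a running sum, to show that what gets emitted depends on the operator's semantics and not on the reading loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical stand-in for a one-input stream task.
class OneInputTaskSketch<IN> {
    private final Queue<IN> input;       // stands in for the network input
    private final Consumer<IN> operator; // plays the role of processElement

    OneInputTaskSketch(Queue<IN> input, Consumer<IN> operator) {
        this.input = input;
        this.operator = operator;
    }

    // The task only reads raw input and hands each element to the operator.
    void invoke() {
        IN element;
        while ((element = input.poll()) != null) {
            operator.accept(element);
        }
    }
}

// A reduce-style operator: keeps state and emits the running sum.
class RunningSumOperatorSketch implements Consumer<Integer> {
    private final List<Integer> output;
    private int sum = 0;

    RunningSumOperatorSketch(List<Integer> output) {
        this.output = output;
    }

    @Override
    public void accept(Integer element) {
        sum += element;
        output.add(sum);
    }
}

class TaskLoopDemo {
    static List<Integer> runningSums(List<Integer> records) {
        List<Integer> emitted = new ArrayList<>();
        Queue<Integer> input = new ArrayDeque<>(records);
        OneInputTaskSketch<Integer> task =
                new OneInputTaskSketch<>(input, new RunningSumOperatorSketch(emitted));
        task.invoke();
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runningSums(List.of(1, 2, 3))); // [1, 3, 6]
    }
}
```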

I hope this helps, let us know if you have any more questions about this. :D

Aljoscha


Re: Pointers about internal threads and communication in Flink (streaming)

Vincenzo Gulisano
Thank you very much!
I will have a look at the docs.

Vincenzo
