Flink TaskManager and JobManager internals

Flink TaskManager and JobManager internals

Niclas Hedhman
Hi,

Is there a document (or presentation) that explains the internals of how a job gets deployed onto the cluster? Communication, classloading, and serialization (if any) are the key points here, I think.

I suspect that my application modeling framework is incompatible with the standard Flink mechanism, and I would like to learn how much effort it would be to build my own mechanism (assuming it is possible, since YARN and Mesos are in a similar situation).


Thanks in Advance
--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java
Re: Flink TaskManager and JobManager internals

kedar mhaswade


On Wed, Mar 28, 2018 at 3:14 AM, Niclas Hedhman <[hidden email]> wrote:
Hi,

Is there a document (or presentation) that explains the internals of how a job gets deployed onto the cluster? Communication, classloading, and serialization (if any) are the key points here, I think.

I don't know of any specific presentations, but data Artisans provides http://training.data-artisans.com/system-overview.html, which is pretty good.
The Flink documentation is also comprehensive.

I suspect that my application modeling framework is incompatible with the standard Flink mechanism, and I would like to learn how much effort it would be to build my own mechanism (assuming it is possible, since YARN and Mesos are in a similar situation).

I don't know what you mean by an application "modeling" framework, but if you mean that you have a Flink app (batch or streaming) that you want to deploy to YARN (or Mesos, which is similar), then the flow appears to be:
1. Create a "Flink cluster" (also called a YARN session) by running "bin/yarn-session.sh <params>", and then
2. Run the app with "bin/flink run <app-class> <app-jar>".

It's the user's responsibility to shut down the cluster (YARN session) by sending a "stop" command to the YARN session created in step 1. The code appears to live in classes such as org.apache.flink.yarn.cli.FlinkYarnSessionCli (manages the YARN session) and org.apache.flink.client.CliFrontend (submits a Flink app to the YARN session).
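The flow above, sketched as shell commands run from the Flink distribution directory (a sketch, not a definitive recipe: the class name, JAR path, and application id are hypothetical placeholders, and the exact flags vary across Flink versions — check "bin/yarn-session.sh -h" for yours):

```shell
# 1. Start a long-running Flink cluster inside YARN (a "YARN session"),
#    detached, with memory sizes for the JobManager and TaskManagers.
./bin/yarn-session.sh -jm 1024m -tm 2048m -d

# 2. Submit a job to that session; the client uploads the user JAR and the
#    JobManager distributes it to the TaskManagers.
./bin/flink run -c com.example.MyJob /path/to/my-job.jar

# 3. Shut the session down by attaching to it (YARN prints the application
#    id when the session starts) and sending the "stop" command.
echo "stop" | ./bin/yarn-session.sh -id application_XXXX_YYYY
```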

Regards,
Kedar



Thanks in Advance
--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Re: Flink TaskManager and JobManager internals

Niclas Hedhman

Thanks for trying to help; I really appreciate it, and I am sorry that I was not clear enough.

I am using Apache Polygene, and its application modeling is very nice for what we do. What Polygene is exactly is not really important, other than that a lot of app code exists on my end and that Polygene generates classes on the fly using custom classloaders. Think AspectJ and similar, with runtime weaving.

These last few weeks with Flink have been a bit scary: I think it is the first time in my 35-year career where I don't understand, can't figure out, and can't find answers to what is actually going on under the hood, even though I am able to work as a plain user, as prescribed, just fine. I can guess, but that will take longer to work out than getting pointers to those answers from the horse's mouth.

What I don't fully understand in Flink (Streaming) is:

1. I define a main() method, put everything into a JAR file, and it "somehow" gets deployed onto the nodes in my cluster. Does each node receive the JAR file and spin up a JVM for that main(), or does Flink keep it in-JVM with some classloader isolation to protect jobs from each other? Slide 17 of the data Artisans presentation shows a layout that is ambiguous (to me) and could be interpreted as my Flink app (I prefer to call it a topology) being executed in a separate JVM...

2. But I have also seen that it is possible to "scale out" the processing within a topology, which suggests that additional hosts are used. If so, how does that relate to the above deployment on, say, 3 hosts? Is that scale-out only within that JVM (in which case I am good and don't need to worry), or is it somehow offloaded to other servers in the cluster, and if so, how is that deployed?

3. "Debugging Classloading" is IMVHO a little bit "short" on the details, and a complete overview of what classloaders Flink sets up (if any) and when/how it does it, is basically what I need to make sure I set all of that up correctly in my own case.

4. All Functions (et al.) in Flink seem to require java.io.Serializable, which to me is a big flag waved screaming "problem for me". Polygene has a state model that is not compatible with java.io.Serializable, and I have been looking for explanations of why the Functions are serializable; but since data flow dominates Flink Streaming, there are LOTS of links talking about data serialization, which is not a problem on my end.
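For context on point 4: Flink's user functions are Serializable because the client serializes the function *objects* themselves (including any captured fields) and ships the bytes to the TaskManagers. This JDK-only sketch mimics that round trip without Flink on the classpath — MapFn, serialize, and deserialize are made-up names for illustration, not Flink API:

```java
import java.io.*;

public class ShipFunction {
    // A stand-in for Flink's MapFunction: a functional interface that
    // extends Serializable so lambdas targeting it can be serialized.
    interface MapFn<I, O> extends Serializable { O map(I in); }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Captured state travels with the function across the wire.
        int factor = 3;
        MapFn<Integer, Integer> fn = x -> x * factor;
        MapFn<Integer, Integer> shipped = deserialize(serialize(fn));
        System.out.println(shipped.map(7)); // 21

        // Capturing a non-Serializable object (here a Thread) makes the
        // round trip fail -- the kind of problem runtime-generated,
        // non-Serializable state would hit when shipped to workers.
        Thread notSerializable = new Thread();
        MapFn<Integer, Integer> bad = x -> x + notSerializable.hashCode();
        try {
            serialize(bad);
            System.out.println("serialized");
        } catch (NotSerializableException e) {
            System.out.println("NotSerializableException");
        }
    }
}
```

The failure mode in the second half is exactly why classes woven at runtime by a custom classloader are doubly awkward: even if the captured state were serializable, the receiving JVM must also be able to resolve the generated class by name when deserializing.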

5. YARN/Mesos were only mentioned to point out that complex deployments are possible, with hooks; so from my PoV the worst-case scenario is to build my own deployment system that doesn't rely on some of the fundamentals in Flink. I am not going to deploy on Mesos or YARN.


Once again, thanks a lot for any pointers or info that can be given.

Cheers
Niclas



--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java