Hi, is there some document (or presentation) that explains the internals of how a Job gets deployed onto the cluster? Communications, Classloading and Serialization (if any) are the key points here, I think.
On Wed, Mar 28, 2018 at 3:14 AM, Niclas Hedhman <[hidden email]> wrote:
I don't know of any specific presentations, but data Artisans provides training material at http://training.data-artisans.com/system-overview.html which is pretty good. The Flink documentation is comprehensive.

Class-loading: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/debugging_classloading.html
State serialization: https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/custom_serialization.html
I don't know what you mean by an application "modeling" framework, but if you mean that you have a Flink app (batch or streaming) that you'd want to deploy to YARN (or Mesos, which is similar), then the flow appears to be:

1- Create a "Flink Cluster" (also called a YARN session) when a user does "bin/yarn-session.sh <params>".
2- Run the app when the user does "bin/flink run <app-class> <app-jar>".

It's the user's responsibility to shut down the cluster (YARN session) by sending a "stop" command to the YARN session created in 1). The code appears to be in classes like org.apache.flink.yarn.cli.FlinkYarnSessionCli (manage the YARN session) and org.apache.flink.client.CliFrontend (submit a Flink app to the YARN session).

Regards,
Kedar
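For reference, the two-step flow described above can be sketched as a small dry-run helper that only builds the command lines. This is a rough illustration, not an official recipe: the helper names, com.example.MyJob and my-job.jar are hypothetical, the session options are left as a pass-through for whatever your Flink version accepts, and the sketch assumes the entry class is passed to "flink run" via the -c option.

```python
from typing import List, Sequence


def yarn_session_command(params: Sequence[str] = ()) -> List[str]:
    """Step 1: argv that starts a long-running Flink cluster (YARN session)."""
    # `params` stands in for the "<params>" placeholder in the post; pass
    # whatever options your Flink version's yarn-session.sh accepts.
    return ["bin/yarn-session.sh", *params]


def flink_run_command(app_class: str, app_jar: str) -> List[str]:
    """Step 2: argv that submits a job jar to the running session."""
    # Assumption: -c names the entry-point class of the job jar.
    return ["bin/flink", "run", "-c", app_class, app_jar]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # com.example.MyJob and my-job.jar are hypothetical names.
    for argv in (
        yarn_session_command(),
        flink_run_command("com.example.MyJob", "my-job.jar"),
    ):
        print(" ".join(argv))
```

To actually run the steps, each list could be handed to subprocess.run(argv, check=True) from the root of the Flink distribution, since the paths are relative to it.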
I am using Apache Polygene, and its application modeling is very nice for what we do. What Polygene is exactly is not really important, other than that a lot of app code exists at my end and that Polygene generates classes on the fly, using custom classloaders. Think AspectJ and similar, with runtime weaving.

These last few weeks with Flink have been a bit scary, since I think it is the first time in my 35-year career where I don't understand, can't figure out and can't find answers to what is actually going on under the hood, even though I am able to work as a plain user, as prescribed, just fine. I can guess, but that is going to take longer to work out than getting pointers to those answers from the horse's mouth.

What I don't fully understand in Flink (Streaming) is:

2. But I have also seen that it is possible to "scale out" the processing within a topology, which would suggest that additional hosts are used. If so, how does that relate to the above deployment on, say, 3 hosts? Is that scale-out only within that JVM (in which case I am good and don't need to worry), or is that somehow offloaded to other servers in the cluster, and if so, how is that deployed?

On Wed, Mar 28, 2018 at 8:17 PM, kedar mhaswade <[hidden email]> wrote:
--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java