I'm trying to understand Flink's YARN configuration. The flink-conf.yaml file is supposedly the way to configure Flink, except when you launch Flink on YARN, where some settings are determined for the Application Master (AM). The following is contradictory, or at least not completely clear:

"The system will use the configuration in conf/flink-config.yaml. Please follow our configuration guide if you want to change something.

Flink on YARN will overwrite the following configuration parameters jobmanager.rpc.address (because the JobManager is always allocated at different machines), taskmanager.tmp.dirs (we are using the tmp directories given by YARN) and parallelism.default if the number of slots has been specified."

OK, so it will use conf/flink-conf.yaml, except for jobmanager.rpc.address/port, which are decided by YARN and not necessarily reported to the user, since they are dynamically allocated. That's fine with me, but if I want to make a "long-running" Flink cluster available to more than one user, where do I check in Flink for the Application Master hostname? Or do I just have to scrape log output (which would definitely be undesirable)?

First, I thought Flink would write this to conf/flink-conf.yaml. It does not. Then I thought it must surely be written to the HDFS configuration directory for that application (under something like hdfs://$USER/.flink/), but that is merely a copy of the original conf/flink-conf.yaml and doesn't hold an accurate configuration for the specific application. So is there an accurate config somewhere in HDFS or on the ResourceManager, i.e. where could I find it programmatically (other than by manipulating YARN app names or scraping)?

Thanks,
Craig
|
Hi Craig!

For YARN sessions, Flink will
- (a) register the app master hostname/port/etc. at YARN, so you can get them, for example, from the YARN UI and tools
- (b) create a .yarn-properties file that contains the hostname/port info. Future calls to the command line pick up the info from there.

/cc Robert

Greetings,
Stephan

On Thu, Aug 25, 2016 at 5:02 PM, Foster, Craig <[hidden email]> wrote:
|
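As a rough illustration of option (b) above, the sketch below reads a Java-properties-style file and pulls out a host and port. It makes several assumptions that the thread does not confirm: that the file uses plain `key=value` lines, that the address is stored as `host:port` under a key named `jobManager`, and that colons may be backslash-escaped the way Java's `Properties.store` writes them. The key name, the sample hostname, and the file's location all vary by Flink version, so check an actual file from your cluster before relying on any of this.

```python
import re


def parse_properties(text):
    """Parse simple Java-properties-style 'key=value' lines into a dict.

    Handles comments (# or !) and backslash-escaped characters such as '\\:'.
    It does not handle line continuations or \\uXXXX escapes.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Drop the backslash in front of escaped characters.
            props[key.strip()] = re.sub(r"\\(.)", r"\1", value.strip())
    return props


def jobmanager_address(props, key="jobManager"):
    """Split a 'host:port' value into (host, port).

    ASSUMPTION: the key name 'jobManager' is hypothetical here.
    """
    host, _, port = props[key].rpartition(":")
    return host, int(port)


# Hypothetical file contents, for illustration only.
sample = r"""#Generated YARN properties file
jobManager=worker-17.example.com\:6123
parallelism=8
"""

props = parse_properties(sample)
host, port = jobmanager_address(props)
print(host, port)
```

In practice you would pass the contents of the real properties file to `parse_properties` instead of the inline sample.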
Stephan,

Will the jobmanager-UI exist? E.g. if I am running Flink on YARN, will I be able to submit apps/see logs and DAGs through the web interface?

thanks,
tg

Trevor Grant
Data Scientist

"Fortunate is he, who is able to know the causes of things." -Virgil

On Thu, Aug 25, 2016 at 12:59 PM, Stephan Ewen <[hidden email]> wrote:
|
The JobManager UI starts when running Flink on YARN. The address of the UI is registered at YARN, so you can also access it through YARN's command-line tools or its web interface.

On Fri, Aug 26, 2016 at 7:28 PM, Trevor Grant <[hidden email]> wrote:
|
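Because the UI address is registered at YARN, one way to find it programmatically is the ResourceManager REST API, whose `/ws/v1/cluster/apps` endpoint returns each application's `trackingUrl`. The sketch below filters by application type rather than by app name. It assumes that Flink sessions report the application type `Apache Flink` (verify this against your cluster) and uses a made-up application ID and hostname in the sample response. The command `yarn application -list` shows the same tracking URL on the command line.

```python
import json
import urllib.request


def flink_tracking_urls(apps_response, app_type="Apache Flink"):
    """Map application ID -> tracking URL for running apps of the given type.

    ASSUMPTION: 'Apache Flink' as the application type is not confirmed by
    the thread; adjust it to whatever your cluster reports.
    """
    # The ResourceManager returns {"apps": null} when nothing matches.
    apps = (apps_response.get("apps") or {}).get("app") or []
    return {
        app["id"]: app["trackingUrl"]
        for app in apps
        if app.get("applicationType") == app_type and app.get("state") == "RUNNING"
    }


def fetch_running_apps(rm_base_url):
    """Fetch running applications from the ResourceManager REST API."""
    url = rm_base_url.rstrip("/") + "/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Hypothetical response, shaped like the REST API's JSON output.
sample_response = {
    "apps": {
        "app": [
            {
                "id": "application_1472000000000_0001",
                "name": "Flink session",
                "state": "RUNNING",
                "applicationType": "Apache Flink",
                "trackingUrl": "http://rm.example.com:8088/proxy/application_1472000000000_0001/",
            },
            {
                "id": "application_1472000000000_0002",
                "name": "some batch job",
                "state": "RUNNING",
                "applicationType": "MAPREDUCE",
                "trackingUrl": "http://rm.example.com:8088/proxy/application_1472000000000_0002/",
            },
        ]
    }
}

print(flink_tracking_urls(sample_response))
```

Against a live cluster you would call `flink_tracking_urls(fetch_running_apps("http://<resourcemanager-host>:8088"))`, replacing the placeholder with your ResourceManager's web address.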
Yes, it will exist also in the Yarn session and continue to run across jobs. Its address is also printed on the console when the cluster is brought up.

On Mon, Aug 29, 2016 at 2:44 PM, Robert Metzger <[hidden email]> wrote: