Flink CLI does not return after submitting yarn job in detached mode


makelkar
Hi there,

I am trying to run a single Flink job on YARN in detached mode. As per the docs for Flink 1.4.2, I am using -yd to do that.

The problem is that the flink bash script doesn't terminate and return until I press Ctrl+C. In detached mode, I would expect the Flink CLI to return as soon as the YARN job is submitted. Is there something I am missing? Here is the exact output I get:



./flink-1.4.2/bin/flink run -m yarn-cluster -yd -yn 2 -yqu "default"  -ytm 2048 myjar.jar \
....program arguments omitted


Using the result of 'hadoop classpath' to augment the Hadoop classpath: /Users/makelkar/work/hadoop-2.7.3/etc/hadoop:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/common/lib/*:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/common/*:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/hdfs:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/hdfs/lib/*:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/hdfs/*:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/yarn/lib/*:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/yarn/*:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/mapreduce/lib/*:/Users/makelkar/work/hadoop-2.7.3/share/hadoop/mapreduce/*:/Users/makelkar/work/hadoop-2.7.3/contrib/capacity-scheduler/*.jar
2018-08-15 14:39:36,873 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-08-15 14:39:36,873 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-08-15 14:39:36,921 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032
2018-08-15 14:39:37,226 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=2048, numberTaskManagers=2, slotsPerTaskManager=1}
2018-08-15 14:39:37,651 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/Users/makelkar/work/flink/flink-1.4.2/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2018-08-15 14:39:37,660 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/Users/makelkar/work/flink/flink-1.4.2/conf/logback.xml to hdfs://localhost:9000/user/makelkar/.flink/application_1534188161088_0019/logback.xml

2018-08-15 14:39:37,986 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/Users/makelkar/work/flink/flink-1.4.2/lib/log4j-1.2.17.jar to hdfs://localhost:9000/user/makelkar/.flink/application_1534188161088_0019/lib/log4j-1.2.17.jar
2018-08-15 14:39:38,011 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/Users/makelkar/work/flink/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar to hdfs://localhost:9000/user/makelkar/.flink/application_1534188161088_0019/lib/flink-dist_2.11-1.4.2.jar
2018-08-15 14:39:38,586 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/Users/makelkar/work/flink/flink-1.4.2/lib/flink-python_2.11-1.4.2.jar to hdfs://localhost:9000/user/makelkar/.flink/application_1534188161088_0019/lib/flink-python_2.11-1.4.2.jar
2018-08-15 14:39:38,603 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/Users/makelkar/work/flink/flink-1.4.2/conf/log4j.properties to hdfs://localhost:9000/user/makelkar/.flink/application_1534188161088_0019/log4j.properties

2018-08-15 14:39:39,002 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/Users/makelkar/work/flink/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar to hdfs://localhost:9000/user/makelkar/.flink/application_1534188161088_0019/flink-dist_2.11-1.4.2.jar
2018-08-15 14:39:39,401 INFO  org.apache.flink.yarn.Utils                                   - Copying from /var/folders/b6/_t_6q0vs3glcggp_8rgyxxl40000gn/T/application_1534188161088_0019-flink-conf.yaml8441703337078262150.tmp to hdfs://localhost:9000/user/makelkar/.flink/application_1534188161088_0019/application_1534188161088_0019-flink-conf.yaml8441703337078262150.tmp
2018-08-15 14:39:39,836 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1534188161088_0019
2018-08-15 14:39:39,858 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1534188161088_0019
2018-08-15 14:39:39,858 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated
2018-08-15 14:39:39,859 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED
2018-08-15 14:39:47,733 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.
2018-08-15 14:39:47,733 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - The Flink YARN client has been started in detached mode. In order to stop Flink on YARN, use the following command or a YARN web interface to stop it:
yarn application -kill application_1534188161088_0019
Please also note that the temporary files of the YARN session in the home directoy will not be removed.
Cluster started: Yarn cluster with application id application_1534188161088_0019
Using address localhost:51252 to connect to JobManager.
Using the parallelism provided by the remote cluster (2). To use another parallelism, set it at the ./bin/flink client.
Starting execution of program
2018-08-15 14:39:47,757 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode


I have to press Ctrl+C to kill the shell script. When I do, the program prints the messages below:

2018-08-15 14:39:56,332 INFO  org.apache.flink.yarn.YarnClusterClient                       - Shutting down YarnClusterClient from the client shutdown hook
2018-08-15 14:39:56,333 INFO  org.apache.flink.yarn.YarnClusterClient                       - Disconnecting YarnClusterClient from ApplicationMaster

Thanks,
Madhav.


Re: Flink CLI does not return after submitting yarn job in detached mode

Marvin777
Hi Madhav,

Change

./flink-1.4.2/bin/flink run -m yarn-cluster -yd -yn 2 -yqu "default" -ytm 2048 myjar.jar

to

./flink-1.4.2/bin/flink run -m yarn-cluster -d -yn 2 -yqu "default" -ytm 2048 myjar.jar




On Thu, Aug 16, 2018 at 5:01 AM madhav Kelkar <[hidden email]> wrote:


Re: Flink CLI does not return after submitting yarn job in detached mode

vino yang
Hi Marvin777,

That's not right. This is Flink on YARN single-job mode, so "-yd" is the correct parameter.

Hi Madhav,

I think I have found the problem. The log line "Starting program in interactive mode" comes from this source code.[1]

It is guarded by the check method "isUsingInteractiveMode".

The source code for that method is here;[2] it returns true when the "program" field is null. The place where that field gets set to null is here.[3]

So, judging from the source code, I suggest explicitly specifying the class that contains the main method in the CLI arguments (the -c option).
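For illustration, a minimal sketch of that suggestion as a shell function, assuming the Flink 1.4.2 directory layout and options used in this thread. The function name "submit_detached", the FLINK_BIN override and the class name com.example.MyJob are hypothetical placeholders. Following the CLI's usage (flink run [OPTIONS] <jar-file> <arguments>), -c has to come before the jar path; anything after the jar is passed to the program.

```shell
#!/usr/bin/env bash
# Sketch: submit a per-job YARN cluster in detached mode with an explicit
# entry class (-c). FLINK_BIN defaults to the path used in this thread and
# can be overridden (hypothetical knob, e.g. FLINK_BIN=echo for a dry run).
submit_detached() {
  local flink_bin="${FLINK_BIN:-./flink-1.4.2/bin/flink}"
  local main_class="$1"   # hypothetical, e.g. com.example.MyJob
  shift                   # remaining args: jar path, then program arguments
  # CLI options must precede the jar; everything after it goes to the program.
  "$flink_bin" run -m yarn-cluster -yd -c "$main_class" \
    -yn 2 -yqu "default" -ytm 2048 "$@"
}
```

For example, submit_detached com.example.MyJob myjar.jar followed by the program arguments; running it as FLINK_BIN=echo submit_detached ... prints the command line instead of submitting it.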






Thanks, vino.

On Thu, Aug 16, 2018 at 11:00 AM Marvin777 <[hidden email]> wrote:


Re: Flink CLI does not return after submitting yarn job in detached mode

makelkar
Hi Vino,

We should not have to specify the class name with the -c option to run a job in detached mode. I tried that this morning, but it didn't work either.

The Flink CLI always starts in interactive mode and somehow ignores the -yd option specified in yarn-cluster mode. Can someone please verify this? If that's the case, it's a bug in the Flink CLI.

I have an ugly workaround where I start the Flink CLI in the background, and I would like to avoid doing that.
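A minimal sketch of that background workaround, assuming the CLI output is redirected to a log file and still contains the "Cluster started: Yarn cluster with application id ..." line seen in the client output. The function names are hypothetical, and this does not fix the underlying issue: the client process simply keeps running in the background.

```shell
#!/usr/bin/env bash
# Sketch of the "run the CLI in the background" workaround.

# Start a command in the background with stdout/stderr sent to a log file;
# prints the PID of the background process.
start_in_background() {
  local log_file="$1"
  shift
  nohup "$@" > "$log_file" 2>&1 &
  echo "$!"
}

# Poll the log once per second (up to $2 seconds, default 120) for the
# "Cluster started: Yarn cluster with application id ..." line and print
# the application id. Returns non-zero if it never appears.
wait_for_app_id() {
  local log_file="$1" timeout="${2:-120}" app_id i
  for ((i = 0; i < timeout; i++)); do
    app_id=$(sed -n 's/.*Cluster started: Yarn cluster with application id \(application_[0-9_]*\).*/\1/p' \
      "$log_file" 2>/dev/null | head -n 1)
    if [ -n "$app_id" ]; then
      echo "$app_id"
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example: pid=$(start_in_background flink-submit.log ./flink-1.4.2/bin/flink run -m yarn-cluster -yd -yn 2 -yqu default -ytm 2048 myjar.jar), then app_id=$(wait_for_app_id flink-submit.log). The id can later be passed to yarn application -kill, as the client output itself suggests.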

Thanks,
Madhav.




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink CLI does not return after submitting yarn job in detached mode

vino yang
Hi Madhav,

Can you set the log level to DEBUG in the CLI's log4j configuration file (log4j-cli.properties) and post the log? I can try to track down the problem from it.
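A minimal sketch of that change, assuming the default conf/log4j-cli.properties shipped with Flink 1.4.2, where loggers are configured with lines such as "log4j.rootLogger=INFO, file". The function name is hypothetical. It keeps a .bak copy for reverting; since the default root logger writes to a file appender, the DEBUG output will likely end up in the client log file in Flink's log directory rather than on the console.

```shell
#!/usr/bin/env bash
# Sketch: switch every "log4j.rootLogger=INFO..." and
# "log4j.logger.<name>=INFO..." entry in the CLI's log4j file to DEBUG.
# Assumes the default Flink 1.4.2 layout; other lines are left untouched.
enable_cli_debug_logging() {
  local conf_file="${1:-./flink-1.4.2/conf/log4j-cli.properties}"
  # -E enables extended regexes; -i.bak edits in place and keeps a backup.
  # Both flags work with GNU sed and with the BSD sed shipped on macOS.
  sed -E -i.bak \
    's/^(log4j\.(rootLogger|logger\.[^=]+))=INFO/\1=DEBUG/' "$conf_file"
}
```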

Thanks, vino.

On Fri, Aug 17, 2018 at 1:27 AM makelkar <[hidden email]> wrote: