Hi Jacky,

I don't think ${FLINK_LOG_PREFIX} is available for Flink Yarn deployment. This is just my guess, but the actual file name probably becomes ".jit". You can verify that by looking for the hidden file.

If this is indeed the problem, you can try replacing "${FLINK_LOG_PREFIX}" with "<LOG_DIR>/your-file-name.jit". Yarn should automatically replace the token "<LOG_DIR>" with the proper log directory path.

I noticed that the usage of ${FLINK_LOG_PREFIX} is recommended by Flink's documentation [1]. This is IMO a bit misleading. I'll try to file an issue to improve the docs.

Thank you~
Xintong Song
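
A minimal shell sketch of the guess above, assuming FLINK_LOG_PREFIX is unset wherever the option string gets expanded (the thread does not confirm this). It shows the resulting file name, why a plain listing can miss the file, and the suggested replacement. The temporary directory is a hypothetical stand-in for a container working directory.

```shell
#!/usr/bin/env bash
# Assumption: FLINK_LOG_PREFIX is not set in the environment that expands the option.
unset FLINK_LOG_PREFIX

# An unset variable expands to the empty string, leaving only the suffix.
opt="-XX:LogFile=${FLINK_LOG_PREFIX}.jit"
echo "$opt"

# Hypothetical stand-in for the directory the JVM would write to.
workdir="$(mktemp -d)"
touch "$workdir/${FLINK_LOG_PREFIX}.jit"

# A name starting with a dot is hidden: plain `ls` normally skips it, `ls -A` lists it.
visible="$(ls "$workdir")"
all="$(ls -A "$workdir")"
echo "ls: '$visible'  ls -A: '$all'"

# The suggested replacement, single-quoted so the local shell leaves <LOG_DIR> untouched.
suggested='-XX:LogFile=<LOG_DIR>/your-file-name.jit'
echo "$suggested"
```

If the guess holds, searching the container directories with `ls -A` (or `find` with `-name '.jit'`) should turn up the file.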
On Wed, May 13, 2020 at 2:45 AM Jacky D <[hidden email]> wrote:

Hi Arvid,

Thanks for the advice. I removed the quotes and it did create a YARN session on EMR, but I didn't find any JIT log file generated. The config with quotes works on a standalone cluster. I also tried to pass the property dynamically in the yarn session command:
flink-yarn-session -n 1 -d -nm testSession -yD env.java.opts="-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"
but I get the same result: the session is created, but I cannot find any JIT log file under the container logs.
Thanks
Jacky
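
For context on the quoting issue mentioned above, here is a minimal shell sketch of how one pair of quotes around several JVM flags changes the number of arguments a program receives. `count_args` is a hypothetical helper, and the flag list is shortened from the command above. Whether this is exactly what happened inside the YARN container launch is an assumption, but it is consistent with the Application Master start command quoted later in the thread, where all flags sit inside a single quoted string and `java` answered with its usage text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many separate arguments it received.
count_args() { echo "$#"; }

opts="-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation"

# Quoted: the whole string arrives as one argument, spaces included.
quoted="$(count_args "$opts")"

# Unquoted: the shell splits on whitespace, so each flag is its own argument.
unquoted="$(count_args $opts)"

echo "quoted: $quoted, unquoted: $unquoted"
```

A JVM handed the quoted form would see one unrecognizable option rather than two valid ones.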
On Tue, May 12, 2020 at 12:57 PM Arvid Heise <[hidden email]> wrote:

Hi Jacky,

I suspect that the quotes are the actual issue. Could you try to remove them? See also [1].

On Tue, May 12, 2020 at 4:03 PM Jacky D <[hidden email]> wrote:

Hi Xintong,

Thanks for the reply. I attached the lines around the application master start command below:

2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Crypto codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec is not available.
2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Using crypto codec org.apache.hadoop.crypto.JceAesCtrCryptoCodec.
2020-05-11 21:16:16,636 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 1 offsetInBlock: 1697 lastPacketInBlock: true lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - Closing old block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315
2020-05-11 21:16:16,641 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #70 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #70
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: complete took 2ms
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #71 org.apache.hadoop.hdfs.protocol.ClientProtocol.setTimes
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #71
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setTimes took 2ms
2020-05-11 21:16:16,647 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #72 org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #72
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setPermission took 2ms
2020-05-11 21:16:16,654 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor - Application Master start command: $JAVA_HOME/bin/java -Xmx424m "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly" -Dlog.file="<LOG_DIR>/jobmanager.log" -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> <LOG_DIR>/jobmanager.out 2> <LOG_DIR>/jobmanager.err
2020-05-11 21:16:16,654 DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: org.apache.hadoop.ipc.Client@28194a50
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setApplicationTags.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setAttemptFailuresValidityInterval.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setKeepContainersAcrossApplicationAttempts.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setNodeLabelExpression.

On Mon, May 11, 2020 at 10:11 PM Xintong Song <[hidden email]> wrote:

Hi Jacky,

Could you search for "Application Master start command:" in the debug log and post the result and a few lines before & after that? This is not included in the clip of the attached log file.

Thank you~
Xintong Song
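
One way to pull that line together with its neighbours is `grep` with context options. The sketch below runs against a temporary file filled with made-up placeholder lines (not real Flink output); in practice the file would be the YARN session log, whose location depends on the setup.

```shell
#!/usr/bin/env bash
# Hypothetical log file with placeholder content, for illustration only.
log="$(mktemp)"
printf '%s\n' \
  "placeholder line before 2" \
  "placeholder line before 1" \
  "placeholder - Application Master start command: (command would appear here)" \
  "placeholder line after 1" \
  "placeholder line after 2" > "$log"

# -B N and -A N print N lines of context before and after each match.
matches="$(grep -B 1 -A 1 "Application Master start command:" "$log")"
echo "$matches"
```

Raising the numbers after `-B` and `-A` widens the window around the match.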
On Tue, May 12, 2020 at 5:33 AM Jacky D <[hidden email]> wrote:

Hi Robert,

Thanks so much for the quick reply. I changed the log level to debug and attached the log file.

Thanks
Jacky

On Mon, May 11, 2020 at 4:14 PM Robert Metzger <[hidden email]> wrote:

Thanks a lot for posting the full output.

It seems that Flink is passing an invalid list of arguments to the JVM.

Can you
- set the root log level in conf/log4j-yarn-session.properties to DEBUG
- then launch the YARN session
- share the log file of the yarn session on the mailing list?

I'm particularly interested in the line printed here, as it shows the JVM invocation.

On Mon, May 11, 2020 at 9:56 PM Jacky D <[hidden email]> wrote:

Hi Robert,

Yes, I tried to retrieve more log info from the YARN UI. The full logs are shown below. This happens when I try to create a Flink YARN session on EMR with the JITWatch configuration set up.

2020-05-11 19:06:09,552 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    ... 2 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1584459865196_0165 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1584459865196_0165_000001 exited with exitCode: 1
Failing this attempt. Diagnostics: Exception from container-launch.
Container id: container_1584459865196_0165_01_000001
Exit code: 1
Exception message: Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
    -d32          use a 32-bit data model if available
    -d64          use a 64-bit data model if available
    -server       to select the "server" VM
                  The default VM is server,
                  because you are running on a server-class machine.
    -cp <class search path of directories and zip/jar files>
    -classpath <class search path of directories and zip/jar files>
                  A : separated list of directories, JAR archives,
                  and ZIP archives to search for class files.
    -D<name>=<value>
                  set a system property
    -verbose:[class|gc|jni]
                  enable verbose output
    -version      print product version and exit
    -version:<value>
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  require the specified version to run
    -showversion  print product version and continue
    -jre-restrict-search | -no-jre-restrict-search
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  include/exclude user private JREs in the version search
    -? -help      print this help message
    -X            print help on non-standard options
    -ea[:<packagename>...|:<classname>]
    -enableassertions[:<packagename>...|:<classname>]
                  enable assertions with specified granularity
    -da[:<packagename>...|:<classname>]
    -disableassertions[:<packagename>...|:<classname>]
                  disable assertions with specified granularity
    -esa | -enablesystemassertions
                  enable system assertions
    -dsa | -disablesystemassertions
                  disable system assertions
    -agentlib:<libname>[=<options>]
                  load native agent library <libname>, e.g. -agentlib:hprof
                  see also, -agentlib:jdwp=help and -agentlib:hprof=help
    -agentpath:<pathname>[=<options>]
                  load native agent library by full pathname
    -javaagent:<jarpath>[=<options>]
                  load Java programming language agent, see java.lang.instrument
    -splash:<imagepath>
                  show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.

Thanks
Jacky

On Mon, May 11, 2020 at 3:42 PM Robert Metzger <[hidden email]> wrote:

Hey Jacky,

The error says "The YARN application unexpectedly switched to state FAILED during deployment.".

Have you tried retrieving the YARN application logs?
Do the YARN UI / resource manager logs reveal anything on the reason for the deployment to fail?

Best,
Robert

On Mon, May 11, 2020 at 9:34 PM Jacky D <[hidden email]> wrote:

---------- Forwarded message ---------
From: Jacky D <[hidden email]>
Date: Mon, May 11, 2020 at 3:12 PM
Subject: Re: Flink Memory analyze on AWS EMR
To: Khachatryan Roman <[hidden email]>

Hi Roman,

Thanks for the quick response. I tried without the LogFile option, but it failed with the same error. I'm currently using Flink 1.6 (https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/application_profiling.html), so I can only use JITWatch or JMC. I guess those tools are only available on a standalone cluster, as the documentation mentions "Each standalone JobManager, TaskManager, HistoryServer, and ZooKeeper daemon redirects stdout and stderr to a file with a .out filename suffix and writes internal logging to a file with a .log suffix. Java options configured by the user in env.java.opts"?

Thanks
Jacky
--Arvid Heise | Senior Java Developer