Hi all,

We've got a jar with Hadoop configuration files in it. Previously we used blocking (attached) mode to deploy jars on YARN, and they ran well. Recently we found that the client process occupies more and more memory, so we tried detached mode, but the job failed to deploy with the following error information:
Then I found this email, http://mail-archives.apache.org/mod_mbox/flink-user/201901.mbox/<tencent_0301F26148CEEE21005E9B94@...>, and set yarn.per-job-cluster.include-user-jar: LAST; after that, part of our jobs could be deployed as expected. But for some jobs that need to operate on another HDFS cluster, with Hadoop conf files inside the jar, there is still a problem: the JobManager cannot resolve the HDFS domain name. I guess this is because the Hadoop conf files in the jar are loaded instead of the conf files in the client's Hadoop dir. Can anyone here help?
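For reference, the setting mentioned above lives in conf/flink-conf.yaml on the client. A minimal sketch, assuming the documented values FIRST, LAST, ORDER and (from 1.10) DISABLED:

    # conf/flink-conf.yaml
    # Controls where user jars appear on the system classpath of the
    # per-job cluster: FIRST, LAST, ORDER (default), or DISABLED (1.10+).
    yarn.per-job-cluster.include-user-jar: LAST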
Hi Sysuke,

Could you check the JM log (YARN AM container log) first? You might find the direct failure message there.

Thanks,
Biao /'bɪ.aʊ/

On Fri, 17 Jan 2020 at 12:02, sysuke Lee <[hidden email]> wrote:
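For reference, the AM container log can be pulled with the standard YARN CLI once log aggregation has run; the application ID below is a placeholder:

    # Fetch aggregated logs for the application, including the AM container
    yarn logs -applicationId application_1579000000000_0001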
Hi sysuke,

>> Why does the YARN per-job attach mode work, but detach mode does not?

It is just because in 1.9 and previous versions, per-job has very different code paths for attach and detach mode. For attach mode, the Flink client starts a session cluster and then submits the job to that existing session, so all the user jars are loaded by the user classloader, not the system classloader. For detach mode, all the jars are shipped via YARN local resources and appended to the system classpath of the JobManager and TaskManager. This behavior changes from 1.10: both detach and attach will always be a real per-job cluster, not simulated by a session. You could check FLIP-82 for more information [1].

>> How to fix this problem?

1. If your YARN cluster supports multiple HDFS clusters, then you do not need to add the HDFS configuration to your jar. That's how we use it in our production environment.

2. If you cannot change this and you will use Flink 1.10, then you could set `yarn.per-job-cluster.include-user-jar: DISABLED`. Then the user jars will not be added to the system classpath; instead, they will be loaded by the user classloader. This is a new feature in 1.10. Check more information here [2].

3. If you are still using 1.9 or a previous version, move the HDFS configuration out of your jar, then use `-yt` to ship your Hadoop configuration and reset the Hadoop env (see the full command sketch after this email):

    -yt /path/of/my-hadoop-conf
    -yD containerized.master.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf'
    -yD containerized.taskmanager.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf'

Best,
Yang

sysuke Lee <[hidden email]> wrote on Fri, 17 Jan 2020 at 12:02 PM:
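Putting option 3 together, a full detached per-job submission might look like the following sketch; the main class and jar path are hypothetical placeholders:

    # Detached per-job submission on YARN (Flink 1.9), shipping a local
    # Hadoop conf directory and pointing HADOOP_CONF_DIR at the shipped copy
    flink run -m yarn-cluster -d \
      -yt /path/of/my-hadoop-conf \
      -yD containerized.master.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf' \
      -yD containerized.taskmanager.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf' \
      -c com.example.MyJob /path/to/my-job.jar

Note that the single quotes around '$PWD' are deliberate: the variable should be expanded inside the YARN container, where the shipped directory is localized, not on the client.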
Ah, thanks Yang for the fixup. I misunderstood the original answer.

Thanks,
Biao /'bɪ.aʊ/

On Fri, 17 Jan 2020 at 16:39, Yang Wang <[hidden email]> wrote: