Flink 1.12.1 example applications failing on a single node yarn cluster

tuk
I am trying out a Flink example application, as explained in the Flink docs, on a single-node YARN cluster.

When I execute

ubuntu@vrni-platform:~/build-target/flink$ ./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar

it fails with the following errors:

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn Application Cluster
    at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:465)
    at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
    at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:213)
    at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1061)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1614159836384_0045 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1614159836384_0045_000001 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2021-02-24 16:19:39.409]File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
java.io.FileNotFoundException: File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
I set the log level to DEBUG, and I can see that flink-dist_2.12-1.12.1.jar is being copied to /home/ubuntu/.flink/application_1614159836384_0045:
2021-02-24 16:19:37,768 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader            [] - Got modification time 1614183577000 from remote path file:/home/ubuntu/.flink/application_1614159836384_0045/TopSpeedWindowing.jar
2021-02-24 16:19:37,769 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader            [] - Copying from file:/home/ubuntu/build-target/flink/lib/flink-dist_2.12-1.12.1.jar to file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar with replication factor 1
The complete DEBUG logs are placed here. The NodeManager logs are placed here.
Can someone let me know what is going wrong? Does Flink not support a single-node YARN cluster for development?


Re: Flink 1.12.1 example applications failing on a single node yarn cluster

tuk
The same question has also been asked on Stack Overflow. Any suggestions here?

On Wed, Feb 24, 2021 at 10:25 PM Debraj Manna <[hidden email]> wrote:
I am trying out a Flink example application, as explained in the Flink docs, on a single-node YARN cluster.


Re: Flink 1.12.1 example applications failing on a single node yarn cluster

tuk
In my setup, hadoop-yarn-nodemanager runs as the yarn user.

ubuntu@vrni-platform:/tmp/flink$ ps -ef | grep nodemanager
yarn      4953     1  2 05:53 ?        00:11:26 /usr/lib/jvm/java-8-openjdk/bin/java -Dproc_nodemanager -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/heap-dumps/yarn -XX:+ExitOnOutOfMemoryError -Dyarn.log.dir=/var/log/hadoop-yarn -Dyarn.log.file=hadoop-yarn-nodemanager-vrni-platform.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Xmx512m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop-yarn-nodemanager-vrni-platform.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager


I was executing the ./bin/flink command as the ubuntu user, and the yarn user does not have permission to write to ubuntu's home folder in my setup.

ubuntu@vrni-platform:/tmp/flink$ echo ~ubuntu
/home/ubuntu
ubuntu@vrni-platform:/tmp/flink$ echo ~yarn
/var/lib/hadoop-yarn



It appears to me that Flink needs permission to write to the user's home directory to create a .flink folder, even when the job is submitted to YARN. In my setup it works fine if I run flink as the yarn user.

Just for my knowledge, is there any config in Flink to specify the location of the .flink folder?
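
One way to check this kind of access problem is to list the owner and mode of every directory on the way to the staging path. The following is a minimal sketch and not part of the original report: path_modes is a hypothetical helper name, and /home/ubuntu/.flink is the directory from the error above. On Linux, namei -l from util-linux prints similar information.

```shell
# Hypothetical helper: print owner and mode of a path and of each parent directory.
path_modes() {
  p="$1"
  while :; do
    # ls -ld describes the directory itself rather than its contents
    ls -ld "$p" 2>/dev/null || echo "not accessible: $p"
    case "$p" in
      /|.) break ;;   # stop at the root (or at "." for relative paths)
    esac
    p=$(dirname "$p")
  done
}

# Directory taken from the error message above; adjust for your setup.
path_modes /home/ubuntu/.flink
```

If any directory on that path lacks read or execute permission for the user that runs the NodeManager (yarn in this setup), the NodeManager cannot read the uploaded jars from the local file system.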

On Thu, Feb 25, 2021 at 10:48 AM Debraj Manna <[hidden email]> wrote:
The same question has also been asked on Stack Overflow. Any suggestions here?


Re: Flink 1.12.1 example applications failing on a single node yarn cluster

Matthias
Hi Debraj,
thanks for reaching out to the Flink community. Without knowing the details on how you've set up the Single-Node YARN cluster, I would still guess that it is a configuration issue on the YARN side. Flink does not know about a .flink folder. Hence, there is no configuration to set this folder.

Best,
Matthias

On Fri, Feb 26, 2021 at 2:40 PM Debraj Manna <[hidden email]> wrote:
In my setup, hadoop-yarn-nodemanager runs as the yarn user.


Re: Flink 1.12.1 example applications failing on a single node yarn cluster

tuk
Thanks Matthias for replying. 

Yes, there was a YARN configuration issue on my side, which I mentioned in my last email.

I am just starting with Flink. For my understanding: a few links (posted below) report that Flink needs to create a .flink directory in the user's home folder. I am observing the same, even though I am not using HDFS with YARN (single-node deployment). Is there a way to configure the location where Flink stores the jar and configuration file, as mentioned in the link below?


From the above link

"Flink creates a .flink/ directory in the users home directory where it stores the Flink jar and configuration file."

The same is mentioned here.




On Fri, Feb 26, 2021 at 9:45 PM Matthias Pohl <[hidden email]> wrote:
Hi Debraj,
thanks for reaching out to the Flink community. Without knowing the details on how you've set up the Single-Node YARN cluster, I would still guess that it is a configuration issue on the YARN side. Flink does not know about a .flink folder. Hence, there is no configuration to set this folder.


Re: Flink 1.12.1 example applications failing on a single node yarn cluster

Matthias
Hi Debraj,
sorry for misleading you at first. You're right. I looked through the code once more and found something: there is the yarn.staging-directory option [1], which defaults to the user's home folder. The YarnApplicationFileUploader [2] uses this parameter to upload the application files.

I hope that helps. Best,
Matthias
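
As a sketch of how this option could be set for the command from the original post: the Flink CLI accepts configuration overrides as -Dkey=value arguments, so the staging directory can be passed at submission time. The path /tmp/flink-staging is an assumption chosen only for illustration. This relocates the upload directory; it does not by itself guarantee that the NodeManager user can read the uploaded files.

```shell
# Assumed location for illustration; use any directory the submitting user can write to.
STAGING_DIR="/tmp/flink-staging"
STAGING_URI="file://${STAGING_DIR}"

mkdir -p "$STAGING_DIR"

# Submit only when run from the root of the Flink distribution.
if [ -x ./bin/flink ]; then
  ./bin/flink run-application -t yarn-application \
    -Dyarn.staging-directory="$STAGING_URI" \
    ./examples/streaming/TopSpeedWindowing.jar
fi
```

The same key can instead be set once in conf/flink-conf.yaml (yarn.staging-directory: file:///tmp/flink-staging) so that it applies to every submission.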


On Fri, Feb 26, 2021 at 5:51 PM Debraj Manna <[hidden email]> wrote:
Thanks Matthias for replying. 

Yes there was some yarn configuration issue on my side which I mentioned in my last email. 

I am starting on flink. So just for my understanding in few links (posted below) it is reported that flink needs to create a .flink directory in the users home folder. Even though I am not using HDFS with yarn (in single-node deployment) but I am also observing the same. Is there a way I can configure the location where flink stores the jar and configuration file as mentioned in the below link? 


From the above link

"Flink creates a .flink/ directory in the users home directory where it stores the Flink jar and configuration file."

Same mentioned here.




On Fri, Feb 26, 2021 at 9:45 PM Matthias Pohl <[hidden email]> wrote:
Hi Debraj,
thanks for reaching out to the Flink community. Without knowing the details on how you've set up the Single-Node YARN cluster, I would still guess that it is a configuration issue on the YARN side. Flink does not know about a .flink folder. Hence, there is no configuration to set this folder.

Best,
Matthias

On Fri, Feb 26, 2021 at 2:40 PM Debraj Manna <[hidden email]> wrote:
In my setup, hadoop-yarn-nodemanager is running as the yarn user.

ubuntu@vrni-platform:/tmp/flink$ ps -ef | grep nodemanager
yarn      4953     1  2 05:53 ?        00:11:26 /usr/lib/jvm/java-8-openjdk/bin/java -Dproc_nodemanager -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/heap-dumps/yarn -XX:+ExitOnOutOfMemoryError -Dyarn.log.dir=/var/log/hadoop-yarn -Dyarn.log.file=hadoop-yarn-nodemanager-vrni-platform.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Xmx512m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop-yarn-nodemanager-vrni-platform.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager


I was executing the ./bin/flink command as the ubuntu user, and the yarn user does not have permission to write to ubuntu's home folder in my setup.

ubuntu@vrni-platform:/tmp/flink$ echo ~ubuntu
/home/ubuntu
ubuntu@vrni-platform:/tmp/flink$ echo ~yarn
/var/lib/hadoop-yarn



It appears to me that Flink needs permission to write to the user's home directory to create a .flink folder, even when the job is submitted to YARN. In my setup, it works fine if I run Flink as the yarn user.

Just for my knowledge, is there any config in Flink to specify the location of the .flink folder?

On Thu, Feb 25, 2021 at 10:48 AM Debraj Manna <[hidden email]> wrote:
The same question has also been asked on StackOverflow. Any suggestions here?

On Wed, Feb 24, 2021 at 10:25 PM Debraj Manna <[hidden email]> wrote:
I am trying out flink example as explained in flink docs in a single node yarn cluster.

On executing 

ubuntu@vrni-platform:~/build-target/flink$ ./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar

It is failing with the below errors

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn Application Cluster
    at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:465)
    at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
    at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:213)
    at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1061)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1614159836384_0045 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1614159836384_0045_000001 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2021-02-24 16:19:39.409]File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
java.io.FileNotFoundException: File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
I have made the log level DEBUG and I do see that flink-dist_2.12-1.12.1.jar is getting copied to /home/ubuntu/.flink/application_1614159836384_0045. 
2021-02-24 16:19:37,768 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader            [] - Got modification time 1614183577000 from remote path file:/home/ubuntu/.flink/application_1614159836384_0045/TopSpeedWindowing.jar
2021-02-24 16:19:37,769 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader            [] - Copying from file:/home/ubuntu/build-target/flink/lib/flink-dist_2.12-1.12.1.jar to file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar with replication factor 1
The entire DEBUG logs are placed here. Nodemanager logs are placed here. 
Can someone let me know what is going wrong? Does flink not support single node yarn cluster for development?