Hi community, I want to test my Flink k-means on a Hadoop cluster. I am using the Cloudera Live distribution. How can I run Flink on this cluster? Maybe only the Java dependencies are enough?
Hi,
you can start Flink on YARN on the Cloudera distribution. See here for more: http://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html
On Tue, Jun 2, 2015 at 2:03 PM, Pa Rö <[hidden email]> wrote:
2015-06-02 14:29 GMT+02:00 Robert Metzger <[hidden email]>:
I would recommend using HDFS. For that, you need to specify the paths like this: hdfs:///path/to/data. On Tue, Jun 2, 2015 at 2:48 PM, Pa Rö <[hidden email]> wrote:
Thanks. Now I want to run my app on the Cloudera Live VM (single node). How can I define my Flink job with Hue? I tried to run the Flink script in HDFS, but it does not work. Best regards, Paul

2015-06-02 14:50 GMT+02:00 Robert Metzger <[hidden email]>:
Hi Paul, why did running Flink from the regular scripts not work for you? I'm not an expert on Hue, I would recommend asking in the Hue user forum / mailing list: https://groups.google.com/a/cloudera.org/forum/#!forum/hue-user. On Thu, Jun 4, 2015 at 4:09 PM, Pa Rö <[hidden email]> wrote:
Hi Robert, I think the problem is the Hue API, but the new Hue release has a Spark submit API.

2015-06-04 16:12 GMT+02:00 Robert Metzger <[hidden email]>:
It should certainly be possible to run Flink on a Cloudera Live VM. I think these are the commands you need to execute:

wget http://stratosphere-bin.s3-website-us-east-1.amazonaws.com/flink-0.9-SNAPSHOT-bin-hadoop2.tgz
tar xvzf flink-0.9-SNAPSHOT-bin-hadoop2.tgz
cd flink-0.9-SNAPSHOT/
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop/
./bin/yarn-session.sh -n 1 -jm 1024 -tm 1024

If that is not working for you, please post the exact error message you are getting and I can help you get it to run.

On Thu, Jun 4, 2015 at 4:18 PM, Pa Rö <[hidden email]> wrote:
You mean run these commands in the terminal/shell, and not define a Hue job?

2015-06-04 16:25 GMT+02:00 Robert Metzger <[hidden email]>:
Yes, you have to run these commands in the command line of the Cloudera VM. On Thu, Jun 4, 2015 at 4:28 PM, Pa Rö <[hidden email]> wrote:
Okay, now it runs on my Hadoop. How can I start my Flink job? And where must the jar file be saved, in HDFS or as a local file?

2015-06-04 16:31 GMT+02:00 Robert Metzger <[hidden email]>:
Once you've started the YARN session, you can submit a Flink job with "./bin/flink run <pathToYourJar>". The jar file of your job doesn't need to be in HDFS; it has to be in the local file system, and Flink will send it to all machines.

On Thu, Jun 4, 2015 at 4:48 PM, Pa Rö <[hidden email]> wrote:
Okay, it works, but I get an exception. How must I set up the files in HDFS?

[cloudera@quickstart Desktop]$ cd flink-0.9-SNAPSHOT/bin/
[cloudera@quickstart bin]$ flink run /home/cloudera/Desktop/ma-flink.jar
bash: flink: command not found
[cloudera@quickstart bin]$ ./flink run /home/cloudera/Desktop/ma-flink.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Found YARN properties file /home/cloudera/Desktop/flink-0.9-SNAPSHOT/bin/../conf/.yarn-properties
Using JobManager address from YARN properties quickstart.cloudera/127.0.0.1:53874
java.io.IOException: Mkdirs failed to create /user/cloudera/outputs
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:783)
    at mgm.tp.bigdata.ma_commons.commons.Seeding.randomSeeding(Seeding.java:21)
    at mgm.tp.bigdata.ma_flink.FlinkMain.getCentroidDataSet(FlinkMain.java:178)
    at mgm.tp.bigdata.ma_flink.FlinkMain.main(FlinkMain.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:315)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:880)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922)
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Failed to submit job 934743a5c49c6d5e31c9e8201452e36d (KMeans Flink)
    at org.apache.flink.client.program.Client.run(Client.java:412)
    at org.apache.flink.client.program.Client.run(Client.java:355)
    at org.apache.flink.client.program.Client.run(Client.java:348)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
    at mgm.tp.bigdata.ma_flink.FlinkMain.main(FlinkMain.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:315)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:880)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to submit job 934743a5c49c6d5e31c9e8201452e36d (KMeans Flink)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:595)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:192)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:99)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: File /user/cloudera/outputs/seed-1 does not exist or the user running Flink ('yarn') has insufficient permissions to access it.
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:162)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:471)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:535)
    ... 21 more
Caused by: java.io.FileNotFoundException: File /user/cloudera/outputs/seed-1 does not exist or the user running Flink ('yarn') has insufficient permissions to access it.
    at org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:106)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:390)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:51)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:146)
    ... 23 more

quickstart.cloudera:50075/home/cloudera/output?

2015-06-04 16:51 GMT+02:00 Robert Metzger <[hidden email]>:
It looks like the user "yarn", which is running Flink, doesn't have permission to access the files. Can you do "sudo su yarn" to become the "yarn" user? Then, you can do "hadoop fs -chmod 777" to make the files accessible for everyone.

On Thu, Jun 4, 2015 at 4:59 PM, Pa Rö <[hidden email]> wrote:
[cloudera@quickstart bin]$ sudo su yarn
bash-4.1$ hadoop fs -chmod 777
-chmod: Not enough arguments: expected 2 but got 1
Usage: hadoop fs [generic options] -chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...
bash-4.1$

Do you understand?

2015-06-04 17:04 GMT+02:00 Robert Metzger <[hidden email]>:
bash-4.1$ hadoop fs -chmod 777 *
chmod: `config.sh': No such file or directory
chmod: `flink': No such file or directory
chmod: `flink.bat': No such file or directory
chmod: `jobmanager.sh': No such file or directory
chmod: `pyflink2.sh': No such file or directory
chmod: `pyflink3.sh': No such file or directory
chmod: `start-cluster.sh': No such file or directory
chmod: `start-cluster-streaming.sh': No such file or directory
chmod: `start-local.bat': No such file or directory
chmod: `start-local.sh': No such file or directory
chmod: `start-local-streaming.sh': No such file or directory
chmod: `start-scala-shell.sh': No such file or directory
chmod: `start-webclient.sh': No such file or directory
chmod: `stop-cluster.sh': No such file or directory
chmod: `stop-local.sh': No such file or directory
chmod: `stop-webclient.sh': No such file or directory
chmod: `taskmanager.sh': No such file or directory
chmod: `webclient.sh': No such file or directory
chmod: `yarn-session.sh': No such file or directory

2015-06-04 17:08 GMT+02:00 Pa Rö <[hidden email]>:
I get the same exception:

Using JobManager address from YARN properties quickstart.cloudera/127.0.0.1:53874
java.io.IOException: Mkdirs failed to create /user/cloudera/outputs

2015-06-04 17:09 GMT+02:00 Pa Rö <[hidden email]>:
As the output of the "hadoop" tool indicates, it expects two arguments; you only passed one (777). The second argument it is expecting is the path to the file you want to change. In your case, that is:

hadoop fs -chmod 777 /user/cloudera/outputs

The reason why "hadoop fs -chmod 777 *" does not work is the following: the * is evaluated by your local bash and expanded to the files which are present in your current, local directory. The bash expansion is not able to expand to the files in HDFS.

On Thu, Jun 4, 2015 at 5:08 PM, Pa Rö <[hidden email]> wrote:
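Both points (chmod needs a mode AND a path, and "*" is expanded by the local shell before the command ever runs) can be seen with plain local commands; this is a minimal sketch using a throwaway directory under /tmp, with hypothetical file names standing in for the HDFS paths discussed above:

```shell
# Sketch only: pure local-shell demonstration, no HDFS involved.
rm -rf /tmp/glob_demo && mkdir -p /tmp/glob_demo
cd /tmp/glob_demo
touch config.sh flink.sh

# chmod, like `hadoop fs -chmod`, takes a mode AND a path:
chmod 777 config.sh

# The shell expands '*' against the CURRENT LOCAL directory -- which is
# exactly why `hadoop fs -chmod 777 *` tried to chmod local script names
# like `config.sh' inside HDFS:
echo *    # prints: config.sh flink.sh
```

This is why wildcards for HDFS files must be quoted or avoided: an unquoted glob never reaches the "hadoop" tool as a pattern; the shell has already rewritten it into local file names.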
I tried this:

[cloudera@quickstart bin]$ sudo su yarn
bash-4.1$ hadoop fs -chmod 777 /user/cloudera/outputs
chmod: changing permissions of '/user/cloudera/outputs': Permission denied. user=yarn is not the owner of inode=outputs
bash-4.1$ hadoop fs -chmod 777 /user/cloudera/inputs
chmod: changing permissions of '/user/cloudera/inputs': Permission denied. user=yarn is not the owner of inode=inputs
bash-4.1$ exit
exit
[cloudera@quickstart bin]$ sudo ./flink run /home/cloudera/Desktop/ma-flink.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Found YARN properties file /home/cloudera/Desktop/flink-0.9-SNAPSHOT/bin/../conf/.yarn-properties
Using JobManager address from YARN properties quickstart.cloudera/127.0.0.1:53874
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Failed to submit job 2f46ef5dff4ecf5552b3477ed1c6f4b9 (KMeans Flink)
    at org.apache.flink.client.program.Client.run(Client.java:412)
    at org.apache.flink.client.program.Client.run(Client.java:355)
    at org.apache.flink.client.program.Client.run(Client.java:348)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
    at mgm.tp.bigdata.ma_flink.FlinkMain.main(FlinkMain.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:315)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:880)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to submit job 2f46ef5dff4ecf5552b3477ed1c6f4b9 (KMeans Flink)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:595)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:192)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:99)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: File /user/cloudera/inputs does not exist or the user running Flink ('yarn') has insufficient permissions to access it.
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:162)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:471)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:535)
    ... 21 more
Caused by: java.io.FileNotFoundException: File /user/cloudera/inputs does not exist or the user running Flink ('yarn') has insufficient permissions to access it.
    at org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:106)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:390)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:51)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:146)
    ... 23 more

2015-06-04 17:15 GMT+02:00 Robert Metzger <[hidden email]>:
I would recommend that you read the output of the commands you are entering more closely.

bash-4.1$ hadoop fs -chmod 777 /user/cloudera/outputs
chmod: changing permissions of '/user/cloudera/outputs': Permission denied. user=yarn is not the owner of inode=outputs

chmod is clearly stating that it was not able to execute the command due to an error. Until this error has been resolved, Flink will not be able to write to that directory. I would recommend that you familiarize yourself with the permission system of HDFS: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html

On Thu, Jun 4, 2015 at 5:17 PM, Pa Rö <[hidden email]> wrote:
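The rule behind that "Permission denied" is the same one HDFS inherits from the POSIX model: only a file's owner (or the superuser) may chmod it, which is exactly why user=yarn was denied on inodes owned by cloudera. A minimal local sketch, using a hypothetical file under /tmp:

```shell
# Sketch only: local-filesystem analogue of the HDFS ownership rule.
rm -f /tmp/owner_demo
touch /tmp/owner_demo

# Print the owning user. chmod by any other (non-root) user would be
# denied -- just as 'yarn' was denied on files owned by 'cloudera'.
ls -l /tmp/owner_demo | awk '{print $3}'

chmod 644 /tmp/owner_demo   # succeeds, because we own the file
```

In the HDFS case the fix is therefore to run the chmod (or a chown) as the owning user or as the HDFS superuser, rather than as "yarn".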