How to run a Flink job in EMR?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

How to run a Flink job in EMR?

Chris Schneider
Hi Gang,

I’ve been trying to get some Flink code running in Amazon Web Services’s Elastic MapReduce, but so far the only success I’ve had required me to log into the master node, download my jar from S3 to there, and then run it on the master node from the command line using something like the following:

% bin/flink run -m yarn-cluster -yn 2 -p 4 <my jar name> <my main program arguments>

The two other approaches I’ve tried (based on the AWS EMR Flink documentation) that didn’t work were:

1) Add an EMR Step to launch my program as part of a Flink session - I couldn’t figure out how to get my job jar deployed as part of the step, and I couldn’t successfully configure a Bootstrap Action to deploy it before running that step.

2) Start a Long-Running Flink Session via an EMR Step (which worked) and then use the Flink Web UI to upload my job jar from my workstation - It killed the ApplicationMaster that was running the Flink Web UI without providing much interesting logging. I’ve appended both the container log output and the jobmanager.log contents to the end of this email.

In addition, it would be nice to gain access to S3 resources using credentials. I’ve tried using an AmazonS3ClientBuilder, and passing an EnvironmentVariableCredentialsProvider to its setCredentials method. I’d hoped that this might pick up the credentials I set up on my master node in the $AWS_ACCESS_KEY_ID and $AWS_SECRET_KEY environment variables I've exported, but I’m guessing that the shell this code is running in (on the slaves?) doesn’t have access to those variables.

Here’s a list of interesting version numbers:

flink-java-1.2.0.jar
flink-core-1.2.0.jar
flink-annotations-1.2.0.jar
emr-5.4.0 with Flink 1.2.0 installed

Any help would be greatly appreciated. I’m lusting after an example showing how to deploy a simple Flink jar from S3 to a running EMR cluster and then get Flink to launch it with an arbitrary set of Flink and user arguments. Bonus points for setting up an AmazonS3 Java client object without including those credentials within my Java source code.

Best Regards,

- Chris

Here’s the container logging from my attempt to submit my job via the Flink web UI:

Application application_1496707031947_0002 failed 1 times due to AM Container for appattempt_1496707031947_0002_000001 exited with exitCode: 255
For more detailed output, check application tracking page:http://ip-10-85-61-122.ec2.internal:8088/cluster/app/application_1496707031947_0002 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1496707031947_0002_01_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
Application

There's a bunch of startup messages in the jobmanager.log, but only the following output was generated by my attempt to submit my Flink job:

2017-06-06 00:41:55,332 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:44948
2017-06-06 00:41:55,332 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard root cache directory /tmp/flink-web-f3dde9b2-2384-49ce-b7a2-4c93bb1f5b6a
2017-06-06 00:41:55,336 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard jar upload directory /tmp/flink-web-71b2e41d-b08d-43e8-bd0d-b6fb5cc329a2


-----------------------------------------
Chris Schneider
http://www.scaleunlimited.com
custom big data solutions
-----------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: How to run a Flink job in EMR?

Foster, Craig

1)       Since the jar is only required on the master node you should be able to just run a step with a very simple script like ‘bash –c “aws s3 cp s3://mybucket/myjar.jar .”’

So if you were to do that using the step similar to outlined in the EMR documentation, but replacing withArgs with the above command as args (I think there’s an example of this on that same EMR docs page you refer to).

Then add another step after that which actually runs the flink job. The jar will be located in /home/hadoop. In the future, I’m hoping this can just be simplified to flink run -yn 2 -p 4 s3://mybucket/myjar.jar … but it doesn’t seem to be the case right now.

2)       If you ran this as a step, you should be able to see the error the Flink driver gives in the step’s logs.

3)       Provided your S3 bucket and EMR cluster EC2 IAM role/”instance profile” belong to the same account (or at least the permissions are setup such that you can download a file from S3 to your EC2 instances), you should be able to use the DefaultAWSCredentialsProviderChain, which won’t require you enter any credentials as it uses the EC2 instance profile credentials provider.

 

 

Hope that helps.

 

Thanks,

Craig

 

 

From: Chris Schneider <[hidden email]>
Date: Wednesday, June 7, 2017 at 6:16 PM
To: "[hidden email]" <[hidden email]>
Subject: How to run a Flink job in EMR?

 

Hi Gang,

 

I’ve been trying to get some Flink code running in Amazon Web Services’s Elastic MapReduce, but so far the only success I’ve had required me to log into the master node, download my jar from S3 to there, and then run it on the master node from the command line using something like the following:

 

% bin/flink run -m yarn-cluster -yn 2 -p 4 <my jar name> <my main program arguments>

 

The two other approaches I’ve tried (based on the AWS EMR Flink documentation) that didn’t work were:

 

1) Add an EMR Step to launch my program as part of a Flink session - I couldn’t figure out how to get my job jar deployed as part of the step, and I couldn’t successfully configure a Bootstrap Action to deploy it before running that step.

 

2) Start a Long-Running Flink Session via an EMR Step (which worked) and then use the Flink Web UI to upload my job jar from my workstation - It killed the ApplicationMaster that was running the Flink Web UI without providing much interesting logging. I’ve appended both the container log output and the jobmanager.log contents to the end of this email.

In addition, it would be nice to gain access to S3 resources using credentials. I’ve tried using an AmazonS3ClientBuilder, and passing an EnvironmentVariableCredentialsProvider to its setCredentials method. I’d hoped that this might pick up the credentials I set up on my master node in the $AWS_ACCESS_KEY_ID and $AWS_SECRET_KEY environment variables I've exported, but I’m guessing that the shell this code is running in (on the slaves?) doesn’t have access to those variables.

Here’s a list of interesting version numbers:

 

flink-java-1.2.0.jar

flink-core-1.2.0.jar

flink-annotations-1.2.0.jar

emr-5.4.0 with Flink 1.2.0 installed

 

Any help would be greatly appreciated. I’m lusting after an example showing how to deploy a simple Flink jar from S3 to a running EMR cluster and then get Flink to launch it with an arbitrary set of Flink and user arguments. Bonus points for setting up an AmazonS3 Java client object without including those credentials within my Java source code.

 

Best Regards,

 

- Chris

 

Here’s the container logging from my attempt to submit my job via the Flink web UI:

Application application_1496707031947_0002 failed 1 times due to AM Container for appattempt_1496707031947_0002_000001 exited with exitCode: 255
For more detailed output, check application tracking page:http://ip-10-85-61-122.ec2.internal:8088/cluster/app/application_1496707031947_0002 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1496707031947_0002_01_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
Application

There's a bunch of startup messages in the jobmanager.log, but only the following output was generated by my attempt to submit my Flink job:

2017-06-06 00:41:55,332 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:44948
2017-06-06 00:41:55,332 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard root cache directory /tmp/flink-web-f3dde9b2-2384-49ce-b7a2-4c93bb1f5b6a
2017-06-06 00:41:55,336 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard jar upload directory /tmp/flink-web-71b2e41d-b08d-43e8-bd0d-b6fb5cc329a2

 

-----------------------------------------

Chris Schneider

http://www.scaleunlimited.com
custom big data solutions

-----------------------------------------

 

Reply | Threaded
Open this post in threaded view
|

Re: How to run a Flink job in EMR?

Foster, Craig

Ah, maybe (1) wasn’t entirely clear so here’s the copy/pasted example with what I suggested:

 

    HadoopJarStepConfig copyJar = new HadoopJarStepConfig()

      .withJar("command-runner.jar")

      .withArgs("bash","-c", "aws s3 cp s3://mybucket/myjar.jar /home/hadoop"

    );

 

 

From: "Foster, Craig" <[hidden email]>
Date: Wednesday, June 7, 2017 at 7:21 PM
To: Chris Schneider <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: How to run a Flink job in EMR?

 

1)      Since the jar is only required on the master node you should be able to just run a step with a very simple script like ‘bash –c “aws s3 cp s3://mybucket/myjar.jar .”’

So if you were to do that using the step similar to outlined in the EMR documentation, but replacing withArgs with the above command as args (I think there’s an example of this on that same EMR docs page you refer to).

Then add another step after that which actually runs the flink job. The jar will be located in /home/hadoop. In the future, I’m hoping this can just be simplified to flink run -yn 2 -p 4 s3://mybucket/myjar.jar … but it doesn’t seem to be the case right now.

2)      If you ran this as a step, you should be able to see the error the Flink driver gives in the step’s logs.

3)      Provided your S3 bucket and EMR cluster EC2 IAM role/”instance profile” belong to the same account (or at least the permissions are setup such that you can download a file from S3 to your EC2 instances), you should be able to use the DefaultAWSCredentialsProviderChain, which won’t require you enter any credentials as it uses the EC2 instance profile credentials provider.

 

 

Hope that helps.

 

Thanks,

Craig

 

 

From: Chris Schneider <[hidden email]>
Date: Wednesday, June 7, 2017 at 6:16 PM
To: "[hidden email]" <[hidden email]>
Subject: How to run a Flink job in EMR?

 

Hi Gang,

 

I’ve been trying to get some Flink code running in Amazon Web Services’s Elastic MapReduce, but so far the only success I’ve had required me to log into the master node, download my jar from S3 to there, and then run it on the master node from the command line using something like the following:

 

% bin/flink run -m yarn-cluster -yn 2 -p 4 <my jar name> <my main program arguments>

 

The two other approaches I’ve tried (based on the AWS EMR Flink documentation) that didn’t work were:

 

1) Add an EMR Step to launch my program as part of a Flink session - I couldn’t figure out how to get my job jar deployed as part of the step, and I couldn’t successfully configure a Bootstrap Action to deploy it before running that step.

 

2) Start a Long-Running Flink Session via an EMR Step (which worked) and then use the Flink Web UI to upload my job jar from my workstation - It killed the ApplicationMaster that was running the Flink Web UI without providing much interesting logging. I’ve appended both the container log output and the jobmanager.log contents to the end of this email.

In addition, it would be nice to gain access to S3 resources using credentials. I’ve tried using an AmazonS3ClientBuilder, and passing an EnvironmentVariableCredentialsProvider to its setCredentials method. I’d hoped that this might pick up the credentials I set up on my master node in the $AWS_ACCESS_KEY_ID and $AWS_SECRET_KEY environment variables I've exported, but I’m guessing that the shell this code is running in (on the slaves?) doesn’t have access to those variables.

Here’s a list of interesting version numbers:

 

flink-java-1.2.0.jar

flink-core-1.2.0.jar

flink-annotations-1.2.0.jar

emr-5.4.0 with Flink 1.2.0 installed

 

Any help would be greatly appreciated. I’m lusting after an example showing how to deploy a simple Flink jar from S3 to a running EMR cluster and then get Flink to launch it with an arbitrary set of Flink and user arguments. Bonus points for setting up an AmazonS3 Java client object without including those credentials within my Java source code.

 

Best Regards,

 

- Chris

 

Here’s the container logging from my attempt to submit my job via the Flink web UI:

Application application_1496707031947_0002 failed 1 times due to AM Container for appattempt_1496707031947_0002_000001 exited with exitCode: 255
For more detailed output, check application tracking page:http://ip-10-85-61-122.ec2.internal:8088/cluster/app/application_1496707031947_0002 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1496707031947_0002_01_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
Application

There's a bunch of startup messages in the jobmanager.log, but only the following output was generated by my attempt to submit my Flink job:

2017-06-06 00:41:55,332 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:44948
2017-06-06 00:41:55,332 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard root cache directory /tmp/flink-web-f3dde9b2-2384-49ce-b7a2-4c93bb1f5b6a
2017-06-06 00:41:55,336 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard jar upload directory /tmp/flink-web-71b2e41d-b08d-43e8-bd0d-b6fb5cc329a2

 

-----------------------------------------

Chris Schneider

http://www.scaleunlimited.com
custom big data solutions

-----------------------------------------