yarn ship from s3


yarn ship from s3

Vijayendra Yadav
Hi Team,

I am trying to find a way to ship files from AWS S3 for a Flink streaming job running on AWS EMR. The files I need to ship are the following:
1) application JAR
2) application property file
3) custom flink-conf.yaml
4) application-specific log4j configuration

Please let me know the options.

Thanks,
Vijay
Re: yarn ship from s3

Piotr Nowojski-4
Hi Vijay,

I'm not sure I understand your question correctly. You have the JAR and configs (1, 2, 3 and 4) on S3 and you want to start a Flink job using them? Could you simply download those files (the whole directory containing them) to the machine that will be starting the Flink job?

Best, Piotrek
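
For illustration, a minimal sketch of that approach, with hypothetical bucket and paths, run on whichever node submits the job:

    # Pull the job artifacts down from S3 (bucket/prefix are made up here).
    aws s3 cp s3://my-bucket/flink-app/ /opt/flink-app/ --recursive

    # Submit against YARN using the now-local JAR.
    flink run -m yarn-cluster /opt/flink-app/myapp-1.0-SNAPSHOT.jar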

Re: yarn ship from s3

Vijayendra Yadav
Hi Piotr,

I have been following that same process so far. Now I am migrating the deployment to AWS CDK and AWS Step Functions, as a kind of CI/CD pipeline.
I added a step that downloads the JAR and configs (1, 2, 3 and 4) from S3 using command-runner.jar (an AWS step); it placed them on one of the three master nodes. In the next step, when I launched the Flink job, it could not find the build artifacts because the job was launched on some other YARN node.

I was hoping Flink would have a solution like Apache Spark's, where whatever files we provide via --files are shipped to YARN (from S3 to the YARN working directory).

Thanks,
Vijay
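
For comparison, the Spark behavior referred to above looks roughly like this (hypothetical bucket and file names); on EMR, where s3:// is a registered Hadoop filesystem, YARN localizes the files listed in --files into each container's working directory:

    spark-submit --master yarn --deploy-mode cluster \
      --files s3://my-bucket/app/application.properties,s3://my-bucket/app/log4j.properties \
      s3://my-bucket/app/myapp-1.0-SNAPSHOT.jar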


Re: yarn ship from s3

Matthias
Hi Vijay,
have you tried yarn.ship-files [1] or yarn.ship-archives [2]? Maybe that's what you're looking for...

Best,
Matthias
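
As a sketch with hypothetical local paths: yarn.ship-files and yarn.ship-archives are configuration options rather than dedicated CLI flags, so with Flink 1.11 they can typically be passed as dynamic properties:

    # Semicolon-separated list of local files/directories to ship to the YARN containers.
    flink run -t yarn-per-job \
      -Dyarn.ship-files="/opt/flink-app/application.properties;/opt/flink-app/log4j.properties" \
      /opt/flink-app/myapp-1.0-SNAPSHOT.jar

The legacy YARN CLI form (flink run -m yarn-cluster) also has a -yt/--yarnship flag that ships a whole local directory.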


Re: yarn ship from s3

Vijayendra Yadav
Hi Pohl,

I tried to ship my property file. Example:

    -yarn.ship-files s3://applib/xx/xx/1.0-SNAPSHOT/application.properties \

Error:

6:21:37.163 [main] ERROR org.apache.flink.client.cli.CliFrontend - Invalid command line arguments.
org.apache.flink.client.cli.CliArgsException: Could not build the program from JAR file: JAR file does not exist: -yarn.ship-files
        at org.apache.flink.client.cli.CliFrontend.getPackagedProgram(CliFrontend.java:244) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:223) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:916) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_292]
        at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_292]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) [hadoop-common-2.10.0-amzn-0.jar:?]
        at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992) [flink-dist_2.11-1.11.0.jar:1.11.0]
Caused by: java.io.FileNotFoundException: JAR file does not exist: -yarn.ship-files
        at org.apache.flink.client.cli.CliFrontend.getJarFile(CliFrontend.java:740) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.client.cli.CliFrontend.buildProgram(CliFrontend.java:717) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.client.cli.CliFrontend.getPackagedProgram(CliFrontend.java:242) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        ... 8 more
Could not build the program from JAR file: JAR file does not exist: -yarn.ship-files


Thanks,
Vijay
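
A note on the failure mode: the Flink CLI has no -yarn.ship-files flag, so the argument parser consumed that token as the path of the job JAR, which is why the error reads "JAR file does not exist: -yarn.ship-files". Passed as a dynamic property, the option at least parses (a sketch reusing the path above, with a hypothetical JAR name), although, as the next reply explains, an s3:// value still cannot be shipped:

    # Parses, but still fails later: yarn.ship-files only accepts local paths.
    flink run -t yarn-per-job \
      -Dyarn.ship-files="s3://applib/xx/xx/1.0-SNAPSHOT/application.properties" \
      myapp.jar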

Re: yarn ship from s3

Xintong Song
Hi Vijay,

Currently, Flink only supports shipping files from the local machine where the job is submitted.

There are tickets [1][2][3] tracking the effort to support shipping files from remote paths, e.g., HTTP, HDFS, etc. Once that work is done, adding S3 as an additional supported scheme should be straightforward.

Unfortunately, these efforts are still in progress and have more or less stalled recently.

Thank you~

Xintong Song
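
Until that lands, a workable pattern on EMR (a sketch with hypothetical paths, combining the earlier suggestions) is to stage the files from S3 onto the submitting node first, then let yarn.ship-files distribute the now-local copies:

    # 1. Stage artifacts locally on the node that runs flink run.
    aws s3 cp s3://my-bucket/flink-app/ /opt/flink-app/ --recursive

    # 2. Ship the local copies to the YARN containers at submission time.
    flink run -t yarn-per-job \
      -Dyarn.ship-files="/opt/flink-app/application.properties;/opt/flink-app/log4j.properties" \
      /opt/flink-app/myapp-1.0-SNAPSHOT.jar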



Re: yarn ship from s3

Vijayendra Yadav
Thank you, Xintong. I will look out for these updates in the near future.

Regards,
Vijay
