Hello,

We have our Flink job (1.8.0) running on our Hadoop 2.7 cluster with YARN. We would like to add the GCS connector to use GCS rather than HDFS. Following the documentation of the GCS connector [1], we have to specify which credentials we want to use, and there are two ways of doing this:

* Edit core-site.xml
* Set an environment variable: GOOGLE_APPLICATION_CREDENTIALS

Because we're on a shared company Hadoop cluster, we do not want to change the cluster-wide core-site.xml. This leaves me with two options:

1. Create a custom core-site.xml and use --yarnship to send it to all the taskmanager containers. If I do this, to what value should I set fs.hdfs.hadoopconf [2] in flink-conf?
2. Set an environment variable. However, because the taskmanagers are started via YARN, I'm having trouble figuring out how to make sure this environment variable is set for each YARN container / taskmanager.

I would appreciate any help you can provide.

Thank you,

Richard
Hi Richard,

You can use dynamic properties to add your environment variables:

Set the jobmanager env, e.g.: -Dcontainerized.master.env.GOOGLE_APPLICATION_CREDENTIALS=xyz
Set the taskmanager env, e.g.: -Dcontainerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS=xyz

Best Regards,
Jiayi Liao
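As a sketch of how those dynamic properties might be passed at submission time (the jar name and key path here are placeholders, not from the thread; in Flink 1.8 the `flink run` CLI forwards YARN dynamic properties with `-yD`, while `yarn-session.sh` takes plain `-D`):

```
# Hypothetical per-job submission -- my-job.jar and the key path are placeholders.
# -yD passes dynamic properties to the YARN deployment when using "flink run".
flink run -m yarn-cluster \
  -yD containerized.master.env.GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json \
  -yD containerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json \
  ./my-job.jar
```

This is a command sketch only; it assumes a working Flink-on-YARN client environment.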
Hi Richard,

For the first question, I don't think you need to explicitly specify fs.hdfs.hadoopconf, as each file in the ship folder is copied as a YARN local resource for the containers. The configuration path is overridden internally in Flink.

For the second question, about setting TM environment variables, please use these two configuration options in your Flink conf: containerized.master.env.* and containerized.taskmanager.env.*

Best Regards,
Peter Huang

On Tue, Sep 24, 2019 at 8:02 AM Richard Deurwaarder <[hidden email]> wrote:
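In flink-conf.yaml form, the two options Peter mentions would look roughly like this (the key file path is a placeholder; the file must be readable inside each YARN container, e.g. by shipping it with --yarnship):

```yaml
# flink-conf.yaml -- /path/to/key.json is a placeholder, not from the thread.
# These export GOOGLE_APPLICATION_CREDENTIALS into the JM and TM container environments.
containerized.master.env.GOOGLE_APPLICATION_CREDENTIALS: /path/to/key.json
containerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS: /path/to/key.json
```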
Hi Peter and Jiayi,

Thanks for the answers, this worked perfectly. I just added containerized.master.env.GOOGLE_APPLICATION_CREDENTIALS=xyz and containerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS=xyz to my Flink config and they got picked up.

Do you know why this is missing from the docs? If it's not intentional, it might be nice to add it.

Richard

On Tue, Sep 24, 2019 at 5:53 PM Peter Huang <[hidden email]> wrote:
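For anyone curious why this works: the containerized.*.env.* options simply export the variable into each container's process environment, which child processes then inherit. A minimal, hypothetical local illustration (plain `sh` standing in for a TaskManager process, with a fake key path):

```shell
# Stand-in for what YARN does with containerized.taskmanager.env.*:
# export the variable, then launch a child process that reads it.
export GOOGLE_APPLICATION_CREDENTIALS=/tmp/fake-key.json
sh -c 'echo "child sees: $GOOGLE_APPLICATION_CREDENTIALS"'
```

The child process prints the path it inherited, which is exactly what the GCS connector does when it looks up GOOGLE_APPLICATION_CREDENTIALS inside a container.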
Hi Richard,

Good suggestion. I just created a Jira ticket. I will find time this week to update the docs.

Best Regards,
Peter Huang

On Wed, Sep 25, 2019 at 8:05 AM Richard Deurwaarder <[hidden email]> wrote: