Flink Hadoop config on docker-compose


Flink Hadoop config on docker-compose

Flavio Pompermaier
Hi everybody,
I'm trying to set up reading from HDFS using docker-compose and Flink 1.11.3.
If I pass 'env.hadoop.conf.dir' and 'env.yarn.conf.dir' using FLINK_PROPERTIES (under the environment section of the docker-compose service) I see the following line in the logs:

"Could not find Hadoop configuration via any of the supported method"

If I'm not wrong, this means that HADOOP_CONF_DIR is not actually exported by the run scripts.
Indeed, if I add HADOOP_CONF_DIR and YARN_CONF_DIR (also under the environment section of the docker-compose service) I don't see that line.

Is this the expected behavior?

Below is the relevant docker-compose service I use (I've removed the content of HADOOP_CLASSPATH because it is too long, and I've omitted the taskmanager service, which is similar):

flink-jobmanager:
    container_name: flink-jobmanager
    build:
      context: .
      dockerfile: Dockerfile.flink
      args:
        FLINK_VERSION: 1.11.3-scala_2.12-java11
    image: 'flink-test:1.11.3-scala_2.12-java11'
    ports:
      - "8091:8081"
      - "8092:8082"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: flink-jobmanager
        rest.port: 8081
        historyserver.web.port: 8082
        web.upload.dir: /opt/flink
        env.hadoop.conf.dir: /opt/hadoop/conf
        env.yarn.conf.dir: /opt/hadoop/conf
      - |
        HADOOP_CLASSPATH=...
      - HADOOP_CONF_DIR=/opt/hadoop/conf
      - YARN_CONF_DIR=/opt/hadoop/conf
    volumes:
      - 'flink_shared_folder:/tmp/test'
      - 'flink_uploads:/opt/flink/flink-web-upload'
      - 'flink_hadoop_conf:/opt/hadoop/conf'
      - 'flink_hadoop_libs:/opt/hadoop-3.2.1/share'
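
For context, the Dockerfile.flink referenced by the build section is not shown in the thread; a hypothetical minimal sketch (the base image tag and comments are assumptions, not the actual file) might look like:

```dockerfile
# Hypothetical sketch of Dockerfile.flink -- the actual file is not shown
# in the thread. The compose file passes FLINK_VERSION as a build arg.
ARG FLINK_VERSION
FROM flink:${FLINK_VERSION}
# Hadoop config and libraries are mounted at runtime via the compose volumes
# (flink_hadoop_conf, flink_hadoop_libs) and HADOOP_CLASSPATH is injected via
# the environment section, so nothing Hadoop-specific is baked into the image.
```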


Thanks in advance for any support,
Flavio

Re: Flink Hadoop config on docker-compose

rmetzger0
Hi,

I'm not aware of any known issues with Hadoop and Flink on Docker.

I also tried what you are doing locally, and it seems to work:

flink-jobmanager    | 2021-04-15 18:37:48,300 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Starting StandaloneSessionClusterEntrypoint.
flink-jobmanager    | 2021-04-15 18:37:48,338 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install default filesystem.
flink-jobmanager    | 2021-04-15 18:37:48,375 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install security context.
flink-jobmanager    | 2021-04-15 18:37:48,404 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop user set to flink (auth:SIMPLE)
flink-jobmanager    | 2021-04-15 18:37:48,408 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-811306162058602256.conf.
flink-jobmanager    | 2021-04-15 18:37:48,415 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Initializing cluster services.

Here's my code:


Hope this helps!

On Wed, Apr 14, 2021 at 5:37 PM Flavio Pompermaier <[hidden email]> wrote:

Re: Flink Hadoop config on docker-compose

Flavio Pompermaier
Hi Robert,
indeed, my docker-compose setup works only if I also add the Hadoop and YARN variables, while I was expecting them to be generated automatically just by setting the env.xxx options in the FLINK_PROPERTIES variable.

I just want to understand what to expect: do I really need to specify the Hadoop and YARN directories as environment variables or not?

On Thu, Apr 15, 2021 at 8:39 PM Robert Metzger <[hidden email]> wrote:

Re: Flink Hadoop config on docker-compose

Yang Wang
It seems that we do not export HADOOP_CONF_DIR as an environment variable in the current implementation, even though the env.xxx Flink config options are set. Those options are only used to construct the classpath for the JM/TM processes. However, in "HadoopUtils"[2] we do not support getting the Hadoop configuration from the classpath.
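
To illustrate the gap: a rough sketch of the kind of environment-variable lookup involved (illustrative shell only; HadoopUtils is Java code, and this is not it):

```shell
# Illustrative sketch only -- this mimics the idea that the Hadoop-config
# lookup consults real process environment variables. A directory passed only
# as env.hadoop.conf.dir inside FLINK_PROPERTIES never becomes such a
# variable, hence the "Could not find Hadoop configuration" log line.
resolve_hadoop_conf_dir() {
  for dir in "${HADOOP_CONF_DIR:-}" "${HADOOP_HOME:+${HADOOP_HOME}/etc/hadoop}"; do
    if [ -n "${dir}" ] && [ -d "${dir}" ]; then
      printf '%s\n' "${dir}"
      return 0
    fi
  done
  return 1  # no Hadoop configuration found
}
```

With the explicit HADOOP_CONF_DIR entry from the compose file, the first branch succeeds, which matches what Flavio observed.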


Best,
Yang

On Fri, Apr 16, 2021 at 3:55 AM, Flavio Pompermaier <[hidden email]> wrote:

Re: Flink Hadoop config on docker-compose

Flavio Pompermaier
Hi Yang,
isn't this something to fix? If I look at the documentation at [1], in the "Passing configuration via environment variables" section, there is:

"The environment variable FLINK_PROPERTIES should contain a list of Flink cluster configuration options separated by new line,
the same way as in the flink-conf.yaml. FLINK_PROPERTIES takes precedence over configurations in flink-conf.yaml."

To me this means that if I specify "env.hadoop.conf.dir" it should be handled as well. Am I wrong?
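
The behavior that documentation paragraph describes can be sketched roughly like this (a simplified stand-in for the image entrypoint's FLINK_PROPERTIES handling, not the actual docker-entrypoint.sh):

```shell
# Simplified stand-in for the entrypoint's FLINK_PROPERTIES handling
# (illustrative, not the actual docker-entrypoint.sh). Lines are appended
# to flink-conf.yaml, so they only ever become Flink *configuration*;
# nothing here exports HADOOP_CONF_DIR as an environment variable, which
# is the gap discussed in this thread.
apply_flink_properties() {
  conf_file="${FLINK_HOME:-/opt/flink}/conf/flink-conf.yaml"
  if [ -n "${FLINK_PROPERTIES:-}" ]; then
    # appended so the values take precedence, per the documentation quote
    printf '%s\n' "${FLINK_PROPERTIES}" >> "${conf_file}"
  fi
}
```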


On Fri, Apr 16, 2021 at 4:52 AM Yang Wang <[hidden email]> wrote:

Re: Flink Hadoop config on docker-compose

Matthias
I think you're right, Flavio. I created FLINK-22414 to cover this. Thanks for bringing it up.


On Fri, Apr 16, 2021 at 9:32 AM Flavio Pompermaier <[hidden email]> wrote:

Re: Flink Hadoop config on docker-compose

Flavio Pompermaier
Great! Thanks for the support

On Thu, Apr 22, 2021 at 2:57 PM Matthias Pohl <[hidden email]> wrote: