env.readFile with enumeratenestedFields


Flavio Pompermaier
Hi all,

In my job I'm doing the following to recursively read the files inside a directory:

    TextInputFormat inputFormat = new TextInputFormat(new Path(inputDir));
    org.apache.flink.configuration.Configuration ifConf =
        new org.apache.flink.configuration.Configuration();
    // Ask the format to descend into the subdirectories of inputDir
    ifConf.setBoolean(FileInputFormat.ENUMERATE_NESTED_FILES_FLAG, true);
    inputFormat.configure(ifConf);

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.readFile(inputFormat, inputDir).print();

Although inputFormat.configure() correctly sets the enumerateNestedFiles field of TextInputFormat, the job seems to forget this parameter at execution time and resets it to false.

Am I doing something wrong, or is there a bug here? (I'm using Flink 1.0.2.)
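The symptom is consistent with configure() being called a second time when the job runs, this time with the data source's own parameters, which are empty unless they are attached to the operator. The dependency-free model below illustrates that assumption only; `Conf`, `Format` and `simulate` are hypothetical stand-ins, not Flink classes, and only the key string mirrors `FileInputFormat.ENUMERATE_NESTED_FILES_FLAG`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free model of the reported symptom. This is NOT Flink code.
public class ConfigureResetModel {

    // Stand-in for a configuration object: a plain key/value map.
    static final class Conf {
        private final Map<String, Boolean> values = new HashMap<>();

        void setBoolean(String key, boolean value) {
            values.put(key, value);
        }

        boolean getBoolean(String key, boolean defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    // Stand-in for an input format whose configure() always re-reads the flag.
    static final class Format {
        static final String FLAG = "recursive.file.enumeration";
        boolean enumerateNestedFiles = false;

        void configure(Conf conf) {
            // Falls back to false when the key is absent, discarding any earlier value.
            enumerateNestedFiles = conf.getBoolean(FLAG, false);
        }
    }

    // First call models the client-side configure(); second call models the assumed
    // runtime configure() with the operator's own parameters. Returns the final flag.
    static boolean simulate(boolean passParametersToOperator) {
        Conf ifConf = new Conf();
        ifConf.setBoolean(Format.FLAG, true);

        Format format = new Format();
        format.configure(ifConf); // flag is true at this point

        Conf runtimeParams = passParametersToOperator ? ifConf : new Conf();
        format.configure(runtimeParams); // the second call decides the final value
        return format.enumerateNestedFiles;
    }

    public static void main(String[] args) {
        System.out.println("configure() only:          " + simulate(false)); // false
        System.out.println("parameters on the operator: " + simulate(true));  // true
    }
}
```

In this model the flag only survives when the same configuration is also supplied for the second (runtime) call.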

Best,
Flavio


Re: env.readFile with enumeratenestedFields

Aljoscha Krettek
Hi,
The configuration has to be passed using env.readFile(...).withParameters(ifConf). The InputFormat will then be configured properly at runtime.
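A minimal sketch of the withParameters() approach for Flink 1.0.x, assuming `flink-java` is on the classpath. The class name and the `recursiveTextSource` helper are hypothetical; the helper only builds the DataSource, so the attached configuration can be inspected without running the job. The explicit inputFormat.configure(ifConf) call from the original snippet is dropped here, on the assumption that the attached parameters are what the runtime configures the format with.

```java
import org.apache.flink.api.common.io.FileInputFormat;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.Path;

public class RecursiveReadWithParameters {

    // Hypothetical helper: builds a text source and attaches the configuration to the
    // DataSource itself, so it is available when the format is configured at runtime.
    static DataSource<String> recursiveTextSource(ExecutionEnvironment env, String inputDir) {
        TextInputFormat inputFormat = new TextInputFormat(new Path(inputDir));

        Configuration ifConf = new Configuration();
        ifConf.setBoolean(FileInputFormat.ENUMERATE_NESTED_FILES_FLAG, true);

        return env.readFile(inputFormat, inputDir).withParameters(ifConf);
    }

    public static void main(String[] args) throws Exception {
        String inputDir = args[0]; // directory to read recursively
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        recursiveTextSource(env, inputDir).print();
    }
}
```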

However, Kostas has just extended the FileInputFormats so that these parameters can be set directly on the input format. In 1.1-SNAPSHOT and the upcoming 1.1 release, you should be able to call inputFormat.setNestedFileEnumeration(true) instead.
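With Flink 1.1 or later, the same job could look roughly like the sketch below. It assumes `flink-java` is on the classpath and that `FileInputFormat` exposes `setNestedFileEnumeration(boolean)` as described above; the class name and the `recursiveTextFormat` helper are hypothetical.

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;

public class RecursiveReadWithSetter {

    // Hypothetical helper: the flag is set directly on the format, with no Configuration.
    static TextInputFormat recursiveTextFormat(String inputDir) {
        TextInputFormat inputFormat = new TextInputFormat(new Path(inputDir));
        inputFormat.setNestedFileEnumeration(true);
        return inputFormat;
    }

    public static void main(String[] args) throws Exception {
        String inputDir = args[0]; // directory to read recursively
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.readFile(recursiveTextFormat(inputDir), inputDir).print();
    }
}
```

Setting the flag on the format keeps the setting next to the object it belongs to, instead of relying on a string key in a separate Configuration.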

Cheers,
Aljoscha

On Wed, 20 Jul 2016 at 17:55, Flavio Pompermaier wrote:


Re: env.readFile with enumeratenestedFields

Kostas Kloudas
Hi Flavio,

As Aljoscha pointed out, the problem should now be solved.
The changes are already in master.
If you run into any issues, let us know.

Kostas

On Jul 20, 2016, at 6:29 PM, Aljoscha Krettek wrote: