Recursive Traversal of the Input Path Directory, Not working

Adarsh Jain
Hi,

I am trying to use "Recursive Traversal of the Input Path Directory" in Flink 1.3 with Scala. A snippet of my code is below. If I give the exact file name, it works fine. Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
import org.apache.flink.configuration.Configuration

val config = new Configuration
config.setBoolean("recursive.file.enumeration", true)

val featuresSource: String = "file:///Users/adarsh/Documents/testData/featurecsv/31c710ac40/2017/06/22"

val testInput = env.readTextFile(featuresSource).withParameters(config)
testInput.print()

Please advise on how to fix this.

Regards,
Adarsh

Re: Recursive Traversal of the Input Path Directory, Not working

Stefan Richter
Hi,

I am not sure I am getting the problem right: the code works if you use a file name, but it does not work for directories? What exactly is not working? Do you get any exceptions?

Best,
Stefan

On 22.06.2017 at 17:01, Adarsh Jain <[hidden email]> wrote:

Hi,

I am trying to use "Recursive Traversal of the Input Path Directory" in Flink 1.3 with Scala. A snippet of my code is below. If I give the exact file name, it works fine. Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
import org.apache.flink.configuration.Configuration

val config = new Configuration
config.setBoolean("recursive.file.enumeration", true)

val testInput = env.readTextFile(featuresSource).withParameters(config)
testInput.print()

Please advise on how to fix this.

Regards,
Adarsh



Re: Recursive Traversal of the Input Path Directory, Not working

Adarsh Jain
Hi Stefan,

Yes, you understood correctly. When I give the full path up to the file name, it works fine; however, when I give the path only up to the directory, it does not read the data, and it does not print any exceptions either. I am also not sure why it behaves like this.

It should be easy to reproduce, in case you can try it. That would be really helpful.

Regards,
Adarsh

On Thu, Jun 22, 2017 at 9:00 PM, Stefan Richter <[hidden email]> wrote:
Hi,

I am not sure I am getting the problem right: the code works if you use a file name, but it does not work for directories? What exactly is not working? Do you get any exceptions?

Best,
Stefan


Re: Recursive Traversal of the Input Path Directory, Not working

Stefan Richter
Hi,

I tried this out on the current master and the 1.3 release, and both work for me. Everything works exactly as expected for file names, a directory, and even nested directories.

Best,
Stefan

On 22.06.2017 at 21:13, Adarsh Jain <[hidden email]> wrote:

Hi Stefan,

Yes, you understood correctly. When I give the full path up to the file name, it works fine; however, when I give the path only up to the directory, it does not read the data, and it does not print any exceptions either. I am also not sure why it behaves like this.

It should be easy to reproduce, in case you can try it. That would be really helpful.

Regards,
Adarsh


Re: Recursive Traversal of the Input Path Directory, Not working

Adarsh Jain
Hi Stefan,

Thanks for taking the time to check this. It still doesn't work for me.

Could you copy and paste the code you used? Maybe I am making some silly mistake that I am not able to spot.

Thanks again.

Regards,
Adarsh


On Fri, Jun 23, 2017 at 2:32 PM, Stefan Richter <[hidden email]> wrote:
Hi,

I tried this out on the current master and the 1.3 release, and both work for me. Everything works exactly as expected for file names, a directory, and even nested directories.

Best,
Stefan


Re: Recursive Traversal of the Input Path Directory, Not working

Stefan Richter
I just copied and pasted your code, added the missing "val env = LocalEnvironment.createLocalEnvironment()", and replaced the path string with a local directory containing some test files that I created. No other changes.

On 23.06.2017 at 11:25, Adarsh Jain <[hidden email]> wrote:

Hi Stefan,

Thanks for taking the time to check this. It still doesn't work for me.

Could you copy and paste the code you used? Maybe I am making some silly mistake that I am not able to spot.

Thanks again.

Regards,
Adarsh
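
For reference, the test Stefan describes can be pieced together into a complete program roughly as follows. This is a minimal sketch, not his actual code: it uses the Scala ExecutionEnvironment.getExecutionEnvironment rather than a local environment, and the object name RecursiveReadExample, the helper readRecursively, and the default path are placeholders chosen for illustration.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

object RecursiveReadExample {

  // Reads every text file under `dir`, descending into nested directories.
  // `readRecursively` is a hypothetical helper name, not part of Flink.
  def readRecursively(env: ExecutionEnvironment, dir: String): DataSet[String] = {
    val config = new Configuration
    // Same switch as in the original snippet.
    config.setBoolean("recursive.file.enumeration", true)
    env.readTextFile(dir).withParameters(config)
  }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // Placeholder path: pass the real input directory as the first argument.
    val inputDir = if (args.nonEmpty) args(0) else "file:///tmp/flink-test-input"
    readRecursively(env, inputDir).print()
  }
}
```

Running it against a directory that contains plain files in nested subdirectories should print every line of those files; files whose names start with "_" or "." are a separate matter that comes up later in the thread.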



Re: Recursive Traversal of the Input Path Directory, Not working

Adarsh Jain
I am using "val env = ExecutionEnvironment.getExecutionEnvironment" with "import org.apache.flink.api.scala.ExecutionEnvironment". Could this be the problem?

I am using Scala in my program.

Regards,
Adarsh 

On Fri, Jun 23, 2017 at 3:01 PM, Stefan Richter <[hidden email]> wrote:
I just copied and pasted your code, added the missing "val env = LocalEnvironment.createLocalEnvironment()", and replaced the path string with a local directory containing some test files that I created. No other changes.


Re: Recursive Traversal of the Input Path Directory, Not working

Stefan Richter
No, that doesn’t make a difference and also works.

On 23.06.2017 at 11:40, Adarsh Jain <[hidden email]> wrote:

I am using "val env = ExecutionEnvironment.getExecutionEnvironment" with "import org.apache.flink.api.scala.ExecutionEnvironment". Could this be the problem?

I am using Scala in my program.

Regards,
Adarsh 


Re: Recursive Traversal of the Input Path Directory, Not working

Adarsh Jain
Hi Stefan,

I think I found the problem: try it with a file whose name starts with an underscore, like "_part-1-0.csv".

When saving, Flink prepends a "_" to the file name; however, when reading at the directory level, it does not pick up those files.

Is there a setting we can use so that it does not prepend the underscore when saving a file?

Regards,
Adarsh

On Fri, Jun 23, 2017 at 3:24 PM, Stefan Richter <[hidden email]> wrote:
No, that doesn’t make a difference and also works.
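
The symptom described above is consistent with a file-name rule that skips entries starting with "_" or "." when a directory is enumerated, while a file named explicitly is read regardless. The sketch below is a plain-Scala illustration of such a rule using only java.nio. It is not Flink's implementation, and the names HiddenFileFilterDemo, isAccepted, and acceptedFiles are hypothetical.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object HiddenFileFilterDemo {

  // Naming rule under discussion: names starting with "_" or "." are skipped.
  def isAccepted(fileName: String): Boolean =
    !fileName.startsWith("_") && !fileName.startsWith(".")

  // Lists the regular files directly inside `dir` that pass the rule, sorted by name.
  def acceptedFiles(dir: Path): List[String] = {
    val stream = Files.list(dir)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .map(_.getFileName.toString)
        .filter(isAccepted)
        .toList
        .sorted
    } finally {
      stream.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("hidden-file-demo")
    Seq("_part-1-0.csv", "part-1-0.csv", ".crc")
      .foreach(name => Files.createFile(dir.resolve(name)))
    // Only the file without a leading "_" or "." survives the rule.
    println(acceptedFiles(dir)) // prints List(part-1-0.csv)
  }
}
```

A directory that holds only "_"-prefixed files therefore yields an empty result without any exception, which matches what was observed.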


Re: Recursive Traversal of the Input Path Directory, Not working

Stefan Richter
Hi,

I suggest that you simply open an issue for this in our Jira, describing the improvement idea. That should be the fastest way to get this changed.

Best,
Stefan

On 23.06.2017 at 15:08, Adarsh Jain <[hidden email]> wrote:

Hi Stefan,

I think I found the problem: try it with a file whose name starts with an underscore, like "_part-1-0.csv".

When saving, Flink prepends a "_" to the file name; however, when reading at the directory level, it does not pick up those files.

Is there a setting we can use so that it does not prepend the underscore when saving a file?

Regards,
Adarsh


Re: Recursive Traversal of the Input Path Directory, Not working

Adarsh Jain
Thanks, Stefan. My colleague Shashank has filed a bug for this in Jira.

Regards,
Adarsh

On Fri, Jun 23, 2017 at 8:19 PM, Stefan Richter <[hidden email]> wrote:
Hi,

I suggest that you simply open an issue for this in our Jira, describing the improvement idea. That should be the fastest way to get this changed.

Best,
Stefan


Re: Recursive Traversal of the Input Path Directory, Not working

Aljoscha Krettek
Hi,

Hadoop FileInputFormats also ignore hidden files by default (files starting with “.” or “_”). You can override this behaviour in Flink by subclassing TextInputFormat and overriding the acceptFile() method. You can use a custom input format with ExecutionEnvironment.readFile().

Regarding BucketingSink, you can change both the prefixes and suffixes of the various files using configuration methods.

Best,
Aljoscha
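
The subclassing approach suggested above might look roughly like the sketch below. It assumes that the file-acceptance hook in Flink's FileInputFormat is acceptFile(FileStatus) and that the Scala batch API is on the classpath. The class name UnderscoreTextInputFormat, the helper acceptName, and the input path are hypothetical.

```scala
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.core.fs.{FileStatus, Path}

object UnderscoreTextInputFormat {
  // Hypothetical rule: keep "_"-prefixed names, still skip "."-prefixed ones.
  def acceptName(name: String): Boolean = !name.startsWith(".")
}

// Text input format that also reads files whose names start with "_".
// Overriding acceptFile replaces the default check entirely, so any
// other path filtering done by the default implementation is bypassed too.
class UnderscoreTextInputFormat(path: Path) extends TextInputFormat(path) {
  override def acceptFile(fileStatus: FileStatus): Boolean =
    UnderscoreTextInputFormat.acceptName(fileStatus.getPath.getName)
}

object ReadUnderscoreFiles {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // Placeholder path: pass the real input directory as the first argument.
    val dir = if (args.nonEmpty) args(0) else "file:///tmp/flink-test-input"

    val config = new Configuration
    config.setBoolean("recursive.file.enumeration", true)

    val lines = env
      .readFile(new UnderscoreTextInputFormat(new Path(dir)), dir)
      .withParameters(config)
    lines.print()
  }
}
```

Because the same check is applied to directories during recursive enumeration, this sketch would also descend into directories whose names start with "_".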

On 27. Jun 2017, at 11:53, Adarsh Jain <[hidden email]> wrote:

Thanks, Stefan. My colleague Shashank has filed a bug for this in Jira.

Regards,
Adarsh
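
For the BucketingSink side of the question, the prefix configuration mentioned above could be applied roughly as follows. This is a sketch under assumptions: the thread does not show the job that wrote the files, so the sink setup, the helper buildSink, and the output path are hypothetical, and the flink-connector-filesystem dependency is assumed to be available.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

object BucketingSinkNaming {

  // Builds a sink whose in-progress and pending files carry no "_" prefix.
  // `buildSink` is a hypothetical helper name, not part of Flink.
  def buildSink(basePath: String): BucketingSink[String] = {
    val sink = new BucketingSink[String](basePath)
    sink.setInProgressPrefix("")
    sink.setPendingPrefix("")
    sink
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Placeholder output location.
    env.fromElements("a", "b", "c").addSink(buildSink("file:///tmp/bucketing-out"))
    env.execute("bucketing-sink-naming")
  }
}
```

One trade-off to keep in mind: without the prefix, a later directory read no longer skips files that are still being written or are waiting to be finalised, so the job reading them may see incomplete data.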
