Hi,
I am trying to use "Recursive Traversal of the Input Path Directory" in Flink 1.3 using Scala. A snippet of my code is below. If I give the exact file name, it works fine.

Ref: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html

    import org.apache.flink.api.java.utils.ParameterTool
    import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
    import org.apache.flink.configuration.Configuration

    val config = new Configuration
    config.setBoolean("recursive.file.enumeration", true)
    val featuresSource: String = "file:///Users/adarsh/Documents/testData/featurecsv/31c710ac40/2017/06/22"
    val testInput = env.readTextFile(featuresSource).withParameters(config)
    testInput.print()

Please guide me on how to fix this.
Regards, Adarsh
|
Hi,
I am not sure I am getting the problem right: the code works if you use a file name, but it does not work for directories? What exactly is not working? Do you get any exceptions? Best, Stefan
|
Hi Stefan, Yes, you understood correctly. When I give the full path down to the file name, it works fine. However, when I give the path only up to the directory, it does not read the data, and it does not print any exceptions either. I am not sure why it behaves like this. It should be easy to reproduce, in case you can try. That would be really helpful. Regards, Adarsh On Thu, Jun 22, 2017 at 9:00 PM, Stefan Richter <[hidden email]> wrote:
|
Hi,
I tried this out on the current master and the 1.3 release, and both work for me. Everything works exactly as expected for file names, a directory, and even nested directories. Best, Stefan
|
Hi Stefan, Thanks for your efforts in checking this. It still doesn't work for me. Can you copy and paste the code you used? Maybe I am making some silly mistake and am not able to spot it. Thanks again. Regards, Adarsh On Fri, Jun 23, 2017 at 2:32 PM, Stefan Richter <[hidden email]> wrote:
|
I just copy-pasted your code, added the missing "val env = LocalEnvironment.createLocalEnvironment()", and replaced the string with a local directory containing some test files that I created. No other changes.
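
For readers who want to reproduce the test described above, here is a minimal sketch of what the complete program might look like. It assumes the Scala DataSet API of Flink 1.3, uses ExecutionEnvironment.getExecutionEnvironment rather than a local environment, and the object name and input path are placeholders that are not from the thread.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

// Hypothetical object name; the input path below is a placeholder.
object RecursiveReadSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Enable recursive enumeration of nested input directories.
    val config = new Configuration
    config.setBoolean("recursive.file.enumeration", true)

    // Point this at a directory (not a single file) containing test files.
    val inputDir = "file:///path/to/input/dir"

    val testInput = env.readTextFile(inputDir).withParameters(config)

    // print() triggers execution and writes every line to stdout.
    testInput.print()
  }
}
```

If this prints nothing for a directory that is known to contain data, the file names themselves are worth checking.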
|
I am using "val env = ExecutionEnvironment.getExecutionEnvironment" with "import org.apache.flink.api.scala.ExecutionEnvironment". Can this be the problem? I am using Scala in my program. Regards, Adarsh On Fri, Jun 23, 2017 at 3:01 PM, Stefan Richter <[hidden email]> wrote:
|
No, that doesn’t make a difference and also works.
|
Hi Stefan, I think I found the problem. Try it with a file whose name starts with an underscore, like "_part-1-0.csv". While saving, Flink prepends a "_" to the file name, but when reading at the folder level it does not pick up those files. Can you suggest a setting so that it does not prepend the underscore when saving a file? Regards, Adarsh On Fri, Jun 23, 2017 at 3:24 PM, Stefan Richter <[hidden email]> wrote:
|
Hi,
I suggest that you simply open an issue for this in our JIRA, describing the improvement idea. That should be the fastest way to get this changed. Best, Stefan
|
Thanks Stefan, my colleague Shashank has filed a bug for this in JIRA. Regards, Adarsh On Fri, Jun 23, 2017 at 8:19 PM, Stefan Richter <[hidden email]> wrote:
|
Hi,
Hadoop FileInputFormats (by default) also ignore hidden files (files starting with “.” or “_”). You can override this behaviour in Flink by subclassing TextInputFormat and overriding the acceptFile() method. You can use a custom input format with ExecutionEnvironment.readFile(). Regarding BucketingSink, you can change both the prefixes and suffixes of the various files using configuration methods. Best, Aljoscha
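
The subclassing approach suggested above could be sketched roughly as follows. This is an illustration under assumptions, not tested against the poster's setup: the class name UnderscoreTextInputFormat, the object name, and the input path are hypothetical, and the override is assumed to target FileInputFormat.acceptFile(FileStatus) as found in Flink 1.3.

```scala
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.core.fs.{FileStatus, Path}

// Hypothetical subclass: accepts files (and directories) whose names start
// with "_", but still skips names starting with ".".
// Note that this replaces the default check entirely instead of extending it.
class UnderscoreTextInputFormat(path: Path) extends TextInputFormat(path) {
  override def acceptFile(fileStatus: FileStatus): Boolean =
    !fileStatus.getPath.getName.startsWith(".")
}

object ReadUnderscoreFilesSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val config = new Configuration
    config.setBoolean("recursive.file.enumeration", true)

    // Placeholder path, not from the thread.
    val inputDir = "file:///path/to/input/dir"

    // readFile takes the custom input format plus the path to read.
    val lines = env
      .readFile(new UnderscoreTextInputFormat(new Path(inputDir)), inputDir)
      .withParameters(config)

    lines.print()
  }
}
```

One consequence of this design is that files still being written by another job would also be read, so it is probably only appropriate when the input directory is known to be complete.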
|
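On the writing side, the BucketingSink configuration methods mentioned in the last reply might be used along these lines. This is a sketch that assumes the flink-connector-filesystem dependency for Flink 1.3 is on the classpath; the object name, the helper method, and the chosen prefix and suffix values are illustrative only.

```scala
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

// Hypothetical helper that builds a sink whose in-progress and pending files
// carry no leading underscore.
object BucketingSinkPrefixSketch {
  def build(outputDir: String): BucketingSink[String] = {
    val sink = new BucketingSink[String](outputDir)

    // Drop the leading "_" (illustrative choice; any prefix can be set).
    sink.setInProgressPrefix("")
    sink.setPendingPrefix("")

    // Suffixes are configurable as well; these values are only examples.
    sink.setInProgressSuffix(".in-progress")
    sink.setPendingSuffix(".pending")

    sink
  }
}
```

As far as I understand, the underscore prefix marks files that are not yet finalized, so removing it means directory reads may pick up incomplete files; keeping a distinguishing suffix helps to tell them apart.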