After searching the internet (with keywords like 'apache flink parallel read text'), I still have not found the answer I am looking for, so I am asking here before jumping into writing code ...
My problem is that I want to read a text file, or a set of split text files, from the local file system. I want to read those files in parallel and process them accordingly. From what I have discovered so far:
- Use ExecutionEnvironment.readTextFile, but this seems to use only one thread(?) (meaning it reads the file(s) from beginning to end).
- Use the streaming environment's addSource[1], but it seems I would need to implement my own source with RichParallelSourceFunction.
Are there any classes or implementations that can already read text in parallel? Thanks
ExecutionEnvironment.readTextFile will read the file in parallel.
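To illustrate how a single text file can be read in parallel at all, here is a minimal plain-Java sketch of a split-based read. It is not Flink's code and uses no Flink classes. The class name SplitReadSketch and its methods are hypothetical, and the ownership rule (a line belongs to the split in which its first byte lies) is an assumption chosen for this sketch. As far as I understand, Flink's file input formats follow a broadly similar idea of byte-range input splits handled by parallel subtasks, but please check the Flink source for the actual behavior.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitReadSketch {

    /** Divide [0, fileLength) into at most numSplits byte ranges {start, end}. Assumes numSplits >= 1. */
    public static List<long[]> computeSplits(long fileLength, int numSplits) {
        List<long[]> splits = new ArrayList<>();
        long size = Math.max(1, (fileLength + numSplits - 1) / numSplits);
        for (long start = 0; start < fileLength; start += size) {
            splits.add(new long[] {start, Math.min(start + size, fileLength)});
        }
        return splits;
    }

    /** Return the lines whose first byte lies in [start, end). */
    public static List<String> readSplit(Path file, long start, long end) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (start > 0) {
                // Step back one byte and discard up to the next line break:
                // that (partial) line is owned by the previous split.
                raf.seek(start - 1);
                raf.readLine();
            }
            while (raf.getFilePointer() < end) {
                String raw = raf.readLine(); // may read past 'end' to finish the line
                if (raw == null) {
                    break;
                }
                // readLine maps each byte to one char, so re-decode the bytes as UTF-8.
                lines.add(new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8));
            }
        }
        return lines;
    }

    /** Read every split on its own thread and concatenate the results in split order. */
    public static List<String> readInParallel(Path file, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (long[] s : computeSplits(Files.size(file), parallelism)) {
                futures.add(pool.submit(() -> readSplit(file, s[0], s[1])));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("split-read", ".txt");
        Files.write(file, "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readInParallel(file, 3));
        Files.delete(file);
    }
}
```

Each worker opens its own file handle and seeks to its range, so no coordination is needed between readers. RandomAccessFile.readLine is unbuffered, so this is only meant to show the split logic, not to be fast.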
On 28.05.2016 09:59, David Olsen wrote:
Thank you for the advice! Now I have a new question. I read in the source[1] that the streaming environment uses FileSourceFunction, which inherits from RichParallelSourceFunction, to create input splits[2]. I know I can set the parallelism in the streaming environment, but is there any way to verify at runtime that the split files, or the single file, are read in parallel? Thank you again for your help.
On 28 May 2016 at 17:52, Chesnay Schepler <[hidden email]> wrote:
Hi David, I guess you can verify it by adding custom log statements to the Flink code (which means you need to recompile Flink). A debugger may also be sufficient if you are running Flink locally. We are currently reworking the reading of static files for the streaming environment. Maybe it's interesting to check out the new implementation [1].
On Sat, May 28, 2016 at 1:49 PM, David Olsen <[hidden email]> wrote: