Hi guys,
What is the best way to process a file from a Unix file system, given that there is no guarantee as to which task manager will be assigned to process it? We run Flink in standalone mode. We currently follow a brute-force approach in which we copy the file to every task manager. Is there a better way to do this? Best, Nick.
Hi Nick, On a project I worked on, we simply made the file accessible on a shared NFS drive. Our source was custom, and we forced it to parallelism 1 inside the job so the file wouldn't be read multiple times. The rest of the job was distributed. This was also on a standalone cluster. On a resource-managed cluster, I guess the resource manager could take care of copying the file for us. Hope this helps. If there's a better solution, I'd also be happy to hear it :). Regards, Laurent. On Tue, Jun 23, 2020, 20:51 Nick Bendtner <[hidden email]> wrote:
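The approach Laurent describes might look roughly like the sketch below: the file lives on a path every TaskManager can mount, and the source operator is pinned to parallelism 1 so only one subtask reads it (the path and job name here are hypothetical, and this assumes the DataStream API circa Flink 1.10/1.11):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SharedFileJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Path on an NFS mount visible to every TaskManager (hypothetical path).
        DataStream<String> lines = env
                .readTextFile("file:///mnt/shared/input.txt")
                .setParallelism(1); // single reader, so the file is read exactly once

        // Downstream operators still run at the job's full parallelism.
        lines.map((MapFunction<String, String>) String::toUpperCase)
             .print();

        env.execute("read shared file");
    }
}
```

Note that only the source needs parallelism 1; Flink will redistribute the records to downstream operators automatically.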
Thanks that makes sense. On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens <[hidden email]> wrote:
Another option if the file is small enough is to load it in the driver and directly initialize an in-memory source (env.fromElements). On Tue, Jun 23, 2020 at 9:57 PM Vishwas Siravara <[hidden email]> wrote:
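For the small-file option, a minimal sketch might look like the following: the client (driver) reads the file before submitting the job, and the contents are shipped inside the job graph, so no TaskManager needs filesystem access to the original file. The path is hypothetical, and `fromCollection` is used here as the collection-typed sibling of `env.fromElements`:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class InMemoryFileSource {
    public static void main(String[] args) throws Exception {
        // Read the whole file on the client side, before job submission
        // ("/mnt/shared/input.txt" is a hypothetical path).
        List<String> lines = Files.readAllLines(Paths.get("/mnt/shared/input.txt"));

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The collection is serialized into the job graph, so every
        // TaskManager receives the data without reading the file itself.
        env.fromCollection(lines)
           .print();

        env.execute("in-memory file source");
    }
}
```

The caveat is that the data travels with the submitted job, so this only makes sense for genuinely small files; large payloads can exceed the framework's RPC message size limits.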
-- Arvid Heise | Senior Java Developer, Ververica GmbH