Re: Distribute DataSet to subset of nodes
Posted by
Stefan Bunk on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Distribute-DataSet-to-subset-of-nodes-tp2814p2842.html
Hi Fabian,
I think we have a misunderstanding here. I have already copied the first file to five nodes and the second file to five other nodes, outside of Flink. In the open() method of the operator, I simply read that file from the local file system with plain Java I/O. I do not see why this is tricky or how HDFS would help here.
Then I have a normal Flink DataSet that I want to run through the operator, using the previously read data in the flatMap implementation. Since I run the program several times, I do not want to broadcast the data on every run; I would rather copy it to the nodes once and be done with it.
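To make the pattern concrete, here is a minimal sketch of what I mean. It is plain Java with no Flink dependency so that it runs standalone; the class name, the file layout (one key per line) and the filter logic are all hypothetical. In the real job the class would extend Flink's RichFlatMapFunction, and the Consumer would be Flink's Collector.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Plain-Java stand-in for a rich flatMap operator (hypothetical names,
// not the Flink API itself).
class LocalLookupFlatMap {
    private final String localPath; // file copied to this node outside of Flink
    private Set<String> lookup;     // filled once in open(), reused for every record

    LocalLookupFlatMap(String localPath) {
        this.localPath = localPath;
    }

    // Called once per operator instance, before any record is processed.
    // Assumes the file holds one key per line.
    void open() throws IOException {
        lookup = new HashSet<>(Files.readAllLines(Paths.get(localPath)));
    }

    // Called once per record: emit the record only if it appears in the local file.
    void flatMap(String value, Consumer<String> out) {
        if (lookup.contains(value)) {
            out.accept(value);
        }
    }
}
```

The point is only that open() touches the local disk once per operator instance, so nothing has to be shipped through Flink on each run. It does require the file to exist on every node where an instance of the operator is scheduled.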
Can you answer my question from above? If the setParallelism method works and selects five nodes for the first flatMap and five _other_ nodes for the second flatMap, that would be fine for me, provided there is no other easy solution.
Thanks for your help!
Best
Stefan