Hi,
While this task is trivial to implement with the Flink DataSet API,
using readTextFile to read the input and
a flatMap function to perform the downloading, it might not be a
good idea.
The download process is I/O bound and will block the synchronous
flatMap function,
so throughput will be poor.
Until Flink supports asynchronous functions, I suggest you look
elsewhere.
An example of a master-worker architecture using Akka can be
found here:
https://github.com/typesafehub/activator-akka-distributed-workers
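To illustrate the blocking concern outside of Flink, here is a minimal single-machine sketch in Python (not a distributed solution, and not related to the Akka example). It overlaps the I/O-bound downloads with a thread pool and writes one serialized record per URL to a single text file. The names crawl and fetch, the JSON-lines record format, and the default of 16 workers are my own assumptions for illustration only.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as text (a blocking call)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, out_path, fetch_fn=fetch, max_workers=16):
    """Fetch all URLs concurrently; write one JSON record per line.

    Returns the number of URLs that were fetched successfully.
    """
    ok = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool, \
            open(out_path, "w", encoding="utf-8") as out:
        # Submit every download up front so the blocking calls overlap.
        futures = {pool.submit(fetch_fn, url): url for url in urls}
        # Only this thread writes, so the output file needs no locking.
        for future in as_completed(futures):
            url = futures[future]
            try:
                record = {"url": url, "ok": True, "body": future.result()}
                ok += 1
            except Exception as exc:
                # Record the failure instead of aborting the whole crawl.
                record = {"url": url, "ok": False, "error": repr(exc)}
            out.write(json.dumps(record) + "\n")
    return ok
```

With a sequential loop (or a synchronous flatMap) each download waits for the previous one to finish; here up to max_workers requests are in flight at once. Records are written in completion order, not input order.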
Regards,
Kien
Hi all,
I am fairly new to Flink. I have a project where a list of URLs (on one node) needs to be crawled in a distributed fashion. For each URL, the serialized crawl result then needs to be written to a single text file.
I would like to know whether there are similar projects I can look into, or any ideas on how to implement this.
Thanks & Regards,
Eranga Heshan
Undergraduate
Computer Science & Engineering
University of Moratuwa
Mobile: +94 71 138 2686
Email: [hidden email]