Hi,
I am trying to process data stored on HDFS using flink batch jobs. Our data is splitted into 16 data nodes. I am curious to know how data will be pulled from the data nodes with the same number of parallelism set as the data split on HDFS i.e. 16. Is the flink task being executed locally on the data node server or it will happen in the flink nodes where data will be pulled remotely? Any help will be appreciated. Regards, Pritam. |
Hi Pratam, Flink does not deploy tasks to certain nodes according to source data locations. Instead, it will let a task process local input splits (data on the same node) first. So if your parallelism is large enough to distribute on all the data nodes, most data can be processed locally. Thanks, Zhu Zhu Pritam Sadhukhan <[hidden email]> 于2019年10月18日周五 上午10:59写道:
|
Hi Zhu Zhu, Thanks for your detailed answer. Can you please help me to understand how flink task process the data locally on data nodes first? I want to understand how flink determines the processing to be done at the data nodes? Regards, Pritam. On Sat, 19 Oct 2019 at 08:16, Zhu Zhu <[hidden email]> wrote:
|
Sources of batch jobs process InputSplit. Each InputSplit can be a file or a file block according to the FileSystem(for HDFS it is file block). Sources need to retrieve InputSplits to process from InputSplitAssigner at JM. In this way, the assigning of InputSplit to source tasks are possible to take the InputSplit location and task location into consideration to support input locality. To enable this input locality support, it is required to use a InputFormat which leverages LocatableInputSplitAssigner and LocatableInputSplit, e.g. FileInputFormat, HadoopInputFormat, etc. The file reading source interfaces provided in ExecutionEnvironment, like #readTextFile and #readFile, use FileInputFormat, so the input locality is supported by default. Thanks, Zhu Zhu Pritam Sadhukhan <[hidden email]> 于2019年10月21日周一 上午10:17写道:
|
Thanks a lot Zhu Zhu for such an elaborated explanation. On Mon, 21 Oct 2019 at 08:33, Zhu Zhu <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |