Does HadoopOutputFormat create MapReduce job internally?

morven huang

Hi,

I'd like to sink my data into HDFS using SequenceFileAsBinaryOutputFormat with compression. I found an approach at https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/hadoop_compatibility.html, and the code works. But since it creates a MapReduce Job instance, I'm curious whether this Flink application creates and runs a MapReduce job underneath. If so, will it hurt performance?

I tried to figure it out from the logs but couldn't find a clue. I hope someone can shed some light on this. Thank you.

import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileAsBinaryOutputFormat;

// The Job instance is only used to hold the Hadoop configuration.
Job job = Job.getInstance();

HadoopOutputFormat<BytesWritable, BytesWritable> hadoopOF =
        new HadoopOutputFormat<>(new SequenceFileAsBinaryOutputFormat(), job);

// Enable block compression for the sequence file output.
hadoopOF.getConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");
hadoopOF.getConfiguration().set("mapreduce.output.fileoutputformat.compress.type", CompressionType.BLOCK.toString());

// setOutputPath is a static helper inherited from FileOutputFormat.
SequenceFileAsBinaryOutputFormat.setOutputPath(job, new Path("hdfs://..."));

dataset.output(hadoopOF);

Re: Does HadoopOutputFormat create MapReduce job internally?

Fabian Hueske
Hi Morven,

You posted the same question a few days ago and it was answered correctly.
Please do not repost the same question.
You can reply to the earlier thread if you have a follow-up question.

To answer your question briefly:
No, Flink does not trigger a MapReduce job. 
The whole job is executed in Flink. 
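
For intuition, here is a minimal, hypothetical sketch of what such a wrapper does (simplified names, not Flink's actual source): each parallel sink task builds a local Hadoop TaskAttemptContext, asks the wrapped OutputFormat for a RecordWriter, and pushes records into it directly. The Job instance only serves as a carrier for the Hadoop Configuration; nothing is ever submitted to a MapReduce cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

// Hypothetical sketch only: how a Flink-style wrapper can drive a Hadoop
// OutputFormat locally, without ever starting a MapReduce job.
public class HadoopSinkSketch<K, V> {

    private final OutputFormat<K, V> hadoopFormat;
    private final Configuration conf; // extracted from the Job instance
    private RecordWriter<K, V> writer;
    private TaskAttemptContext context;

    public HadoopSinkSketch(OutputFormat<K, V> hadoopFormat, Configuration conf) {
        this.hadoopFormat = hadoopFormat;
        this.conf = conf;
    }

    // Called once per parallel sink task.
    public void open(int taskNumber) throws Exception {
        // A synthetic task attempt ID; it only identifies this task's output files.
        TaskAttemptID id = new TaskAttemptID("sketch", 1, TaskType.REDUCE, taskNumber, 0);
        context = new TaskAttemptContextImpl(conf, id);
        // A plain local method call; no job submission happens here.
        writer = hadoopFormat.getRecordWriter(context);
    }

    // Called for every record the task emits.
    public void write(K key, V value) throws Exception {
        writer.write(key, value);
    }

    public void close() throws Exception {
        writer.close(context);
    }
}

Since the RecordWriter runs inside the Flink task itself, the sink performs like any other Flink file sink; the MapReduce framework never starts.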

Fabian 

