Does HadoopOutputFormat create MapReduce job internally?

Does HadoopOutputFormat create MapReduce job internally?

morven huang

Hi,

I'd like to sink my data into HDFS using SequenceFileAsBinaryOutputFormat with compression, and I found an approach at https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/hadoop_compatibility.html. The code works, but I'm curious to know: since it creates a MapReduce Job instance here, does this Flink application create and run a MapReduce job underneath? If so, will it kill performance?

I tried to figure it out by looking into the logs, but couldn't get a clue. I hope someone can shed some light here. Thank you.


import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileAsBinaryOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Wrap Hadoop's SequenceFileAsBinaryOutputFormat in Flink's HadoopOutputFormat.
Job job = Job.getInstance();
HadoopOutputFormat<BytesWritable, BytesWritable> hadoopOF =
        new HadoopOutputFormat<>(new SequenceFileAsBinaryOutputFormat(), job);

// Enable block-level compression for the sequence file output.
hadoopOF.getConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");
hadoopOF.getConfiguration().set("mapreduce.output.fileoutputformat.compress.type",
        CompressionType.BLOCK.toString());

// setOutputPath is a static method inherited from FileOutputFormat, so calling
// it via TextOutputFormat works for any FileOutputFormat subclass.
TextOutputFormat.setOutputPath(job, new Path("hdfs://..."));

dataset.output(hadoopOF);
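
For context, HadoopOutputFormat<K, V> is a Flink OutputFormat over Tuple2<K, V>, so the dataset above must have element type Tuple2<BytesWritable, BytesWritable>. A minimal end-to-end sketch of a driver around the snippet might look like this (the element values are invented for illustration):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.io.BytesWritable;

// Hypothetical driver: build a tiny DataSet of writable pairs and sink it
// through the HadoopOutputFormat configured above.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<BytesWritable, BytesWritable>> dataset = env.fromElements(
        Tuple2.of(new BytesWritable("key1".getBytes()), new BytesWritable("value1".getBytes())),
        Tuple2.of(new BytesWritable("key2".getBytes()), new BytesWritable("value2".getBytes())));
dataset.output(hadoopOF);
env.execute("sequence-file-sink");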

Re: Does HadoopOutputFormat create MapReduce job internally?

Fabian Hueske
Hi,

Flink's Hadoop compatibility layer simply wraps code that was implemented against Hadoop's interfaces in classes that implement Flink's own interfaces.
No Hadoop cluster is started and no MapReduce job is executed.

Job is just a class of the Hadoop API. Instantiating it does not mean that a Hadoop job is executed.

Best,
Fabian
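
Conceptually, the wrapping Fabian describes boils down to something like the sketch below: a class implementing Flink's OutputFormat that delegates each record to a Hadoop RecordWriter. This is a simplified illustration, not Flink's actual source; the real HadoopOutputFormat additionally handles the output committer, serialization of the Hadoop configuration, and output file naming.

import java.io.IOException;

import org.apache.flink.api.common.io.OutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

// Simplified sketch (not Flink's real implementation): a Flink OutputFormat
// that drives a wrapped Hadoop OutputFormat directly. Writing a record is a
// plain method call into Hadoop code; no cluster, no MapReduce job.
public class SketchedHadoopOutputFormat<K, V> implements OutputFormat<Tuple2<K, V>> {

    // The real wrapper serializes these fields properly; elided here.
    private final org.apache.hadoop.mapreduce.OutputFormat<K, V> hadoopFormat;
    private final org.apache.hadoop.conf.Configuration hadoopConf;

    private transient RecordWriter<K, V> writer;
    private transient TaskAttemptContextImpl context;

    public SketchedHadoopOutputFormat(
            org.apache.hadoop.mapreduce.OutputFormat<K, V> hadoopFormat,
            org.apache.hadoop.conf.Configuration hadoopConf) {
        this.hadoopFormat = hadoopFormat;
        this.hadoopConf = hadoopConf;
    }

    @Override
    public void configure(Configuration parameters) {
        // Nothing to configure on the Flink side in this sketch.
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        // Fabricate a task attempt id so the Hadoop format can name its
        // output file; the real wrapper also runs the OutputCommitter.
        TaskAttemptID attempt =
                new TaskAttemptID("flink", 0, TaskType.REDUCE, taskNumber, 0);
        context = new TaskAttemptContextImpl(hadoopConf, attempt);
        try {
            writer = hadoopFormat.getRecordWriter(context);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void writeRecord(Tuple2<K, V> record) throws IOException {
        try {
            writer.write(record.f0, record.f1); // direct call into Hadoop code
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            writer.close(context);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}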




Re: Does HadoopOutputFormat create MapReduce job internally?

morven huang
Hi Fabian,

Thank you for the clarification.

Best,
Morven Huang
