(DEPRECATED) Apache Flink User Mailing List archive.

Flink 1.1.3 RollingSink - understanding output blocks/parallelism

Dominik Safaric

Flink 1.1.3 RollingSink - understanding output blocks/parallelism

Hi everyone,

although this question might sound trivial, I’ve been curious about the following. Given a Flink topology with parallelism set to 6, for example, that writes its data stream to HDFS using a RollingSink instance, how is the output structured? By structured, I mean that this setup results in 6 distinct block files, whereas I would like to have a single file containing all of the output values from the DataStream.

Regards,
Dominik

Aljoscha Krettek

Re: Flink 1.1.3 RollingSink - understanding output blocks/parallelism

Hi Dominik,

I think having a single output file is only possible if you set the parallelism of the sink to 1. AFAIK it is not possible to concurrently write to a single HDFS file from multiple clients.
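
To illustrate, here is a minimal sketch of the file layout being discussed. It assumes that each parallel sink subtask writes its own part file, named `<prefix>-<subtaskIndex>-<rollCounter>` with the default prefix `part`. That naming scheme is an assumption about RollingSink's behaviour and should be checked against the Flink version in use. `PartFileLayout` and `partFiles` are hypothetical names and not part of Flink. In a real job the sink parallelism would be set on the sink itself, along the lines of `stream.addSink(new RollingSink<String>(path)).setParallelism(1)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: it only models part-file names.
class PartFileLayout {

    // Returns one file name per parallel sink subtask, assuming the scheme
    // "<prefix>-<subtaskIndex>-<rollCounter>" with the roll counter at 0.
    static List<String> partFiles(String prefix, int sinkParallelism) {
        List<String> names = new ArrayList<>();
        for (int subtask = 0; subtask < sinkParallelism; subtask++) {
            names.add(prefix + "-" + subtask + "-0");
        }
        return names;
    }

    public static void main(String[] args) {
        // Sink parallelism 6: six writers, hence six separate part files.
        System.out.println(partFiles("part", 6));
        // Sink parallelism 1: a single writer, hence a single part file.
        System.out.println(partFiles("part", 1));
    }
}
```

Even with a sink parallelism of 1, the sink may still start a new file when it rolls over (for example on reaching its batch size or moving to a new bucket directory), so "one file" probably holds only within a single roll interval.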

Cheers,

Aljoscha

On Wed, 14 Dec 2016 at 20:57 Dominik Safaric <[hidden email]> wrote:
