(DEPRECATED) Apache Flink User Mailing List archive.

BucketingSink & StreamingFileSink


Mariano González Núñez

BucketingSink & StreamingFileSink

Hi Flink Team,

I'm Mariano, and I'm working with Apache Flink to process data from Kafka and sink it to Azure Data Lake Storage (ADLS Gen1).

We are having problems writing Parquet files to ADLS Gen1 with the sink, and it doesn't work with Gen2 either.

We tried to use the StreamingFileSink to write Parquet, but we can't sink to ADLS because Hadoop doesn't work properly with the library and the ADLS path prefix (an HDFS problem, see https://stackoverflow.com/questions/62884450/problem-with-flink-streamingfilesinkgenericrecord-azure-datalake-gen-2).
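The failure described above can be modeled roughly as follows. This is a hypothetical sketch, not Flink code: it assumes, based on the linked Stack Overflow discussion, that the StreamingFileSink fails because the Hadoop-backed recoverable writer it relies on only accepts plain HDFS paths, and that this restriction boils down to a URI-scheme comparison. The class and method names (RecoverableWriterSchemeCheck, supportsRecoverableWriter, requireRecoverableWriter) and the account names in the example paths are made up for illustration; adl:// is the ADLS Gen1 scheme and abfss:// is the (TLS) ADLS Gen2 scheme.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical, simplified model of the limitation discussed in this thread.
 * NOT Flink code: it only illustrates a writer that accepts "hdfs" paths
 * and rejects everything else, including ADLS Gen1/Gen2 paths.
 */
final class RecoverableWriterSchemeCheck {

    // Assumption: only plain HDFS paths get a recoverable (exactly-once) writer.
    // URI schemes are case-insensitive, so normalize before comparing.
    static boolean supportsRecoverableWriter(URI path) {
        String scheme = path.getScheme();
        return scheme != null && scheme.toLowerCase(Locale.ROOT).equals("hdfs");
    }

    // Fails the way the thread describes: the sink cannot be set up for the path.
    static void requireRecoverableWriter(URI path) {
        if (!supportsRecoverableWriter(path)) {
            throw new UnsupportedOperationException(
                "No recoverable writer for scheme: " + path.getScheme());
        }
    }

    public static void main(String[] args) {
        String[] paths = {
            "hdfs://namenode:8020/data/out",                         // plain HDFS
            "adl://myaccount.azuredatalakestore.net/data/out",       // ADLS Gen1 (hypothetical account)
            "abfss://container@myaccount.dfs.core.windows.net/out"   // ADLS Gen2 (hypothetical account)
        };
        for (String p : paths) {
            URI uri = URI.create(p);
            System.out.println(uri.getScheme() + " -> " + supportsRecoverableWriter(uri));
        }
    }
}
```

Running main prints one line per path, with true only for the hdfs:// path: under this model the same sink setup is accepted for HDFS but rejected for both ADLS generations, which matches what the thread reports.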

We switched to the deprecated BucketingSink, which works for ADLS as long as we don't call .setWriter, but we can't write Parquet using .setWriter.

Do you have any suggestions for sinking to ADLS Gen1 or Gen2, or is a feature planned that would make this possible with the StreamingFileSink?

Thank you very much
Best Regards

rmetzger0

Re: BucketingSink & StreamingFileSink

Hi Mariano,

thanks a lot for your question. The resolution on Stack Overflow seems to be that Azure Data Lake is not yet supported by the StreamingFileSink (see https://issues.apache.org/jira/browse/FLINK-18568).

In reply to Mariano González Núñez's message of Thu, Jul 30, 2020 at 5:34 PM.

Mariano González Núñez

RE: BucketingSink & StreamingFileSink

Hi Robert,

Thanks for the answer...

From: Robert Metzger <[hidden email]>
Sent: Tuesday, August 11, 2020 3:46
To: Mariano González Núñez <[hidden email]>
Cc: [hidden email] <[hidden email]>
Subject: Re: BucketingSink & StreamingFileSink
