BucketingSink & StreamingFileSink


BucketingSink & StreamingFileSink

Mariano González Núñez
Hi Flink Team,
I'm Mariano, and I'm working with Apache Flink to process data from Kafka and sink it to Azure Data Lake (ADLS Gen1).
We are having problems sinking data in Parquet format to ADLS Gen1, and it doesn't work with Gen2 either.
We tried using the StreamingFileSink to write Parquet, but we can't sink to ADLS because Hadoop doesn't work properly with the library and the adls prefix. (HDFS problem... https://stackoverflow.com/questions/62884450/problem-with-flink-streamingfilesinkgenericrecord-azure-datalake-gen-2)

We switched to the deprecated BucketingSink (which works for ADLS without .setWriter), but we can't sink in Parquet format using .setWriter.

Do you have any suggestions for sinking to ADLS Gen1 or Gen2, or is there a feature planned that would let us use the StreamingFileSink?


Thank you very much
Best Regards


Re: BucketingSink & StreamingFileSink

rmetzger0
Hi Mariano,

Thanks a lot for your question. The resolution on StackOverflow seems to be that Azure Data Lake is not yet supported by the StreamingFileSink (https://issues.apache.org/jira/browse/FLINK-18568).
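
To illustrate the "HDFS problem" mentioned in the question, here is a small plain-Java model of a sink that only accepts paths whose URI scheme is on an allow-list. It does not use Flink or Hadoop. The class name, the method name, the host names in the example paths, and the assumption that only the hdfs scheme is accepted are all hypothetical and chosen for illustration; this is a sketch of the idea, not Flink's actual check.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class SchemeCheck {
    // Assumption for illustration: only "hdfs" paths are accepted for
    // the kind of writes the sink needs. Not taken from Flink's source.
    private static final Set<String> ACCEPTED_SCHEMES = Set.of("hdfs");

    // Returns true if the path's URI scheme is on the allow-list.
    // Paths without a scheme (for example "/local/path") are rejected.
    static boolean supportsRecoverableWrites(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme != null
                && ACCEPTED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Placeholder paths; account and container names are made up.
        String[] paths = {
            "hdfs://namenode:8020/data/out",
            "adl://myaccount.azuredatalakestore.net/data/out",
            "abfs://container@myaccount.dfs.core.windows.net/data/out"
        };
        for (String p : paths) {
            System.out.println(p + " -> " + supportsRecoverableWrites(p));
        }
    }
}
```

Under that assumption, only the hdfs:// path is accepted, while the ADLS Gen1 (adl://) and Gen2 (abfs://) paths are rejected, which matches the behaviour described in the question.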

(In reply to the message from Mariano González Núñez <[hidden email]> of Thu, Jul 30, 2020 at 5:34 PM.)


RE: BucketingSink & StreamingFileSink

Mariano González Núñez
Hi Robert,
Thanks for the answer... 




From: Robert Metzger <[hidden email]>
Sent: Tuesday, August 11, 2020 3:46
To: Mariano González Núñez <[hidden email]>
Cc: [hidden email] <[hidden email]>
Subject: Re: BucketingSink & StreamingFileSink
 