[DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter


Israel Ekpo
Some users are running into issues when using Azure Blob Storage with the StreamingFileSink.


The cause appears to be that certain packages are relocated in the POM file and some classes are dropped from the final shaded jar.

I have tried commenting out the relocations and recompiling the source, but each time I update a specific POM file I hit further relocations and filters elsewhere.
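
One way to confirm that a class really is missing from a shaded jar is to look for its entry in the jar directly. The following is only a rough sketch: the class `ShadedJarCheck` and its helper methods are made up for this illustration, and the path to the shaded jar has to be supplied by whoever runs it.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of Flink.
class ShadedJarCheck {

    /** Converts a fully qualified class name to its entry path inside a jar. */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath contains an entry for className. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(toEntryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ShadedJarCheck.java <path-to-shaded-jar>");
            return;
        }
        // Class name taken from the NoClassDefFoundError in the subject line.
        String className = "org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter";
        System.out.println(className + " present: " + containsClass(args[0], className));
    }
}
```

If the entry is absent, the class was either filtered out or relocated under a different package name during shading, so it may be worth checking the relocated name as well.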

How can this be addressed so that these users can be unblocked? Why are the classes filtered out? What is the workaround? I can work on the patch if I have some guidance.

This is an issue in Flink 1.9 and 1.10. I believe 1.11 has the same issue, but I have yet to confirm that.

Thanks.

 

Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

Israel Ekpo
You can assign the task to me; I would like to collaborate with someone to fix it.

(In reply to Israel Ekpo's message of Wed, May 27, 2020 at 5:52 PM.)

 

Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

Guowei Ma
Hi,
I think the StreamingFileSink does not currently support Azure.
You can find more detailed information here [1].


(In reply to Israel Ekpo's message of Thu, May 28, 2020 at 6:04 AM.)

 

Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

Israel Ekpo
Guowei,

What do we need to do to add support for it?

How do I get started on that?



(In reply to Guowei Ma's message of Wed, May 27, 2020 at 8:53 PM.)

 

Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

Till Rohrmann
Hi Israel,

Thanks for reaching out to the Flink community. As Guowei said, the StreamingFileSink can currently recover from faults only if it writes to HDFS or S3. Other file systems are not supported at the moment if you need fault tolerance.
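
As a rough illustration of that restriction, a job could guard against unsupported targets before it starts. The sketch below is not how Flink decides this internally (Flink asks the file system implementation for a recoverable writer); the class `RecoverableSchemeCheck`, its method, and the scheme set are assumptions that mirror only the "HDFS or S3" statement in this thread, and the example paths are made up.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-flight check for illustration; not part of Flink.
class RecoverableSchemeCheck {

    // Assumption: mirrors only the "HDFS or S3" statement from this thread.
    static final Set<String> RECOVERABLE_SCHEMES = Set.of("hdfs", "s3");

    /** Returns true if the path's URI scheme is in the assumed supported set. */
    static boolean supportsRecovery(String basePath) {
        String scheme = URI.create(basePath).getScheme();
        return scheme != null && RECOVERABLE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Made-up example paths.
        String[] paths = {
            "hdfs://namenode:8020/flink/output",
            "s3://my-bucket/flink/output",
            "wasbs://container@account.blob.core.windows.net/flink/output"
        };
        for (String path : paths) {
            System.out.println(path + " -> fault-tolerant sink assumed supported: " + supportsRecovery(path));
        }
    }
}
```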

Maybe Klou can tell you more about the background and what is needed to make it work with other file systems. He is one of the original authors of the StreamingFileSink.

Cheers,
Till

(In reply to Israel Ekpo's message of Thu, May 28, 2020 at 4:39 PM.)

 

Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

Israel Ekpo
Hi Till,

Thanks for your feedback and guidance.

It seems similar work was done for the S3 file systems, where the relocations were removed from those file system plugins.

It appears the same needs to be done for the Azure file systems.

I will try to connect with Klou today to work out how much effort it would take to add this support.

Thanks.



(In reply to Till Rohrmann's message of Thu, May 28, 2020 at 11:54 AM.)

 

Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

Till Rohrmann
I think what needs to be done is to implement an org.apache.flink.core.fs.RecoverableWriter for the respective file system, similar to HadoopRecoverableWriter and S3RecoverableWriter.
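
To make the idea of a recoverable writer more concrete, here is a minimal in-memory sketch of the contract such a writer has to honor: data written after the last persisted point is discarded on recovery, and a file becomes visible only on commit. It is a simplified stand-in, not Flink's actual RecoverableWriter interface, which has a richer contract. The names `InMemoryRecoverableWriter`, `ResumePoint`, `persist`, `recover`, and `commit` are hypothetical and chosen only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for illustration; NOT Flink's RecoverableWriter interface.
class InMemoryRecoverableWriter {

    /** Resume point: which file, and how many bytes were durable when it was taken. */
    static final class ResumePoint {
        final String path;
        final int persistedLength;

        ResumePoint(String path, int persistedLength) {
            this.path = path;
            this.persistedLength = persistedLength;
        }
    }

    // In-progress data per path (stands in for temporary files or uncommitted blocks).
    private final Map<String, ByteArrayOutputStream> inProgress = new HashMap<>();
    // Committed files that readers are allowed to see.
    private final Map<String, byte[]> committed = new HashMap<>();

    /** Appends data to the in-progress file for the given path. */
    void write(String path, String data) {
        byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
        inProgress.computeIfAbsent(path, p -> new ByteArrayOutputStream())
                .write(bytes, 0, bytes.length);
    }

    /** Taken at checkpoint time: records how much data is durable so far. */
    ResumePoint persist(String path) {
        ByteArrayOutputStream out = inProgress.get(path);
        return new ResumePoint(path, out == null ? 0 : out.size());
    }

    /** After a failure: drops every byte written after the resume point. */
    void recover(ResumePoint point) {
        ByteArrayOutputStream out = inProgress.get(point.path);
        byte[] current = out == null ? new byte[0] : out.toByteArray();
        ByteArrayOutputStream truncated = new ByteArrayOutputStream();
        truncated.write(current, 0, Math.min(point.persistedLength, current.length));
        inProgress.put(point.path, truncated);
    }

    /** Publishes the in-progress data so that it becomes visible to readers. */
    void commit(String path) {
        ByteArrayOutputStream out = inProgress.remove(path);
        if (out != null) {
            committed.put(path, out.toByteArray());
        }
    }

    /** Returns the committed content, or null if nothing was committed for the path. */
    String readCommitted(String path) {
        byte[] bytes = committed.get(path);
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        InMemoryRecoverableWriter writer = new InMemoryRecoverableWriter();
        writer.write("part-0", "a,b\n");
        ResumePoint checkpoint = writer.persist("part-0");
        writer.write("part-0", "c,d\n"); // written after the checkpoint, dropped on recovery
        writer.recover(checkpoint);
        writer.write("part-0", "e,f\n");
        writer.commit("part-0");
        System.out.print(writer.readCommitted("part-0"));
    }
}
```

A real implementation for Azure would have to map these steps onto operations the storage service actually offers, and that mapping is the part that still needs to be worked out.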

Cheers,
Till

(In reply to Israel Ekpo's message of Thu, May 28, 2020 at 6:00 PM.)

 

Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

Israel Ekpo
Thanks, Till.

I will take a look at that tomorrow and let you know if I hit any roadblocks.

(In reply to Till Rohrmann's message of Thu, May 28, 2020 at 12:11 PM.)