I’m attempting to write to Azure Blob Storage using Flink's FileOutputFormat. I’ve included hadoop-azure within the jar I submit to Flink and configured
the paths to be prefixed with wasb://{CONTAINERNAME}@{ACCOUNTNAME}.blob.core.windows.net/.
When the file output format initializes, I get the following error:
ERROR ROOT - Run 4bfb099a-8d07-11e7-8d3a-fb4d07562cc0 failed with error: 'org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (/out/data)': No file system
found with scheme wasb, referenced in file URI 'wasb://blob@{ACCOUNTNAME}.blob.core.windows.net/out/data'.
Can I register the format programmatically from within the job (without putting credentials into a core-site.xml file on the task manager)? Can I still use Flink’s FileOutputFormat or should I be using a Hadoop OutputFormat?
Thanks,
Joshua
|
Was the hadoop-azure jar on the classpath? Please also see the following from https://hadoop.apache.org/docs/current/hadoop-azure/index.html:
"The built jar file, named hadoop-azure.jar, also declares transitive dependencies on the additional artifacts it requires, notably the Azure Storage SDK for Java."
On Tue, Aug 29, 2017 at 3:24 PM, Joshua Griffith <[hidden email]> wrote:
|
Yes, hadoop-azure and azure-storage are both on the classpath. hadoop-azure is declared as a dependency in my build.sbt file and I’m using assembly to copy all of the dependencies into a single jar which is submitted to Flink. I suspect the wasb
format needs to be explicitly registered with Hadoop. I think that’s accomplished by inserting the following into core-site.xml (I’m not that familiar with Hadoop):
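The exact snippet is not shown here. A minimal sketch of such a fragment, assuming the property names given in the hadoop-azure documentation, might look like the following. The {ACCOUNTNAME} placeholder and the YOUR_ACCESS_KEY value are stand-ins, not values from this thread, and the right set of properties may differ by Hadoop version.

```xml
<configuration>
  <!-- Map the wasb:// scheme to the hadoop-azure file system implementation -->
  <property>
    <name>fs.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
  </property>
  <!-- Same mapping for the AbstractFileSystem (FileContext) API -->
  <property>
    <name>fs.AbstractFileSystem.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.Wasb</value>
  </property>
  <!-- Storage account access key; account name and value are placeholders -->
  <property>
    <name>fs.azure.account.key.{ACCOUNTNAME}.blob.core.windows.net</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
</configuration>
```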
However, I’m wondering if it’s possible to achieve the same result from within the job since it’s difficult to modify files on the task manager in our configuration.
|
FYI, there is HADOOP-14753, which is still open.
On Tue, Aug 29, 2017 at 3:41 PM, Joshua Griffith <[hidden email]> wrote: