Hello,
I was wondering whether anyone has tried and/or had any luck creating a custom catalog with Iceberg + Flink via the Python API (https://iceberg.apache.org/flink/#custom-catalog)? The docs mention that dependencies need to be specified via pipeline.jars / pipeline.classpaths (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/python/dependency_management/). I'm providing the Iceberg Flink runtime + the Hadoop libs via those, and I can confirm that I'm landing in the FlinkCatalogFactory, but then things fail because it doesn't see the Hadoop dependencies for some reason. What would be the right way to provide the HADOOP_CLASSPATH when using the Python API? I have a minimal code example that shows the issue here: https://gist.github.com/nastra/92bc3bc7b7037d956aa5807988078b8d#file-flink-py-L38

Thanks
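For reference, passing the jars via pipeline.jars in PyFlink looks roughly like the sketch below. The jar paths are hypothetical placeholders (not the actual ones from the gist); the pipeline.jars option expects a semicolon-separated list of file:// URLs.

```python
# Sketch: building the pipeline.jars value for a PyFlink job.
# The jar paths below are hypothetical placeholders.

def build_pipeline_jars(jar_paths):
    """Join local jar paths into the semicolon-separated list of
    file:// URLs that the pipeline.jars option expects."""
    return ";".join("file://" + p for p in jar_paths)

jars = build_pipeline_jars([
    "/opt/jars/iceberg-flink-runtime-0.11.1.jar",
    "/opt/jars/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar",
])
print(jars)

# The value is then set on the table environment, e.g.:
#
#   from pyflink.table import EnvironmentSettings, TableEnvironment
#   t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
#   t_env.get_config().get_configuration().set_string("pipeline.jars", jars)
```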
Hi,

1) The cause of the exception:

The dependencies added via pipeline.jars / pipeline.classpaths are used to construct the user classloader. For your job, the exception happens when HadoopUtils.getHadoopConfiguration is called. The reason is that HadoopUtils is provided by Flink itself and is therefore loaded by the system classloader instead of the user classloader. Since the Hadoop dependencies are only available in the user classloader, a ClassNotFoundException is raised. So the Hadoop dependencies are a little special compared to other dependencies: Flink itself depends on them, not only the user code.

2) How to support HADOOP_CLASSPATH in local mode (e.g. executing in an IDE):

Currently, you have to copy the Hadoop jars into the PyFlink installation directory: site-packages/pyflink/lib/. However, it makes sense to support the HADOOP_CLASSPATH environment variable in local mode to avoid copying the jars manually. I will create a ticket for this.

3) How to support HADOOP_CLASSPATH in remote mode (e.g. jobs submitted via `flink run`):

You can export HADOOP_CLASSPATH=`hadoop classpath` manually before executing `flink run`.

Regards,
Dian
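A minimal sketch of workaround 2), copying the Hadoop jars into PyFlink's lib/ directory so that the system classloader can load them. The source directory below is an assumption; the target would be site-packages/pyflink/lib/ of the installed PyFlink, as described above.

```python
# Sketch: copy Hadoop jars into the PyFlink installation's lib/ directory
# so they end up on the system classpath (workaround for local mode).
import glob
import os
import shutil

def copy_hadoop_jars(hadoop_jar_dirs, pyflink_lib_dir):
    """Copy every *.jar found in the given Hadoop directories into the
    PyFlink lib directory; returns the names of the copied jars."""
    copied = []
    for d in hadoop_jar_dirs:
        for jar in sorted(glob.glob(os.path.join(d, "*.jar"))):
            shutil.copy(jar, pyflink_lib_dir)
            copied.append(os.path.basename(jar))
    return copied

# In practice the target is the installed PyFlink's lib directory, e.g.:
#
#   import pyflink
#   lib_dir = os.path.join(os.path.dirname(pyflink.__file__), "lib")
#   copy_hadoop_jars(["/opt/hadoop/share/hadoop/common"], lib_dir)  # assumed path
```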
Hi Dian,

Thanks a lot for the explanation and help. Option 2) is what I needed and it works.

Regards
Eduard

On Tue, May 18, 2021 at 6:21 AM Dian Fu <[hidden email]> wrote: