Is there a way to avoid manually uploading hive-udf's resources when we submit a job?


Husky Zeng
When we submit a job that uses a Hive UDF, the job depends on the UDF's
jars and configuration files.

We already store the UDF's jars and configuration files in the Hive
metastore, so we expect that Flink could obtain the HDFS paths of those
files through the hive-connector and fetch the files from HDFS by those
paths at runtime.

In this code, it seems we already get the paths of those UDF resources in
FunctionInfo, but don't use them:

https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/module/hive/HiveModule.java#L80

Right now we maintain a copy of the same data as the Hive metastore in the
Flink client, and keeping those files in sync by hand is a big trouble.

So we are trying to find a way to avoid manually submitting the UDF's
resources when we submit a job. Is it possible?
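The flow described above can be sketched with a simplified model. This is plain Java, not the actual Hive or Flink API; every class and method name below is a hypothetical stand-in (in the real metastore API the paths would come from the function's registered resource URIs):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for a metastore function entry that
// records where its resource files (e.g. UDF jars) live in HDFS.
public class UdfResourceLookup {

    static class FunctionEntry {
        final String name;
        final List<String> resourceUris; // e.g. HDFS paths to UDF jars

        FunctionEntry(String name, List<String> resourceUris) {
            this.name = name;
            this.resourceUris = resourceUris;
        }
    }

    // What the client could do automatically at submission time: collect
    // the jar paths already recorded in the metastore, instead of asking
    // the user to keep and upload local copies.
    static List<String> collectJarPaths(List<FunctionEntry> functions) {
        List<String> paths = new ArrayList<>();
        for (FunctionEntry fn : functions) {
            paths.addAll(fn.resourceUris);
        }
        return paths;
    }

    public static void main(String[] args) {
        FunctionEntry myUdf = new FunctionEntry(
                "my_upper",
                List.of("hdfs:///user/hive/udfs/my_upper.jar"));
        System.out.println(collectJarPaths(List.of(myUdf)));
    }
}
```

The point of the sketch is only that the path information already exists on the metastore side; the open question in this thread is how the Flink client and cluster would consume it.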



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Is there a way to avoid submitting hive-udf's resources when we submit a job?

Timo Walther
Hi Husky,

I guess https://issues.apache.org/jira/browse/FLINK-14055 is what is
needed to make this feature possible.

@Rui: Do you know more about this issue and the current limitations?

Regards,
Timo


On 18.09.20 09:11, Husky Zeng wrote:

> When we submit a job that uses a Hive UDF, the job depends on the UDF's
> jars and configuration files. [...]
> We are trying to find a way to avoid manually submitting the UDF's
> resources when we submit a job. Is it possible?


Re: Is there a way to avoid submitting hive-udf's resources when we submit a job?

Husky Zeng
Hi Timo,

Thanks for your attention. As I said in the comment linked below, that
feature would surely solve our problem, but its workload seems much larger
than a solution targeted at our scenario. Our project urgently needs to
reuse the Hive UDFs already registered in the Hive metastore, so we lean
toward developing a fast, narrower solution. I'd like to hear the
community's advice.

https://issues.apache.org/jira/browse/FLINK-19335?focusedCommentId=17199927&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17199927

Best Regards,
Husky Zeng




Re: Is there a way to avoid submitting hive-udf's resources when we submit a job?

Rui Li
In reply to this post by Timo Walther
Hi Timo,

I believe the blocker for this feature is that we don't support dynamically adding user jars/resources at the moment. We're able to read the path to the function jar from the Hive metastore, but we cannot load the jar after the user session has started.
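At the JVM level, "dynamically adding a user jar" amounts to class loading after startup, which is roughly what the session cannot do today. A minimal sketch of that mechanism in plain Java (this is not Flink's internal class loading, and the jar path below is a hypothetical placeholder):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: loading a class from a jar discovered at runtime, e.g. a UDF jar
// whose HDFS path was read from the Hive metastore and copied locally.
// The jar path used below is a hypothetical placeholder.
public class DynamicJarLoading {

    static Class<?> loadClassFromJar(String jarPath, String className) throws Exception {
        URL[] urls = { new URL("file:" + jarPath) };
        // Parent-first delegation: classes already on the classpath still
        // resolve through the parent loader; new classes come from the jar.
        try (URLClassLoader loader =
                new URLClassLoader(urls, DynamicJarLoading.class.getClassLoader())) {
            return loader.loadClass(className);
        }
    }

    public static void main(String[] args) throws Exception {
        // Without a real UDF jar at hand, demonstrate the delegation path
        // with a JDK class; a real UDF class would resolve from the jar URL.
        Class<?> c = loadClassFromJar("/tmp/placeholder-udf.jar", "java.lang.String");
        System.out.println(c.getName());
    }
}
```

The hard part in Flink is not the class loading itself but wiring it safely into an already-running session (classloader lifecycle, distribution of the jar to the cluster), which is what the linked FLINK-14055 work covers.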

On Tue, Sep 22, 2020 at 3:43 PM Timo Walther <[hidden email]> wrote:
Hi Husky,

I guess https://issues.apache.org/jira/browse/FLINK-14055 is what is
needed to make this feature possible. [...]



--
Cheers,
Rui Li