Is there a way to avoid manually uploading hive-udf's resources when we submit a job?


Husky Zeng
When we submit a job that uses a Hive UDF, the job depends on the UDF's
jars and configuration files.

We already store the UDF's jars and configuration files in the Hive
metastore, so we expect that Flink could obtain the HDFS paths of those
files through the hive-connector and fetch the files from HDFS by those
paths at runtime.

In this code, it seems we already get the paths of those UDF resources in
FunctionInfo, but don't use them:

https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/module/hive/HiveModule.java#L80

Right now we maintain a copy of the same data as the Hive metastore in the
Flink client, and keeping those files in sync by hand is a big trouble.

So we are trying to find a way to avoid manually submitting the UDF's
resources when we submit a job. Is it possible?
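The flow described above can be sketched with a simplified model. This is plain Java, not the actual Hive or Flink API; every class and method name below is a hypothetical stand-in (in the real metastore API the paths would come from the function's registered resource URIs):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for a metastore function entry that
// records where its resource files (e.g. UDF jars) live in HDFS.
public class UdfResourceLookup {

    static class FunctionEntry {
        final String name;
        final List<String> resourceUris; // e.g. HDFS paths to UDF jars

        FunctionEntry(String name, List<String> resourceUris) {
            this.name = name;
            this.resourceUris = resourceUris;
        }
    }

    // What the client could do automatically at submission time: collect
    // the jar paths already recorded in the metastore, instead of asking
    // the user to keep and upload local copies.
    static List<String> collectJarPaths(List<FunctionEntry> functions) {
        List<String> paths = new ArrayList<>();
        for (FunctionEntry fn : functions) {
            paths.addAll(fn.resourceUris);
        }
        return paths;
    }

    public static void main(String[] args) {
        FunctionEntry myUdf = new FunctionEntry(
                "my_upper",
                List.of("hdfs:///user/hive/udfs/my_upper.jar"));
        System.out.println(collectJarPaths(List.of(myUdf)));
    }
}
```

The point of the sketch is only that the path information already exists on the metastore side; the open question in this thread is how the Flink client and cluster would consume it.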



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Is there a way to avoid submitting hive-udf's resources when we submit a job?

Timo Walther
Hi Husky,

I guess https://issues.apache.org/jira/browse/FLINK-14055 is what is
needed to make this feature possible.

@Rui: Do you know more about this issue and the current limitations?

Regards,
Timo


On 18.09.20 09:11, Husky Zeng wrote:

> When we submit a job that uses a Hive UDF, the job depends on the UDF's
> jars and configuration files. [...]
> We are trying to find a way to avoid manually submitting the UDF's
> resources when we submit a job. Is it possible?


Re: Is there a way to avoid submitting hive-udf's resources when we submit a job?

Husky Zeng
Hi Timo,

Thanks for your attention. As I said in the comment linked below, that
feature would surely solve our problem, but its workload seems much larger
than a solution targeted at our scenario. Our project urgently needs to
reuse the Hive UDFs already registered in the Hive metastore, so we lean
toward developing a fast, narrower solution. I'd like to hear the
community's advice.

https://issues.apache.org/jira/browse/FLINK-19335?focusedCommentId=17199927&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17199927

Best Regards,
Husky Zeng




Re: Is there a way to avoid submitting hive-udf's resources when we submit a job?

Rui Li
In reply to this post by Timo Walther
Hi Timo,

I believe the blocker for this feature is that we don't support dynamically adding user jars/resources at the moment. We're able to read the path to the function jar from the Hive metastore, but we cannot load the jar after the user session has started.
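At the JVM level, "dynamically adding a user jar" amounts to class loading after startup, which is roughly what the session cannot do today. A minimal sketch of that mechanism in plain Java (this is not Flink's internal class loading, and the jar path below is a hypothetical placeholder):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: loading a class from a jar discovered at runtime, e.g. a UDF jar
// whose HDFS path was read from the Hive metastore and copied locally.
// The jar path used below is a hypothetical placeholder.
public class DynamicJarLoading {

    static Class<?> loadClassFromJar(String jarPath, String className) throws Exception {
        URL[] urls = { new URL("file:" + jarPath) };
        // Parent-first delegation: classes already on the classpath still
        // resolve through the parent loader; new classes come from the jar.
        try (URLClassLoader loader =
                new URLClassLoader(urls, DynamicJarLoading.class.getClassLoader())) {
            return loader.loadClass(className);
        }
    }

    public static void main(String[] args) throws Exception {
        // Without a real UDF jar at hand, demonstrate the delegation path
        // with a JDK class; a real UDF class would resolve from the jar URL.
        Class<?> c = loadClassFromJar("/tmp/placeholder-udf.jar", "java.lang.String");
        System.out.println(c.getName());
    }
}
```

The hard part in Flink is not the class loading itself but wiring it safely into an already-running session (classloader lifecycle, distribution of the jar to the cluster), which is what the linked FLINK-14055 work covers.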

On Tue, Sep 22, 2020 at 3:43 PM Timo Walther <[hidden email]> wrote:
Hi Husky,

I guess https://issues.apache.org/jira/browse/FLINK-14055 is what is
needed to make this feature possible. [...]



--
Cheers,
Rui Li