Need for user class path accessibility on all nodes

6 messages

Need for user class path accessibility on all nodes

Abdul Qadeer
Hi!

I was going through job submission via the Flink CLI. According to the documentation, the URL given with "--classpath <url>" must be accessible from all nodes in the cluster. As I understand it, the jar files are already part of the blob uploaded to the JobManager by the CLI, and the TaskManagers can download this blob when they receive a task and load the classes from it. Why, then, do these files need to be accessible from every node? Using a distributed file system for these jars would make sense if the network were unavailable for downloading blob files, or if the blob didn't carry metadata to distinguish the classes meant for the child (user-code) class loader from the rest. However, it seems the TaskManager always tries to access the specified class paths, regardless of network partitions.
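To make the question concrete, here is a minimal Java sketch of the setup it describes, assuming the TaskManager builds one `URLClassLoader` per job from (a) jar files it has already downloaded from the blob store and (b) the `--classpath` URLs, passed through unchanged. The class `UserCodeLoaderSketch`, the method `buildLoader`, and the paths are hypothetical; this is not Flink's code. Under that assumption it shows why the `--classpath` URLs must resolve on every node: they are never copied, only resolved by the class loader on whichever TaskManager loads the class.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Flink's implementation: combines blob jars with
// user-supplied classpath URLs into a single user-code class loader.
public class UserCodeLoaderSketch {

    static URLClassLoader buildLoader(List<File> downloadedBlobJars,
                                      List<URL> userClasspaths,
                                      ClassLoader parent) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        for (File jar : downloadedBlobJars) {
            // Blob jars are local copies on this node, so they become file: URLs.
            urls.add(jar.toURI().toURL());
        }
        // The --classpath URLs are not shipped anywhere; they are handed to the
        // class loader as-is and must therefore be reachable from this node.
        urls.addAll(userClasspaths);
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths: a blob-cache copy of the job jar and an NFS-mounted library.
        List<File> blobJars = List.of(new File("/tmp/blob-cache/job.jar"));
        List<URL> classpaths = List.of(URI.create("file:///mnt/nfs/libs/common.jar").toURL());
        try (URLClassLoader loader =
                 buildLoader(blobJars, classpaths, UserCodeLoaderSketch.class.getClassLoader())) {
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```

Note that `URLClassLoader` does not open its URLs when it is constructed, so an unreachable classpath URL would typically only show up later, for example as a `ClassNotFoundException` when a class cannot be found anywhere else on the search path.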

Re: Need for user class path accessibility on all nodes

Biao Liu
Hi Abdul, "--classpath <url>" is meant for classes that are not included in the user jar. If all your classes are included in the jar you pass to Flink, you don't need "--classpath".
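This rule of thumb can be checked mechanically: if every class the job needs has an entry in the user jar, nothing has to come from `--classpath`. The helper below (`UberJarCheck.missingFromJar`, a hypothetical name) uses `java.util.jar.JarFile` to report which fully qualified class names are not bundled in a given jar. It only looks for each class's `.class` entry and does not follow transitive dependencies, so treat it as a quick sketch rather than a full dependency analysis.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper: which of the given classes are NOT packaged in the user jar?
public class UberJarCheck {

    static List<String> missingFromJar(Path jar, List<String> classNames) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            for (String name : classNames) {
                // com.example.MyJob -> com/example/MyJob.class
                String entryName = name.replace('.', '/') + ".class";
                if (jarFile.getEntry(entryName) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in "user jar" that contains only the job class.
        Path jar = Files.createTempFile("user-job", ".jar");
        try (JarOutputStream jarOut = new JarOutputStream(Files.newOutputStream(jar))) {
            jarOut.putNextEntry(new JarEntry("com/example/MyJob.class"));
            jarOut.closeEntry();
        }
        List<String> missing = missingFromJar(jar,
            List.of("com.example.MyJob", "org.example.BigLibrary"));
        System.out.println(missing); // prints [org.example.BigLibrary]
        Files.delete(jar);
    }
}
```

An empty result means the jar is self-contained for the listed classes; anything reported would have to be provided some other way, for instance through a `--classpath` URL.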

In reply to Abdul Qadeer (Tue, Jun 18, 2019, 3:08 AM).

Re: Need for user class path accessibility on all nodes

Abdul Qadeer
Hi Biao,

I am aware of that, but that's not my question.

In reply to Biao Liu (Mon, Jun 17, 2019, 7:42 PM).

Re: Need for user class path accessibility on all nodes

Biao Liu
Ah, sorry for the misunderstanding.
So what you are asking is why "--classpath" is needed at all? I'm not sure what the original author had in mind, but I guess the following points might have been considered:
1. Avoiding duplicated deployment. If some common jars are deployed in advance to each node of the cluster, jobs that depend on these jars don't have to ship them one by one.
2. Supporting NFS, which is mentioned in the option description of "--classpath".
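The two points above could combine like this in practice: common jars are placed once in a directory that every node mounts at the same path (for example an NFS share), and each jar is passed as its own `--classpath` URL, assuming the option may be given multiple times as its help text describes. The `SharedLibClasspath` class and the directory layout are hypothetical; the sketch only builds the argument list and does not invoke the Flink CLI.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: turn pre-deployed jars on a shared mount into CLI arguments.
public class SharedLibClasspath {

    static List<String> classpathArgs(Path sharedLibDir) throws IOException {
        List<Path> jars;
        try (Stream<Path> entries = Files.list(sharedLibDir)) {
            jars = entries
                .filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
        }
        List<String> args = new ArrayList<>();
        for (Path jar : jars) {
            args.add("--classpath");
            // Absolute file: URL; it only works if this exact path exists on every node.
            args.add(jar.toAbsolutePath().toUri().toString());
        }
        return args;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a shared mount such as an NFS export.
        Path dir = Files.createTempDirectory("shared-libs");
        Files.createFile(dir.resolve("common-a.jar"));
        Files.createFile(dir.resolve("common-b.jar"));
        Files.createFile(dir.resolve("README.txt")); // ignored: not a jar
        System.out.println(String.join(" ", classpathArgs(dir)));
    }
}
```

The design relies on every node seeing the same absolute path; if the mount point differed between nodes, the generated `file:` URLs would not resolve on some TaskManagers.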


Abdul Qadeer <[hidden email]> 于2019年6月18日周二 上午11:45写道:
Hi Biao,

I am aware of it - that's not my question.

On Mon, Jun 17, 2019 at 7:42 PM Biao Liu <[hidden email]> wrote:
Hi Abdul, "--classpath <url>" can be used for those are not included in user jar. If all your classes are included in your jar passed to Flink, you don't need this "--classpath".

Abdul Qadeer <[hidden email]> 于2019年6月18日周二 上午3:08写道:
Hi!

I was going through submission of a Flink program through CLI. I see that "--classpath <url>" needs to be accessible from all nodes in the cluster as per documentation. As I understand the jar files are already part of the blob uploaded to JobManager from the CLI. The TaskManagers can download this blob when the receive the task and access the classes from there. Why is there a need to be able to access these files from every node then? It makes sense to use Distributed File System to access these jars if the network is not reachable to download blob files. Or if the blob doesn't contain metadata to differentiate between child class loader classes and the rest. However it seems like the TaskManager always tries to access the specified class paths irrespective of Network Partitions.

Re: Need for user class path accessibility on all nodes

Till Rohrmann
Hi Abdul,

as Biao said, the `--classpath` option should only be used if you want to make dependencies available that are not included in the submitted user code jar, e.g. if you have installed a large library that is too costly to ship every time you submit a job. Usually, you don't need to specify this option if you build an uber jar.

Cheers,
Till

In reply to Biao Liu (Tue, Jun 18, 2019, 7:23 AM).

Re: Need for user class path accessibility on all nodes

Abdul Qadeer
Thanks Biao/Till, that answers my question.

In reply to Till Rohrmann (Tue, Jun 18, 2019, 1:41 AM).