Shipping Filesystem Plugins with YarnClusterDescriptor

John Mathews
Hello,

I have a custom filesystem that I am trying to migrate to the plugins model described here: https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/#adding-a-new-pluggable-file-system-implementation, but it is unclear to me how to make the plugins directory available dynamically when launching a cluster through a YarnClusterDescriptor. One thought was to add the plugins to the shipFiles list, but I don't think that would put them in the directory where Flink looks for plugins.

Is there another way to get the plugins onto the host when launching the cluster? Or is there a different recommended way of doing this? Happy to answer any questions if something is unclear.

Thanks so much for your help!

John
Re: Shipping Filesystem Plugins with YarnClusterDescriptor

Yangze Guo
Hi, John,

AFAIK, Flink automatically ships the "plugins/" directory of your Flink
distribution to Yarn[1]. So you just need to create a subdirectory in
"plugins/" and put your custom jar into it. Did you run into any
problems with this approach?

[1] https://github.com/apache/flink/blob/216f65fff10fb0957e324570662d075be66bacdf/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L770
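
For illustration, the layout described above (one subdirectory per plugin under "plugins/", with the jar inside it) could be prepared roughly like this. This is only a sketch using the Java standard library; the class PluginInstaller and its install method are hypothetical names, not part of Flink, and the paths are placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not a Flink class): puts a plugin jar into its own folder under plugins/. */
final class PluginInstaller {

    /**
     * Copies pluginJar to <flinkHome>/plugins/<pluginName>/<jar file name>
     * and returns the path of the copy.
     */
    static Path install(Path flinkHome, String pluginName, Path pluginJar) throws IOException {
        // Each plugin gets its own subdirectory below plugins/.
        Path pluginDir = flinkHome.resolve("plugins").resolve(pluginName);
        Files.createDirectories(pluginDir);

        // Keep the original jar file name; overwrite an older copy if one exists.
        Path target = pluginDir.resolve(pluginJar.getFileName());
        return Files.copy(pluginJar, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Called as PluginInstaller.install(flinkHome, "my-fs", jarPath), this would leave the jar under plugins/my-fs/ in the distribution, which is the directory the reply above says gets shipped.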

Best,
Yangze Guo

On Wed, Jun 10, 2020 at 11:29 PM John Mathews <[hidden email]> wrote:

>
> Hello,
>
> I have a custom filesystem that I am trying to migrate to the plugins model described here: https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/#adding-a-new-pluggable-file-system-implementation, but it is unclear to me how to make the plugins directory available dynamically when launching a cluster through a YarnClusterDescriptor. One thought was to add the plugins to the shipFiles list, but I don't think that would put them in the directory where Flink looks for plugins.
>
> Is there another way to get the plugins onto the host when launching the cluster? Or is there a different recommended way of doing this? Happy to answer any questions if something is unclear.
>
> Thanks so much for your help!
>
> John
Re: Shipping Filesystem Plugins with YarnClusterDescriptor

John Mathews
So I think that will work, but it has some limitations. When launching clusters through a service (which is our use case), multiple clients may want clusters with different plugins, or with different versions of a given plugin. Because the YarnClusterDescriptor currently reads the location of the plugins to ship from an environment variable, there is a race condition: that directory could contain plugins from several in-flight requests to spin up a cluster.

I think a possible solution is to expose configuration on the YarnClusterDescriptor that is similar to the shipFiles list but is instead a shipPlugins list. That way, the plugins that get shipped are chosen per Yarn cluster request instead of globally.

Do you see any workarounds for the issue I described? Also, does the idea I propose make sense as a solution?
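
One possible workaround for the race described above, sketched under assumptions: stage a fresh plugins directory for every request and run each deployment in its own child process whose environment points at that directory. The class PerRequestPlugins is a hypothetical name, the deployment command is left to the caller, and the variable name FLINK_PLUGINS_DIR is, as far as I know, the one Flink reads, but please verify it against your Flink version. Whether this fits depends on the service being able to fork a client process per request.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Flink class): isolates the plugins of each cluster request. */
final class PerRequestPlugins {

    /**
     * Creates a new, private plugins directory and copies each requested jar
     * into <root>/<pluginName>/. The map key is the plugin name, the value its jar.
     */
    static Path stage(Map<String, Path> pluginJars) throws IOException {
        // A fresh temp directory per call, so concurrent requests never share files.
        Path pluginsRoot = Files.createTempDirectory("flink-plugins-");
        for (Map.Entry<String, Path> entry : pluginJars.entrySet()) {
            Path pluginDir = Files.createDirectories(pluginsRoot.resolve(entry.getKey()));
            Path jar = entry.getValue();
            Files.copy(jar, pluginDir.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        return pluginsRoot;
    }

    /**
     * Prepares (but does not start) a child process for the given deployment command.
     * Only the child's environment is changed; the service's own environment is untouched.
     */
    static ProcessBuilder launcher(List<String> deployCommand, Path pluginsRoot) {
        ProcessBuilder builder = new ProcessBuilder(deployCommand);
        // Assumption: the Flink client in the child process reads this variable.
        builder.environment().put("FLINK_PLUGINS_DIR", pluginsRoot.toAbsolutePath().toString());
        return builder;
    }
}
```

The trade-off is one extra process per cluster request, in exchange for not needing any change to the YarnClusterDescriptor itself.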



On Wed, Jun 10, 2020 at 9:16 PM Yangze Guo <[hidden email]> wrote:
> Hi, John,
>
> AFAIK, Flink automatically ships the "plugins/" directory of your Flink
> distribution to Yarn[1]. So you just need to create a subdirectory in
> "plugins/" and put your custom jar into it. Did you run into any
> problems with this approach?
>
> [1] https://github.com/apache/flink/blob/216f65fff10fb0957e324570662d075be66bacdf/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L770
>
> Best,
> Yangze Guo

Re: Shipping Filesystem Plugins with YarnClusterDescriptor

Kostas Kloudas-2
Hi John,

I think that using different plugins is not going to be an issue,
assuming that the schemes of your filesystems do not collide. This is
already the case for S3 within Flink, where we have two implementations,
one based on Presto and one based on Hadoop. For the former you can use
the scheme s3p, and for the latter s3a.
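
To make the point about schemes concrete, here is a toy registry that maps each URI scheme to exactly one plugin and refuses a second plugin for the same scheme. It is not Flink's actual loading code; SchemeRegistry and its methods are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Flink code): at most one filesystem plugin may own a given URI scheme. */
final class SchemeRegistry {

    private final Map<String, String> pluginByScheme = new HashMap<>();

    /** Registers a plugin for a scheme; fails if a different plugin already claimed it. */
    void register(String scheme, String pluginName) {
        String previous = pluginByScheme.putIfAbsent(scheme, pluginName);
        if (previous != null && !previous.equals(pluginName)) {
            throw new IllegalStateException(
                    "Scheme '" + scheme + "' is already served by " + previous);
        }
    }

    /** Returns the plugin responsible for the path's scheme, or null if none is registered. */
    String pluginFor(URI path) {
        return pluginByScheme.get(path.getScheme());
    }
}
```

With "s3p" registered for flink-s3-fs-presto and "s3a" for flink-s3-fs-hadoop, both plugins coexist because lookups are keyed by scheme; only a second plugin claiming an already-taken scheme is a conflict.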

Different versions of the same plugin, however, can be an issue if
all of them are present concurrently in your plugins directory. Is
that the case, or is only the latest version of a given plugin
present?

Keep in mind that after uploading, the "remote" plugins dir is not
shared among applications but it is "private" to each one of them.

Cheers,
Kostas

On Thu, Jun 11, 2020 at 5:12 PM John Mathews <[hidden email]> wrote:

>
> So I think that will work, but it has some limitations. When launching clusters through a service (which is our use case), multiple clients may want clusters with different plugins, or with different versions of a given plugin. Because the YarnClusterDescriptor currently reads the location of the plugins to ship from an environment variable, there is a race condition: that directory could contain plugins from several in-flight requests to spin up a cluster.
>
> I think a possible solution is to expose configuration on the YarnClusterDescriptor that is similar to the shipFiles list but is instead a shipPlugins list. That way, the plugins that get shipped are chosen per Yarn cluster request instead of globally.
>
> Do you see any workarounds for the issue I described? Also, does the idea I propose make sense as a solution?