Retaining uploaded job jars on Flink HA restarts on Kubernetes

Rohil Surana
Hi,

I have a very basic Flink HA setup on Kubernetes and want to retain job jars across JobManager restarts.

For HA I am using ZooKeeper and an NFS drive mounted on all pods (JobManager and TaskManagers), which is used for checkpoints. I have also set `web.upload.dir: /data/flink-uploads`, where /data is the NFS volume.

Still, when the JobManager is killed, the uploaded jars are lost.

I would really appreciate it if anyone could tell me what I am missing.
Here is the link to my flink-conf.yaml - https://pastebin.com/dt7tGTYQ

Thanks.

- Rohil

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

chiggi_dev
I think you are looking for jobmanager.web.tmpdir along with jobmanager.web.upload.dir.

From the documentation:

  • jobmanager.web.tmpdir: This configuration parameter allows defining the Flink web directory to be used by the web interface. The web interface will copy its static files into the directory. Also uploaded job jars are stored in the directory if not overridden. By default, the temporary directory is used.

  • jobmanager.web.upload.dir: The config parameter defining the directory for uploading the job jars. If not specified a dynamic directory will be used under the directory specified by jobmanager.web.tmpdir.


Regards,

Chirag



On Sunday, 6 May, 2018, 12:29:43 AM IST, Rohil Surana <[hidden email]> wrote:


Hi,

I have a very basic Flink HA setup on Kubernetes and want to retain job jars across JobManager restarts.

For HA I am using ZooKeeper and an NFS drive mounted on all pods (JobManager and TaskManagers), which is used for checkpoints. I have also set `web.upload.dir: /data/flink-uploads`, where /data is the NFS volume.

Still, when the JobManager is killed, the uploaded jars are lost.

I would really appreciate it if anyone could tell me what I am missing.
Here is the link to my flink-conf.yaml - https://pastebin.com/dt7tGTYQ

Thanks.

- Rohil

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

Rohil Surana
Hey Chirag,

I tried adding both configs as per the documentation, and I can see the jars being uploaded to the specified paths, but when the JobManager restarts, the jars are deleted from the `jobmanager.web.upload.dir` path.
Is there anything else that I am missing?


Thanks.
- Rohil

On Mon, May 7, 2018 at 11:48 AM, Chirag Dewan <[hidden email]> wrote:
I think you are looking for jobmanager.web.tmpdir along with jobmanager.web.upload.dir.

From the documentation:

  • jobmanager.web.tmpdir: This configuration parameter allows defining the Flink web directory to be used by the web interface. The web interface will copy its static files into the directory. Also uploaded job jars are stored in the directory if not overridden. By default, the temporary directory is used.

  • jobmanager.web.upload.dir: The config parameter defining the directory for uploading the job jars. If not specified a dynamic directory will be used under the directory specified by jobmanager.web.tmpdir.


Regards,

Chirag




Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

Chesnay Schepler
In reply to this post by chiggi_dev
The jar directory is automatically deleted when a JobManager shuts down.

In other words, there is no way to retain uploaded jars if a JobManager dies, and no way to point a JobManager to a pre-existing directory.
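
Given that the upload directory is wiped on shutdown, one possible workaround is to keep the jars in a separate directory on the persistent volume and push them back to the new JobManager once it is up. The sketch below is only an illustration under stated assumptions: it relies on the `/jars/upload` endpoint of the Flink REST API, which takes a multipart form with a `jarfile` part, and the function names, the jar directory and the JobManager URL are all hypothetical. It is worth checking the endpoint against the REST documentation of the Flink version in use.

```python
import pathlib
import urllib.request
import uuid
from typing import List, Tuple


def build_multipart(jar_name: str, jar_bytes: bytes, boundary: str) -> Tuple[bytes, str]:
    """Encode one jar as a multipart/form-data body with a single 'jarfile' part."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="jarfile"; filename="{jar_name}"\r\n'
        "Content-Type: application/x-java-archive\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + jar_bytes + tail, f"multipart/form-data; boundary={boundary}"


def reupload_jars(jar_dir: str, base_url: str) -> List[str]:
    """POST every *.jar in jar_dir to <base_url>/jars/upload and return the names sent."""
    sent = []
    for jar in sorted(pathlib.Path(jar_dir).glob("*.jar")):
        # A fresh random boundary per request keeps the parts unambiguous.
        body, content_type = build_multipart(jar.name, jar.read_bytes(), uuid.uuid4().hex)
        request = urllib.request.Request(
            f"{base_url}/jars/upload",
            data=body,
            headers={"Content-Type": content_type},
            method="POST",
        )
        # urlopen raises on HTTP errors, so a failed upload stops the loop.
        with urllib.request.urlopen(request) as response:
            response.read()
        sent.append(jar.name)
    return sent
```

It could be run from an init step or a small sidecar after the JobManager becomes reachable, for example `reupload_jars("/data/flink-jars", "http://flink-jobmanager:8081")`, where both the path and the host are placeholders. The jar directory should not be the same as `web.upload.dir`, since that one is the directory that gets deleted.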


Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

Rohil Surana
OK.
But why was the decision taken to automatically delete the jars rather than retain them? To me it makes sense to keep the uploaded jars so that the user doesn't have to upload them again when the JobManager restarts.

Thanks.
- Rohil

On Mon, May 7, 2018 at 12:16 PM, Chesnay Schepler <[hidden email]> wrote:
The jar directory is automatically deleted when a JobManager shuts down.

In other words, there is no way to retain uploaded jars if a JobManager dies, and no way to point a JobManager to a pre-existing directory.



Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

Sampath Bhat
Hi Rohil

You need not upload the jar again when the JobManager restarts in an HA environment. Only the jar stored in web.upload.dir will be deleted, which is fine. The jars needed for the JobManager to restart are stored in high-availability.storageDir, along with the job graphs and other job-related data. So when HA is enabled and the JobManager restarts for whatever reason, it looks in the high-availability.storageDir location to restart the previously running jobs.
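
Since job recovery depends on the HA settings rather than on `web.upload.dir`, it can help to confirm that flink-conf.yaml actually carries them. The following is a rough sketch, not an official tool: it treats the file as flat `key: value` lines and checks for the three keys that, as far as I know, the ZooKeeper HA mode needs. The helper names, the sample values, the ZooKeeper address and the storage path are made up for illustration.

```python
from typing import Dict, List

# Keys assumed to be required for ZooKeeper-based HA; verify against the docs
# of the Flink version in use.
REQUIRED_HA_KEYS = (
    "high-availability",
    "high-availability.storageDir",
    "high-availability.zookeeper.quorum",
)


def parse_flat_conf(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def missing_ha_keys(conf: Dict[str, str]) -> List[str]:
    """Return the required HA keys that are absent or empty."""
    return [key for key in REQUIRED_HA_KEYS if not conf.get(key)]


# Hypothetical configuration: host name and paths are placeholders.
SAMPLE = """\
high-availability: zookeeper
high-availability.zookeeper.quorum: zookeeper:2181
high-availability.storageDir: file:///data/flink-ha
web.upload.dir: /data/flink-uploads
"""

print(missing_ha_keys(parse_flat_conf(SAMPLE)))  # prints []
```

With these keys set and the storage directory on the shared volume, the restarted JobManager should be able to recover running jobs even though the uploaded jars no longer appear in the web UI.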

On Mon, May 7, 2018 at 5:22 PM, Rohil Surana <[hidden email]> wrote:
OK.
But why was the decision taken to automatically delete the jars rather than retain them? To me it makes sense to keep the uploaded jars so that the user doesn't have to upload them again when the JobManager restarts.

Thanks.
- Rohil


Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

Rohil Surana
Yes, I am aware that Flink won't require the jars to restart the jobs. But it would have been awesome if it could have retained them.

Thanks all for the help.

Regards,
- Rohil

On Mon 7 May, 2018, 5:32 PM Sampath Bhat, <[hidden email]> wrote:
Hi Rohil

You need not upload the jar again when the JobManager restarts in an HA environment. Only the jar stored in web.upload.dir will be deleted, which is fine. The jars needed for the JobManager to restart are stored in high-availability.storageDir, along with the job graphs and other job-related data. So when HA is enabled and the JobManager restarts for whatever reason, it looks in the high-availability.storageDir location to restart the previously running jobs.
