K8s and flink1.7.1

Vishal Santoshi
There are two issues with the 1.7.1 "job as a cluster" setup that I need guidance on.

1. In the HA setup, the TMs are not able to resolve the job manager's random port through the jobmanager.rpc.port setting. The setting does work in non-HA mode (the containerPort/TCP with the same port facilitates that), but then we lose the job if the JM were to reboot. This is a high priority for us, and I am sure there is a workaround, but I would rather ask the experts.

2. The metrics on the JM are not visible, possibly due to https://issues.apache.org/jira/browse/FLINK-11127 . It is an open issue, and both the service-per-TM and the stateful set approaches appear non-production-ready (not scalable and kludgy). Do you have a timeline for when these will be resolved?

Thanks.

Vishal

Re: K8s and flink1.7.1

Nagarjun Guraja
For 1, you need to set high-availability.jobmanager.port to a predefined port in your flink-conf.yaml and expose that port via the job-manager-deployment and job-manager-service resources as well. That should do the trick.
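
A minimal sketch of what that could look like, under stated assumptions: the port value 6123, the resource name flink-jobmanager, the labels, and the image name are illustrative placeholders, not values from this thread. The first document is an excerpt of flink-conf.yaml; the other two are pared-down JobManager Deployment and Service manifests that expose the same port.

```yaml
# flink-conf.yaml (excerpt)
# Pin the JobManager port used in HA mode instead of letting Flink pick a random one.
# 6123 is an illustrative choice.
high-availability.jobmanager.port: 6123
---
# JobManager Deployment (hypothetical names, labels, and image)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink-job:latest   # placeholder for your job-cluster image
          ports:
            - name: ha-rpc
              containerPort: 6123   # must match high-availability.jobmanager.port
---
# JobManager Service exposing the same port to the TaskManagers
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: ha-rpc
      port: 6123
      targetPort: 6123
```

The only requirement taken from the advice above is that the same fixed port appears in all three places.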

For 2, I am not sure of the timelines, but there are a few decent, non-hacky workarounds mentioned in the comments. Feel free to pick one to unblock yourselves.

Regards,
Nagarjun

Success is not final, failure is not fatal: it is the courage to continue that counts. 
- Winston Churchill - 



Re: K8s and flink1.7.1

Vishal Santoshi
For 1, thank you for pointing out that property. I had overlooked it.
For 2, I will try out the other options. The suggestion that seems to suit us best (we do not want to over-engineer on the init container side) is:
  • configure the metrics.internal.query-service.port property to some fixed port (e.g. 6666)
  • modify the docker entrypoint script to first configure taskmanager.host

I think this is what you were referring to as a possible solution?

A headless service would generally imply a single service for each TM, and that is not sustainable.






Re: K8s and flink1.7.1

Vishal Santoshi
And both worked. I should have said:
  • modify the docker entrypoint script to first configure taskmanager.host, using status.podIP either as an override or as an entry written to flink-conf.yaml, before the process is launched through the entry script.
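
The approach described above might be sketched as follows. This is an illustration under assumptions, not the manifest actually used: the Deployment name, labels, image, the /opt/flink/conf/flink-conf.yaml path, and the /docker-entrypoint.sh task-manager invocation are placeholders to adapt to your own image. It also wraps the entrypoint from the pod spec rather than editing the script itself. Only status.podIP, taskmanager.host, and metrics.internal.query-service.port (with the example port 6666) come from the thread.

```yaml
# TaskManager Deployment (hypothetical names, labels, image, and paths)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
        - name: taskmanager
          image: flink-job:latest   # placeholder for your image
          env:
            # Downward API: inject this pod's IP as an environment variable
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          command: ["/bin/sh", "-c"]
          args:
            # Write the overrides, then hand off to the image's entrypoint.
            # Assumes neither key is already set in flink-conf.yaml.
            - |
              echo "taskmanager.host: ${POD_IP}" >> /opt/flink/conf/flink-conf.yaml
              echo "metrics.internal.query-service.port: 6666" >> /opt/flink/conf/flink-conf.yaml
              exec /docker-entrypoint.sh task-manager
          ports:
            - name: metrics-query
              containerPort: 6666   # fixed metrics query service port
```

With taskmanager.host set to the pod IP, each TM advertises an address that is reachable directly, so no per-TM service is needed.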

Thank you all.
