(DEPRECATED) Apache Flink User Mailing List archive.

K8s and flink1.7.1


Vishal Santoshi

K8s and flink1.7.1

There are two issues with the 1.7.1 "job as a cluster" setup that I need guidance on.

1. In an HA setup, the TMs are not able to resolve the job manager's random port through the jobmanager.rpc.port setting. The setting does work in non-HA mode (the containerPort/TCP with the same port facilitates that), but then we lose the job if the JM reboots. This is a high priority for us. I am sure there is a workaround, but I would rather ask the experts.

2. The metrics on the JM are not visible, possibly due to https://issues.apache.org/jira/browse/FLINK-11127. It is an open issue, and both the service-per-TM and the stateful set approaches appear not to be production ready (not scalable, and kludgy). Do you have a timeline for when these will be resolved?

Thanks.

Vishal

Nagarjun Guraja

Re: K8s and flink1.7.1

For 1. you need to set high-availability.jobmanager.port to a predefined port in your flink-conf.yaml and expose that port via the job-manager-deployment and job-manager-service resources as well. That should do the trick.

For 2. I am not sure of the timelines, but there are a few decent, non-hacky workarounds for the problem mentioned in the comments on the issue. Feel free to pick one to unblock yourselves.

Regards,
Nagarjun

Success is not final, failure is not fatal: it is the courage to continue that counts.

- Winston Churchill -
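
A minimal sketch of what the suggestion for issue 1 might look like. The port number 50010, the resource names, and the labels are assumptions chosen for illustration and do not come from the thread; only the high-availability.jobmanager.port key and the idea of exposing it on both the deployment and the service are taken from the reply above.

```yaml
# flink-conf.yaml (fragment)
# Pin the HA JobManager RPC port instead of letting Flink pick a random one.
# 50010 is an assumed value; any free port should work.
high-availability.jobmanager.port: 50010
---
# job-manager-deployment (container fragment)
# The container name is hypothetical.
containers:
  - name: flink-job-cluster
    ports:
      - name: ha-rpc
        containerPort: 50010
---
# job-manager-service
# The service name and selector labels are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: flink-job-cluster
spec:
  selector:
    app: flink
    component: job-cluster
  ports:
    - name: ha-rpc
      port: 50010
      targetPort: 50010
```

The point of the fixed value is that the same number appears in all three places, so the TMs can reach the JM through the service on a port that is known in advance.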

(In reply to Vishal Santoshi's message of Sat, Jan 26, 2019 at 5:39 AM.)

Vishal Santoshi

Re: K8s and flink1.7.1

For 1. Thank you for pointing out that property. I had overlooked it.

For 2. I will try out the other options. The suggestion that seems to suit us best (we do not want to over-engineer on the init container side) is:

configure the metrics.internal.query-service.port property to some fixed port (e.g. 6666)
modify the docker entrypoint script to first configure taskmanager.host

I think this is what you were referring to as a possible solution?

The headless service approach would generally imply a separate service for each TM, and that is not sustainable.

(In reply to Nagarjun Guraja's message of Sat, Jan 26, 2019 at 1:37 PM.)

Vishal Santoshi

Re: K8s and flink1.7.1

And both worked. I should have said:

modify the docker entrypoint script to first configure taskmanager.host, using status.podIP, either as an override or in flink-conf.yaml, before the process is launched through the entry script.

Thank you all.
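
A rough sketch of how the workaround for issue 2 could be wired up, under several assumptions: the container name, the image name, the FLINK_HOME environment variable, the /docker-entrypoint.sh path, and its task-manager argument are all hypothetical and depend on how the image was built. The thread itself only names the two properties, the example port 6666, and status.podIP. Here the pod IP is injected through the Kubernetes downward API and appended to flink-conf.yaml before the entry script starts the process, rather than by editing the entry script itself.

```yaml
# flink-conf.yaml (fragment)
# Fix the metric query service port, as suggested in the thread (e.g. 6666).
metrics.internal.query-service.port: 6666
---
# task manager pod template (container fragment)
containers:
  - name: flink-task-manager        # hypothetical name
    image: flink-job:latest         # hypothetical image
    env:
      # Downward API: expose the pod's own IP to the container.
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
    command: ["/bin/sh", "-c"]
    args:
      # Write taskmanager.host first, then hand over to the (assumed) entry script.
      - |
        echo "taskmanager.host: ${POD_IP}" >> "${FLINK_HOME}/conf/flink-conf.yaml"
        exec /docker-entrypoint.sh task-manager
    ports:
      - name: query-service
        containerPort: 6666
```

With taskmanager.host set to the pod IP, each TM advertises an address the JM can reach directly, which avoids creating one service per TM.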

(In reply to Vishal Santoshi's message of Sat, Jan 26, 2019 at 4:11 PM.)