Queryable state on task managers that are not running the job

Martin Boyanov
Hi,
I'm running a long-running Flink job in cluster mode and would like to use the queryable state functionality.
I have the following problem: when I query the Flink task managers (i.e., the queryable state proxy), my query can hit a task manager that doesn't have the requested state because the job is not running on it.
For example, I might have a cluster with 5 task managers, but the job is deployed on only 3 of them. If my query hits either of the two idle task managers, I naturally get an error message saying that the job does not exist.
My current workaround is to size the cluster so that there are no idle task managers. Is there a better solution, or could this be handled better in the future?
Thanks in advance.
Kind regards,
Martin
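
One client-side workaround, assuming the client knows the addresses of all task manager proxies, is to try each proxy in turn and use the first one that answers; proxies on task managers that are not running the job fail and are skipped. The Java sketch below illustrates the idea under stated assumptions: `FailoverStateQuery` and its `ProxyQuery` hook are hypothetical names, not part of Flink's API, and the task manager names and the fake proxy in `main` are made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: client-side failover across queryable state proxies.
// Nothing here is Flink API; ProxyQuery stands in for the code that actually
// talks to one proxy (for example a QueryableStateClient bound to that host).
final class FailoverStateQuery {

    // Hypothetical hook: query one proxy for one key, throwing if that proxy cannot answer.
    @FunctionalInterface
    interface ProxyQuery<K, V> {
        V query(String proxyAddress, K key) throws Exception;
    }

    private final List<String> proxyAddresses;

    FailoverStateQuery(List<String> proxyAddresses) {
        if (proxyAddresses.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy address is required");
        }
        this.proxyAddresses = Collections.unmodifiableList(new ArrayList<>(proxyAddresses));
    }

    // Tries each proxy in order and returns the first successful answer.
    // Failures (e.g. "job does not exist" from an idle task manager) are kept
    // as suppressed exceptions and only surface if every proxy fails.
    <K, V> V query(K key, ProxyQuery<K, V> proxyQuery) throws Exception {
        Exception allFailed = new Exception("no proxy could answer the query for key " + key);
        for (String address : proxyAddresses) {
            try {
                return proxyQuery.query(address, key);
            } catch (Exception e) {
                allFailed.addSuppressed(e);
            }
        }
        throw allFailed;
    }

    public static void main(String[] args) throws Exception {
        // Five made-up task managers; the job is assumed to run only on tm-3, tm-4 and tm-5.
        FailoverStateQuery client = new FailoverStateQuery(
                Arrays.asList("tm-1", "tm-2", "tm-3", "tm-4", "tm-5"));

        // Fake proxy: idle task managers reject the query, the others return the key length.
        ProxyQuery<String, Long> fakeProxy = (address, key) -> {
            if (address.equals("tm-1") || address.equals("tm-2")) {
                throw new IllegalStateException(address + ": job does not exist");
            }
            return (long) key.length();
        };

        System.out.println(client.query("user-42", fakeProxy));
    }
}
```

Running `main` prints `7`, the value returned by `tm-3` after the two idle proxies fail. In a real client, the `ProxyQuery` hook could wrap a call such as `QueryableStateClient#getKvState` on a client created for that address; remembering the last proxy that answered would avoid retrying the idle ones on every query.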

Re: Queryable state on task managers that are not running the job

Yun Tang
Hi Martin,

Which deployment mode are you using? If you launch jobs in per-job mode [1], you may end up with idle slots but not idle task managers. Currently, queryable state is bound to a specific job, and if an idle task manager is not registered with the target job's resource manager, no queryable state can be queried through it.



Best
Yun Tang

From: Martin Boyanov <[hidden email]>
Sent: Monday, December 21, 2020 19:04
To: [hidden email] <[hidden email]>
Subject: Queryable state on task managers that are not running the job