Hi to all,
One of our customers asked us to show the percentage of completion of a Flink batch job. Is there an already implemented heuristic I can use to compute it? Will this also be possible once the DataSet API has migrated to DataStream? Thanks in advance, Flavio
Hi Flavio, I'm not aware of such a heuristic being implemented anywhere. You need to come up with something yourself.
On Fri, Aug 7, 2020 at 12:55 PM Flavio Pompermaier <[hidden email]> wrote:
Hi Flavio, This is a daunting task to implement properly. There is an easy workaround used in related workflow systems, though. Assuming that it's a recurring job, you simply store the run times of the previous runs, apply some kind of low-pass filter (= decaying average) to get an expected runtime, and compare the current runtime with that expected runtime. Even if Flink had some estimation built in, it would probably not be more accurate than this.
Best, Arvid
On Tue, Aug 11, 2020 at 10:26 AM Robert Metzger <[hidden email]> wrote:
-- Arvid Heise | Senior Java Developer, Ververica GmbH
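The runtime-based approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name, the method names, the smoothing factor `alpha`, and the choice to cap the estimate at 99% are all hypothetical and not part of Flink.

```java
import java.util.List;

// Hypothetical helper, not part of Flink: estimates progress of a recurring
// batch job from the run times of its previous executions.
class RuntimeProgressEstimator {

    // Exponentially decaying average (a simple low-pass filter) over past run
    // times, ordered oldest to newest. alpha in (0, 1]: higher = more weight on recent runs.
    static double expectedRuntimeMillis(List<Long> pastRuntimesMillis, double alpha) {
        if (pastRuntimesMillis.isEmpty()) {
            throw new IllegalArgumentException("need at least one previous run");
        }
        double estimate = pastRuntimesMillis.get(0);
        for (int i = 1; i < pastRuntimesMillis.size(); i++) {
            estimate = alpha * pastRuntimesMillis.get(i) + (1 - alpha) * estimate;
        }
        return estimate;
    }

    // Compares the elapsed time of the current run with the expected runtime.
    // Capped at 99 because only the job itself can report that it has finished.
    static int percentComplete(long elapsedMillis, double expectedMillis) {
        if (expectedMillis <= 0) {
            return 0;
        }
        double pct = 100.0 * elapsedMillis / expectedMillis;
        return (int) Math.max(0, Math.min(99, pct));
    }
}
```

The past run times would have to be stored outside Flink (for example by the scheduler that launches the job); where to keep them is left open here.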
What do you think about this very rough heuristic (obviously it makes sense only for batch jobs)? It's far from perfect, but at least it gives an idea that something is going on. PS: I found a mismatch between the states documented in [1] and the ones I found in the ExecutionState enum.
[1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jobs-jobid

    Map<ExecutionState, Integer> statusCount = jobDetails.getJobVerticesPerState();
    // Vertices that have not reached a terminal state yet
    int uncompleted = statusCount.getOrDefault(ExecutionState.CREATED, 0) + //
        statusCount.getOrDefault(ExecutionState.RUNNING, 0) + //
        statusCount.getOrDefault(ExecutionState.CANCELING, 0) + //
        statusCount.getOrDefault(ExecutionState.DEPLOYING, 0) + //
        // statusCount.getOrDefault(ExecutionState.FAILING, 0) + // not found in Flink 1.11.0
        // statusCount.getOrDefault(ExecutionState.SUSPENDED, 0) + // not found in Flink 1.11.0
        statusCount.getOrDefault(ExecutionState.RECONCILING, 0) + //
        // statusCount.getOrDefault(ExecutionState.RESTARTING, 0) + // not found in Flink 1.11.0
        statusCount.getOrDefault(ExecutionState.SCHEDULED, 0);
    // Vertices in a terminal state
    int completed = statusCount.getOrDefault(ExecutionState.FINISHED, 0) + //
        statusCount.getOrDefault(ExecutionState.FAILED, 0) + //
        statusCount.getOrDefault(ExecutionState.CANCELED, 0);
    // Scale by 100 before dividing: plain integer division would only ever yield 0 or 1
    int total = completed + uncompleted;
    final int completionPercentage = total == 0 ? 0 : (100 * completed) / total;

Thanks in advance, Flavio
On Thu, Aug 13, 2020 at 4:17 PM Arvid Heise <[hidden email]> wrote:
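For experimenting with the vertex-count heuristic without a running cluster, a self-contained variant might look like the sketch below. The `State` enum is a hypothetical stand-in that mirrors the vertex states named in the post, so that the code compiles without Flink on the classpath; the class and method names are likewise assumptions. Integer division of completed by total would only ever produce 0 or 1, so the sketch multiplies by 100 first and guards against an empty map.

```java
import java.util.Map;

// Hypothetical, Flink-free sketch of the vertex-count heuristic.
class VertexProgress {

    // Stand-in for the vertex states listed in the post (assumption, not Flink's own enum).
    enum State { CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED, CANCELING, CANCELED, FAILED, RECONCILING }

    // Percentage (0..100) of vertices that have reached a terminal state.
    static int completionPercentage(Map<State, Integer> verticesPerState) {
        int completed = 0;
        int total = 0;
        for (Map.Entry<State, Integer> entry : verticesPerState.entrySet()) {
            total += entry.getValue();
            switch (entry.getKey()) {
                case FINISHED:
                case FAILED:
                case CANCELED:
                    completed += entry.getValue();
                    break;
                default:
                    break; // every other state counts as not yet completed
            }
        }
        // Scale before dividing; return 0 when no vertices are known yet.
        return total == 0 ? 0 : (100 * completed) / total;
    }
}
```

Note that this counts vertices, not records, so a job whose work is concentrated in one long-running vertex will appear stuck at a low percentage for most of its run.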
The "mismatch" is due to you mixing job and vertex states.
On 11/5/2020 11:16 AM, Flavio Pompermaier wrote:
Admittedly, the documentation can be out of sync if someone forgets to regenerate it, but the two sets of states cannot be mixed up.
On 11/5/2020 11:31 AM, Chesnay Schepler wrote:
Ok, understood. Unfortunately, the documentation does not show the Map type of status-count, which is Map<ExecutionState, Integer>, and I thought that the job status and the execution status were equivalent. And what about the heuristic? Could it make sense?
On Thu, Nov 5, 2020 at 11:33 AM Chesnay Schepler <[hidden email]> wrote:
Just another question: should I open a JIRA to rename ExecutionState.CANCELING to CANCELLING (the enum's Javadoc indeed reports CANCELLING)?
On Thu, Nov 5, 2020 at 11:31 AM Chesnay Schepler <[hidden email]> wrote:
No, because that would break the API and any log-parsing infrastructure relying on it.
On 11/5/2020 2:56 PM, Flavio Pompermaier wrote: