Hi there, I am running a Batch job with several outputs. Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished? Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run. Thank you! Mark
|
You can try JobListener which you can register to ExecutionEnvironment. Mark Davis <[hidden email]> 于2020年6月6日周六 上午12:00写道:
Best Regards
Jeff Zhang |
Hi Jeff,
Thank you very much! That is exactly what I need. Where the listener code will run in the cluster deployment(YARN, k8s)? Will it be sent over the network? Thank you! Mark ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:
|
It would run in the client side where ExecutionEnvironment is created. Mark Davis <[hidden email]> 于2020年6月6日周六 下午8:14写道: Hi Jeff, Best Regards
Jeff Zhang |
Hi Jeff,
Unfortunately this is not good enough for me. My clients are very volatile, they start a batch and can go away any moment without waiting for it to finish. Think of an elastic web application or an AWS Lambda. I hopped to find something what could be deployed to the cluster together with the batch code. Maybe a hook to a job manager or similar. I do not plan to run anything heavy there, just some formal cleanups. Is there something like this? Thank you! Mark ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Saturday, June 6, 2020 4:29 PM, Jeff Zhang <[hidden email]> wrote:
|
Hi Mark, I do not know how you output the results in your pipeline. If you use DataSet#output(OutputFormat<T> outputFormat), you could try to extend the format with a custom close method which should be called once the task of the sink batch operator is done in the task manager. I also cc Aljoscha, maybe, he has more ideas. Best, Andrey On Sun, Jun 7, 2020 at 1:35 PM Mark Davis <[hidden email]> wrote: Hi Jeff, |
I usually run some code after env.execute(), it's not elegant but it works (only if you run the code from Cli client, not from REST one) On Mon, Jun 8, 2020 at 5:25 PM Andrey Zagrebin <[hidden email]> wrote:
|
In reply to this post by Andrey Zagrebin-5
This goes in the right direction; have
a look at org.apache.flink.api.common.io.FinalizeOnMaster,
which an OutputFormat can implement to run something on the Master
after all subtasks have been closed.
On 08/06/2020 17:25, Andrey Zagrebin
wrote:
|
Hi Chesnay,
That is an interesting proposal, thank you! I was doing something similar with the OutputFormat#close() method respecting the Format's parallelism. Using FinalizeOnMaster will make things easier. But the problem is that several OutputFormats must be synchronized externally - every output must check whether other outputs finished already... Quite cumbersome. Also there is a problem with exceptions - the OutputFormats can be never open and never closed. Mark ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Monday, June 8, 2020 5:50 PM, Chesnay Schepler <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |