Run command after Batch is finished

Mark Davis
Hi there,

I am running a batch job with several outputs.
Is there a way to run some code (e.g., release a distributed lock) after all outputs have finished?

Currently I do this in a try-finally block around the ExecutionEnvironment.execute() call, but I have to switch to detached execution mode, and in that mode the finally block is never run.

Thank you!

  Mark

Re: Run command after Batch is finished

Jeff Zhang

--
Best Regards

Jeff Zhang

Re: Run command after Batch is finished

Mark Davis
Hi Jeff,

Thank you very much! That is exactly what I need.

Where will the listener code run in a cluster deployment (YARN, k8s)?
Will it be sent over the network?

Thank you!

  Mark

Re: Run command after Batch is finished

Jeff Zhang
It would run on the client side, where the ExecutionEnvironment is created.

--
Best Regards

Jeff Zhang

Re: Run command after Batch is finished

Mark Davis
Hi Jeff,

Unfortunately, this is not good enough for me.
My clients are very volatile: they start a batch and can go away at any moment without waiting for it to finish. Think of an elastic web application or an AWS Lambda.

I hoped to find something that could be deployed to the cluster together with the batch code, maybe a hook into the job manager or similar. I do not plan to run anything heavy there, just some formal cleanups.
Is there something like this?

Thank you!

  Mark

Re: Run command after Batch is finished

Andrey Zagrebin-5
Hi Mark,

I do not know how you output the results in your pipeline.
If you use DataSet#output(OutputFormat<T> outputFormat), you could try to extend the format with a custom close method, which should be called once the task of the sink batch operator is done in the task manager.
I have also cc'ed Aljoscha; maybe he has more ideas.

Best,
Andrey

Re: Run command after Batch is finished

Flavio Pompermaier
I usually run some code after env.execute(). It's not elegant, but it works (only if you run the job from the CLI client, not from the REST one).

Re: Run command after Batch is finished

Chesnay Schepler
In reply to this post by Andrey Zagrebin-5
This goes in the right direction; have a look at org.apache.flink.api.common.io.FinalizeOnMaster, which an OutputFormat can implement to run something on the Master after all subtasks have been closed.
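
A minimal sketch of the difference between close() and the master-side hook, as I understand it. So that it compiles without Flink on the classpath, it uses a local stand-in interface, FinalizeOnMasterLike, whose single method finalizeGlobal(int parallelism) is my assumption of the shape of FinalizeOnMaster in the Flink versions discussed here; please check the interface in your version. DistributedLock and LockReleasingOutput are hypothetical names, and the loop in run() only simulates what the runtime is described as doing.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for org.apache.flink.api.common.io.FinalizeOnMaster
// (assumed shape; verify against your Flink version).
interface FinalizeOnMasterLike {
    void finalizeGlobal(int parallelism) throws IOException;
}

// Hypothetical lock abstraction; not part of Flink.
interface DistributedLock {
    void release();
}

// Hypothetical output: close() runs once per parallel subtask,
// finalizeGlobal() runs once after all subtasks have been closed.
class LockReleasingOutput implements FinalizeOnMasterLike {
    private final DistributedLock lock;
    final AtomicInteger closedSubtasks = new AtomicInteger();

    LockReleasingOutput(DistributedLock lock) {
        this.lock = lock;
    }

    // Per-subtask cleanup only; releasing the lock here would happen
    // once per subtask, which is too often.
    void close() {
        closedSubtasks.incrementAndGet();
    }

    // Job-wide cleanup: release the lock exactly once.
    @Override
    public void finalizeGlobal(int parallelism) {
        lock.release();
    }
}

class FinalizeOnMasterSketch {
    // Simulates the described call order and returns how often the lock was released.
    static int run(int parallelism) {
        AtomicInteger releases = new AtomicInteger();
        LockReleasingOutput output = new LockReleasingOutput(releases::incrementAndGet);
        for (int subtask = 0; subtask < parallelism; subtask++) {
            output.close();
        }
        output.finalizeGlobal(parallelism);
        return releases.get();
    }

    public static void main(String[] args) {
        System.out.println("lock releases: " + run(4));
    }
}
```

In a real job the format is serialized and shipped to the cluster, so the lock handle would most likely have to be obtained inside finalizeGlobal rather than passed in through the constructor.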

Re: Run command after Batch is finished

Mark Davis
Hi Chesnay,

That is an interesting proposal, thank you!
I was doing something similar with the OutputFormat#close() method, taking the format's parallelism into account. Using FinalizeOnMaster will make things easier.
But the problem is that several OutputFormats must be synchronized externally: every output must check whether the other outputs have already finished. Quite cumbersome.
There is also a problem with exceptions: an OutputFormat may never be opened and never closed.
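
To illustrate the synchronization I mean, here is a rough sketch in plain Java. OutputCompletionTracker is a hypothetical helper, not a Flink API. The in-memory AtomicInteger only stands in for a counter that would have to live in an external store shared by all outputs (for example, the same system that holds the lock).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not a Flink API. Each output reports completion once;
// the cleanup runs when the last output has reported.
class OutputCompletionTracker {
    // Stand-in for an externally stored counter shared by all outputs.
    private final AtomicInteger remaining;
    private final Runnable cleanup;

    OutputCompletionTracker(int numberOfOutputs, Runnable cleanup) {
        if (numberOfOutputs <= 0) {
            throw new IllegalArgumentException("numberOfOutputs must be positive");
        }
        this.remaining = new AtomicInteger(numberOfOutputs);
        this.cleanup = cleanup;
    }

    // Called once by each output when it is finalized.
    // Returns true if this call was the last one and ran the cleanup.
    boolean outputFinished() {
        if (remaining.decrementAndGet() == 0) {
            cleanup.run();
            return true;
        }
        return false;
    }
}

class OutputCompletionTrackerDemo {
    public static void main(String[] args) {
        OutputCompletionTracker tracker =
                new OutputCompletionTracker(3, () -> System.out.println("lock released"));
        for (int output = 1; output <= 3; output++) {
            boolean last = tracker.outputFinished();
            System.out.println("output " + output + " finished, last=" + last);
        }
    }
}
```

This only covers the case where every output finishes. It does nothing for the exception case, where an output never reports at all.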

  Mark

