(DEPRECATED) Apache Flink User Mailing List archive.

Run command after Batch is finished

Classic

List

Threaded

9 messages Options

Mark Davis

Run command after Batch is finished

Hi there,

I am running a Batch job with several outputs.

Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

Jeff Zhang

Re: Run command after Batch is finished

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：

Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

Best Regards

Jeff Zhang

Mark Davis

Re: Run command after Batch is finished

Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：
Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

--
Best Regards

Jeff Zhang

Jeff Zhang

Re: Run command after Batch is finished

It would run in the client side where ExecutionEnvironment is created.

Mark Davis <[hidden email]> 于2020年6月6日周六下午8:14写道：

Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：
Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

--
Best Regards

Jeff Zhang

Best Regards

Jeff Zhang

Mark Davis

Re: Run command after Batch is finished

Hi Jeff,

Unfortunately this is not good enough for me.
My clients are very volatile, they start a batch and can go away any moment without waiting for it to finish. Think of an elastic web application or an AWS Lambda.

I hopped to find something what could be deployed to the cluster together with the batch code. Maybe a hook to a job manager or similar. I do not plan to run anything heavy there, just some formal cleanups.
Is there something like this?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, June 6, 2020 4:29 PM, Jeff Zhang <[hidden email]> wrote:

It would run in the client side where ExecutionEnvironment is created.

Mark Davis <[hidden email]> 于2020年6月6日周六下午8:14写道：
Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：
Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

--
Best Regards

Jeff Zhang

--
Best Regards

Jeff Zhang

Andrey Zagrebin-5

Re: Run command after Batch is finished

Hi Mark,

I do not know how you output the results in your pipeline.

If you use DataSet#output(OutputFormat<T> outputFormat), you could try to extend the format with a custom close method which should be called once the task of the sink batch operator is done in the task manager.

I also cc Aljoscha, maybe, he has more ideas.

Best,

Andrey

On Sun, Jun 7, 2020 at 1:35 PM Mark Davis <[hidden email]> wrote:

Hi Jeff,

Unfortunately this is not good enough for me.
My clients are very volatile, they start a batch and can go away any moment without waiting for it to finish. Think of an elastic web application or an AWS Lambda.

I hopped to find something what could be deployed to the cluster together with the batch code. Maybe a hook to a job manager or similar. I do not plan to run anything heavy there, just some formal cleanups.
Is there something like this?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, June 6, 2020 4:29 PM, Jeff Zhang <[hidden email]> wrote:

It would run in the client side where ExecutionEnvironment is created.

Mark Davis <[hidden email]> 于2020年6月6日周六下午8:14写道：
Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：
Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

--
Best Regards

Jeff Zhang

--
Best Regards

Jeff Zhang

Flavio Pompermaier

Re: Run command after Batch is finished

I usually run some code after env.execute(), it's not elegant but it works (only if you run the code from Cli client, not from REST one)

On Mon, Jun 8, 2020 at 5:25 PM Andrey Zagrebin <[hidden email]> wrote:

Hi Mark,

I do not know how you output the results in your pipeline.
If you use DataSet#output(OutputFormat<T> outputFormat), you could try to extend the format with a custom close method which should be called once the task of the sink batch operator is done in the task manager.
I also cc Aljoscha, maybe, he has more ideas.

Best,
Andrey

On Sun, Jun 7, 2020 at 1:35 PM Mark Davis <[hidden email]> wrote:
Hi Jeff,

Unfortunately this is not good enough for me.
My clients are very volatile, they start a batch and can go away any moment without waiting for it to finish. Think of an elastic web application or an AWS Lambda.

I hopped to find something what could be deployed to the cluster together with the batch code. Maybe a hook to a job manager or similar. I do not plan to run anything heavy there, just some formal cleanups.
Is there something like this?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, June 6, 2020 4:29 PM, Jeff Zhang <[hidden email]> wrote:

It would run in the client side where ExecutionEnvironment is created.

Mark Davis <[hidden email]> 于2020年6月6日周六下午8:14写道：
Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：
Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

--
Best Regards

Jeff Zhang

--
Best Regards

Jeff Zhang

Chesnay Schepler

Re: Run command after Batch is finished

In reply to this post by Andrey Zagrebin-5

This goes in the right direction; have a look at org.apache.flink.api.common.io.FinalizeOnMaster, which an OutputFormat can implement to run something on the Master after all subtasks have been closed.

On 08/06/2020 17:25, Andrey Zagrebin wrote:

Hi Mark,

I do not know how you output the results in your pipeline.

If you use DataSet#output(OutputFormat<T> outputFormat), you could try to extend the format with a custom close method which should be called once the task of the sink batch operator is done in the task manager.

I also cc Aljoscha, maybe, he has more ideas.

Best,

Andrey

On Sun, Jun 7, 2020 at 1:35 PM Mark Davis <[hidden email]> wrote:

Hi Jeff,

Unfortunately this is not good enough for me.
My clients are very volatile, they start a batch and can go away any moment without waiting for it to finish. Think of an elastic web application or an AWS Lambda.

I hopped to find something what could be deployed to the cluster together with the batch code. Maybe a hook to a job manager or similar. I do not plan to run anything heavy there, just some formal cleanups.
Is there something like this?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, June 6, 2020 4:29 PM, Jeff Zhang <[hidden email]> wrote:

It would run in the client side where ExecutionEnvironment is created.

Mark Davis <[hidden email]> 于2020年6月6日周六下午8:14写道：

Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?

Will it be sent over the network?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：

Hi there,

I am running a Batch job with several outputs.

Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

--

Best Regards

Jeff Zhang

--

Best Regards

Jeff Zhang

Mark Davis

Re: Run command after Batch is finished

Hi Chesnay,

That is an interesting proposal, thank you!
I was doing something similar with the OutputFormat#close() method respecting the Format's parallelism. Using FinalizeOnMaster will make things easier.
But the problem is that several OutputFormats must be synchronized externally - every output must check whether other outputs finished already... Quite cumbersome.
Also there is a problem with exceptions - the OutputFormats can be never open and never closed.

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, June 8, 2020 5:50 PM, Chesnay Schepler <[hidden email]> wrote:

This goes in the right direction; have a look at org.apache.flink.api.common.io.FinalizeOnMaster, which an OutputFormat can implement to run something on the Master after all subtasks have been closed.

On 08/06/2020 17:25, Andrey Zagrebin wrote:
Hi Mark,

I do not know how you output the results in your pipeline.
If you use DataSet#output(OutputFormat<T> outputFormat), you could try to extend the format with a custom close method which should be called once the task of the sink batch operator is done in the task manager.
I also cc Aljoscha, maybe, he has more ideas.

Best,
Andrey

On Sun, Jun 7, 2020 at 1:35 PM Mark Davis <[hidden email]> wrote:
Hi Jeff,

Unfortunately this is not good enough for me.
My clients are very volatile, they start a batch and can go away any moment without waiting for it to finish. Think of an elastic web application or an AWS Lambda.

I hopped to find something what could be deployed to the cluster together with the batch code. Maybe a hook to a job manager or similar. I do not plan to run anything heavy there, just some formal cleanups.
Is there something like this?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, June 6, 2020 4:29 PM, Jeff Zhang <[hidden email]> wrote:

It would run in the client side where ExecutionEnvironment is created.

Mark Davis <[hidden email]> 于2020年6月6日周六下午8:14写道：
Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

Mark

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, June 5, 2020 6:13 PM, Jeff Zhang <[hidden email]> wrote:

You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java

Mark Davis <[hidden email]> 于2020年6月6日周六上午12:00写道：
Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around ExecutionEnvironment.execute() call, but I have to switch to the detached execution mode - in this mode the finally block is never run.

Thank you!

Mark

--
Best Regards

Jeff Zhang

--
Best Regards

Jeff Zhang