Dataset output callback

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Dataset output callback

Flavio Pompermaier
Hi to all,

In my program I'd like to infer from a mysql table the list of directory I have to output on hdfs (status=0).
Once my job finish to clean each directory and update the status value of my sql table.
How can I do that in Flink? Is there any callback on the dataset.output() finish?

Best,
Flavio
Reply | Threaded
Open this post in threaded view
|

Re: Dataset output callback

Flavio Pompermaier
Any insight about this?

On Tue, May 26, 2015 at 5:58 PM, Flavio Pompermaier <[hidden email]> wrote:
Hi to all,

In my program I'd like to infer from a mysql table the list of directory I have to output on hdfs (status=0).
Once my job finish to clean each directory and update the status value of my sql table.
How can I do that in Flink? Is there any callback on the dataset.output() finish?

Best,
Flavio

Reply | Threaded
Open this post in threaded view
|

Re: Dataset output callback

Stephan Ewen

If you just want to do something after the job finishes, just put the finalization code after the call to "execute()":

env.execute();
myCustomOutputFinalization();

If you want each parallel output to do something after it finished, then override the "close()" method of the OutputFormat.

To have a finalization call on the JobManager, make sure the OutputFormat implements "FinalizeOnMaster" and override that interface's method.

Does any of that work for you?

Am 27.05.2015 08:43 schrieb "Flavio Pompermaier" <[hidden email]>:
Any insight about this?

On Tue, May 26, 2015 at 5:58 PM, Flavio Pompermaier <[hidden email]> wrote:
Hi to all,

In my program I'd like to infer from a mysql table the list of directory I have to output on hdfs (status=0).
Once my job finish to clean each directory and update the status value of my sql table.
How can I do that in Flink? Is there any callback on the dataset.output() finish?

Best,
Flavio

Reply | Threaded
Open this post in threaded view
|

Re: Dataset output callback

Flavio Pompermaier
I'll look into that! Thanks for the moment

On Wed, May 27, 2015 at 9:24 AM, Stephan Ewen <[hidden email]> wrote:

If you just want to do something after the job finishes, just put the finalization code after the call to "execute()":

env.execute();
myCustomOutputFinalization();

If you want each parallel output to do something after it finished, then override the "close()" method of the OutputFormat.

To have a finalization call on the JobManager, make sure the OutputFormat implements "FinalizeOnMaster" and override that interface's method.

Does any of that work for you?

Am 27.05.2015 08:43 schrieb "Flavio Pompermaier" <[hidden email]>:
Any insight about this?

On Tue, May 26, 2015 at 5:58 PM, Flavio Pompermaier <[hidden email]> wrote:
Hi to all,

In my program I'd like to infer from a mysql table the list of directory I have to output on hdfs (status=0).
Once my job finish to clean each directory and update the status value of my sql table.
How can I do that in Flink? Is there any callback on the dataset.output() finish?

Best,
Flavio