Generate _SUCCESS (map-reduce style) when folder has been written

Generate _SUCCESS (map-reduce style) when folder has been written

Gwenhael Pasquiers

Hi,

Sorry if this has already been asked, but is there a built-in way for Flink to generate a _SUCCESS file in the folders it has written to (using the write method with an OutputFormat)?

We are replacing a Spark job that generated those files, and downstream operations rely on them.

Best regards,

Gwenhaël PASQUIERS


Re: Generate _SUCCESS (map-reduce style) when folder has been written

Fabian Hueske-2
Hi Gwenhael,

The _SUCCESS files were originally generated by Hadoop for successful jobs. AFAIK, Spark leverages Hadoop's Input and OutputFormats and seems to have followed this approach as well to be compatible.
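
To make the convention concrete: a downstream step usually treats an output folder as complete only once the marker file exists. Here is a minimal Java sketch, assuming the folder is reachable through the local filesystem. The class and method names are hypothetical and not part of Hadoop, Spark, or Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: checks for the Hadoop-style completion marker.
class SuccessMarker {
    static final String MARKER = "_SUCCESS";

    // True only if the marker exists as a regular file directly inside outputDir.
    static boolean isComplete(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve(MARKER));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        System.out.println(isComplete(dir)); // no marker yet: prints false
        Files.createFile(dir.resolve(MARKER));
        System.out.println(isComplete(dir)); // marker present: prints true
    }
}
```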

You could use Flink's HadoopOutputFormat, which is a wrapper for Hadoop OutputFormats (both the mapred and mapreduce APIs).
The wrapper also produces the _SUCCESS files. In fact, you might be able to use exactly the same OutputFormat as your Spark job.
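
A minimal sketch of this suggestion, assuming the DataSet API with the flink-hadoop-compatibility and Hadoop client libraries on the classpath. The class name, the sample records, and the default output path are made up for illustration; a real job would plug in its own DataSet and the OutputFormat used by the Spark job.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical example class, not part of Flink.
class HadoopOutputFormatSketch {

    static void run(String outputDir) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Sample records; Hadoop OutputFormats expect key/value pairs.
        DataSet<Tuple2<Text, IntWritable>> counts = env.fromElements(
                new Tuple2<>(new Text("flink"), new IntWritable(1)),
                new Tuple2<>(new Text("hadoop"), new IntWritable(2)));

        // Wrap a Hadoop (mapreduce API) OutputFormat in Flink's HadoopOutputFormat.
        Job job = Job.getInstance();
        HadoopOutputFormat<Text, IntWritable> format =
                new HadoopOutputFormat<>(new TextOutputFormat<Text, IntWritable>(), job);
        FileOutputFormat.setOutputPath(job, new Path(outputDir));

        // Hadoop's output committer runs when the job finishes; with default
        // settings it is the component that writes the _SUCCESS marker.
        counts.output(format);
        env.execute("write with HadoopOutputFormat");
    }

    public static void main(String[] args) throws Exception {
        // The default path is an assumption for local testing only.
        run(args.length > 0 ? args[0] : "file:///tmp/flink-hadoop-output");
    }
}
```

Whether the marker is written is governed by Hadoop's `mapreduce.fileoutputcommitter.marksuccessfuljobs` setting, which is enabled by default.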

Best,
Fabian

In reply to Gwenhael Pasquiers <[hidden email]>, 2016-12-20 14:00 GMT+01:00.



RE: Generate _SUCCESS (map-reduce style) when folder has been written

Gwenhael Pasquiers

Thanks, it is working properly now.

NB: I had to delete the folder in code because Hadoop’s OutputFormats only overwrite file by file, not the whole folder.
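
One way to do such a cleanup is sketched below. It assumes a local filesystem path and uses only java.nio; the class and method names are hypothetical. On HDFS, Hadoop's `FileSystem.delete(path, true)` would play the same role.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: removes a previous run's output before the job writes again.
class OutputFolderCleaner {

    static void deleteRecursively(Path dir) throws IOException {
        if (Files.notExists(dir)) {
            return; // nothing to clean
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stale-output");
        Files.createFile(dir.resolve("part-00000"));
        Files.createFile(dir.resolve("_SUCCESS"));
        deleteRecursively(dir);
        System.out.println(Files.exists(dir)); // folder is gone: prints false
    }
}
```

Calling this before the job starts also removes a stale _SUCCESS marker, so consumers cannot mistake old output for a finished new run.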

 

In reply to Fabian Hueske <[hidden email]>, sent Tuesday, 20 December 2016, 14:21.

 


Re: Generate _SUCCESS (map-reduce style) when folder has been written

Fabian Hueske-2
Great to hear!

Do you mean that the behavior of Flink's HadoopOutputFormat is not consistent with Hadoop's behavior?
If that's the case, could you open a JIRA ticket to report this and maybe also contribute your changes back?

Thanks a lot,
Fabian

In reply to Gwenhael Pasquiers <[hidden email]>, 2016-12-20 16:37 GMT+01:00.

 



RE: Generate _SUCCESS (map-reduce style) when folder has been written

Gwenhael Pasquiers

No, don’t worry. I think it’s fully compliant with Hadoop’s behavior, but I wanted it to behave more like Flink (that is, to completely clean the destination folder before outputting new files).

 

In reply to Fabian Hueske <[hidden email]>, sent Tuesday, 20 December 2016, 16:41.