writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

10 messages

writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
Hi,

the writeAsCsv method is not writing anything to HDFS (version 1.2.1) when the WriteMode is set to OVERWRITE.
A file is created, but it's empty, and there is no trace of errors in the Flink or Hadoop logs on any node in the cluster.

What could cause this issue? I really need this feature.

Best,
Mihail

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Till Rohrmann

Hi Mihail,

have you checked that the DataSet you want to write to HDFS actually contains data elements? You can try calling collect(), which retrieves the data to your client, to see what's in there.

Cheers,
Till


On Tue, Jun 30, 2015 at 12:01 PM, Mihail Vieru <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
Hi Till,

thank you for your reply.

I have the following code snippet:

intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath, "\n", ";", WriteMode.OVERWRITE);

When I remove the WriteMode parameter, it works, so I can conclude that the DataSet contains data elements.
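For readers following along: in the DataSet API, the writeAsCsv overload used here takes (path, rowDelimiter, fieldDelimiter, writeMode), so "\n" separates records and ";" separates fields within a record. A minimal plain-Java sketch of the line format this produces for (id, value) vertices (the vertex values below are illustrative, not taken from the job):

```java
import java.util.List;
import java.util.stream.Collectors;

class CsvFormatSketch {
    // Mimics the formatting of writeAsCsv(path, "\n", ";", ...) for
    // two-field records, here represented as long[] {id, value}.
    static String format(List<long[]> vertices) {
        return vertices.stream()
                .map(v -> v[0] + ";" + v[1])        // fieldDelimiter = ";"
                .collect(Collectors.joining("\n")); // rowDelimiter = "\n"
    }

    public static void main(String[] args) {
        // Two hypothetical vertices (0, 7) and (1, 3).
        System.out.println(format(List.of(new long[]{0L, 7L}, new long[]{1L, 3L})));
    }
}
```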

Cheers,
Mihail


On 30.06.2015 12:06, Till Rohrmann wrote:



Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
I think my problem is related to a loop in my job.

Before the loop, the writeAsCsv method works fine, even in overwrite mode.

In the loop, in the first iteration, it writes an empty folder containing empty files to HDFS, even though the DataSet it is supposed to write contains elements.

Needless to say, this doesn't occur in a local execution environment when writing to the local file system.


I would appreciate any input on this.

Best,
Mihail


On 30.06.2015 12:10, Mihail Vieru wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
Hi Mihail,

Thank you for your question. Do you have a short example that reproduces the problem? It is hard to find the cause without an error message or some example code.

I wonder how your loop works without WriteMode.OVERWRITE, because Flink should throw an exception in that case. Or do you change the file names on every write?

Cheers,
Max

On Tue, Jun 30, 2015 at 3:47 PM, Mihail Vieru <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
Hi Max,

thank you for your reply. I wanted to revise and dismiss all other factors before writing back. I've attached my code and sample input data.

I run the APSPNaiveJob using the following arguments:

0 100 hdfs://path/to/vertices-test-100 hdfs://path/to/edges-test-100 hdfs://path/to/tempgraph 10 0.5 hdfs://path/to/output-apsp 9

I was wrong: I originally thought that the first writeAsCsv call (line 50 of the attached job) doesn't work. In fact, without WriteMode.OVERWRITE an exception is thrown when the file already exists.

But the problem lies with the second call (line 74), which tries to write to the same path on HDFS.

This issue is blocking me, because I need to persist the vertices dataset between iterations.

Cheers,
Mihail

P.S.: I'm using the latest 0.10-SNAPSHOT and HDFS 1.2.1.


On 30.06.2015 16:51, Maximilian Michels wrote:

Attachments: APSPNaiveJob.java (10K), APSP.java (5K), APSPData.java (2K), vertices-test-100 (179K), edges-test-100 (2K)

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
Hi Mihail,

Thanks for the code. I'm trying to reproduce the problem now.

On Wed, Jul 1, 2015 at 8:30 PM, Mihail Vieru <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
The problem is that your input and output paths are the same. Because Flink executes in a pipelined fashion, all the operators come up at once. When you set WriteMode.OVERWRITE for the sink, it deletes the path before writing anything. That means that when your DataSource reads the input, there is nothing left to read. Thus you get an empty DataSet, which you write to HDFS again. Any further iterations will then just write nothing.

You can circumvent this problem by prefixing every output file with a counter that you increment in your loop. Alternatively, if you only want to keep the latest output, you can use two files and let them alternate between being input and output.
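A minimal plain-Java sketch of the alternating-two-files idea (the path names are hypothetical, and in the real job these strings would feed readCsvFile(...) and writeAsCsv(...)): on each iteration, swap which of the two paths serves as input and which as output, so the sink never deletes the path the source is reading from.

```java
// Sketch of the alternating-two-paths pattern (hypothetical path names).
class AlternatingPaths {
    static final String PATH_A = "hdfs://path/to/tempgraph-A";
    static final String PATH_B = "hdfs://path/to/tempgraph-B";

    // Even iterations read A and write B; odd iterations read B and write A.
    static String inputPath(int iteration) {
        return iteration % 2 == 0 ? PATH_A : PATH_B;
    }

    static String outputPath(int iteration) {
        return iteration % 2 == 0 ? PATH_B : PATH_A;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("iteration " + i + ": read " + inputPath(i)
                    + " -> write " + outputPath(i));
        }
    }
}
```

Because each iteration's output path is different from its input path, WriteMode.OVERWRITE only ever deletes last-but-one results, never the data currently being read.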

Let me know if you have any further questions.

Kind regards,
Max

On Thu, Jul 2, 2015 at 10:20 AM, Maximilian Michels <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
I've implemented the alternating-two-files solution, and everything works now.

Thanks a lot! You saved my day :)

Cheers,
Mihail

On 02.07.2015 12:37, Maximilian Michels wrote:



Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
You're welcome. I'm glad I could help out :)

Cheers,
Max

On Thu, Jul 2, 2015 at 9:17 PM, Mihail Vieru <[hidden email]> wrote: