writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

10 messages

writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
Hi,

the writeAsCsv method is not writing anything to HDFS (version 1.2.1) when the WriteMode is set to OVERWRITE.
A file is created, but it's empty, and there is no trace of errors in the Flink or Hadoop logs on any node in the cluster.

What could cause this issue? I really need this feature.

Best,
Mihail

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Till Rohrmann

Hi Mihail,

have you checked that the DataSet you want to write to HDFS actually contains data elements? You can try calling collect(), which retrieves the data to your client, to see what's in there.

Cheers,
Till


On Tue, Jun 30, 2015 at 12:01 PM, Mihail Vieru <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
Hi Till,

thank you for your reply.

I have the following code snippet:

intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath, "\n", ";", WriteMode.OVERWRITE);

When I remove the WriteMode parameter, it works, so I can conclude that the DataSet contains data elements.
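For readers following along: in the DataSet API, the writeAsCsv overload used here takes (path, rowDelimiter, fieldDelimiter, writeMode), so "\n" separates records and ";" separates fields within a record. A minimal plain-Java sketch of the line format this produces for (id, value) vertices (the vertex values below are illustrative, not taken from the job):

```java
import java.util.List;
import java.util.stream.Collectors;

class CsvFormatSketch {
    // Mimics the formatting of writeAsCsv(path, "\n", ";", ...) for
    // two-field records, here represented as long[] {id, value}.
    static String format(List<long[]> vertices) {
        return vertices.stream()
                .map(v -> v[0] + ";" + v[1])        // fieldDelimiter = ";"
                .collect(Collectors.joining("\n")); // rowDelimiter = "\n"
    }

    public static void main(String[] args) {
        // Two hypothetical vertices (0, 7) and (1, 3).
        System.out.println(format(List.of(new long[]{0L, 7L}, new long[]{1L, 3L})));
    }
}
```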

Cheers,
Mihail


On 30.06.2015 12:06, Till Rohrmann wrote:



Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
I think my problem is related to a loop in my job.

Before the loop, the writeAsCsv method works fine, even in overwrite mode.

In the loop, in the first iteration, it writes an empty folder containing empty files to HDFS, even though the DataSet it is supposed to write contains elements.

Needless to say, this doesn't occur in a local execution environment when writing to the local file system.


I would appreciate any input on this.

Best,
Mihail


On 30.06.2015 12:10, Mihail Vieru wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
Hi Mihail,

Thank you for your question. Do you have a short example that reproduces the problem? It is hard to find the cause without an error message or some example code.

I wonder how your loop works without WriteMode.OVERWRITE, because Flink should throw an exception in that case. Or do you change the file names on every write?

Cheers,
Max

On Tue, Jun 30, 2015 at 3:47 PM, Mihail Vieru <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
Hi Max,

thank you for your reply. I wanted to revise and dismiss all other factors before writing back. I've attached my code and sample input data.

I run the APSPNaiveJob using the following arguments:

0 100 hdfs://path/to/vertices-test-100 hdfs://path/to/edges-test-100 hdfs://path/to/tempgraph 10 0.5 hdfs://path/to/output-apsp 9

I was wrong: I originally thought that the first writeAsCsv call (line 50 of the attached job) doesn't work. In fact, without WriteMode.OVERWRITE an exception is thrown when the file already exists.

But the problem lies with the second call (line 74), which tries to write to the same path on HDFS.

This issue is blocking me, because I need to persist the vertices dataset between iterations.

Cheers,
Mihail

P.S.: I'm using the latest 0.10-SNAPSHOT and HDFS 1.2.1.


On 30.06.2015 16:51, Maximilian Michels wrote:

Attachments: APSPNaiveJob.java (10K), APSP.java (5K), APSPData.java (2K), vertices-test-100 (179K), edges-test-100 (2K)

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
Hi Mihail,

Thanks for the code. I'm trying to reproduce the problem now.

On Wed, Jul 1, 2015 at 8:30 PM, Mihail Vieru <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
The problem is that your input and output paths are the same. Because Flink executes in a pipelined fashion, all the operators come up at once. When you set WriteMode.OVERWRITE for the sink, it deletes the path before writing anything. That means that when your DataSource reads the input, there is nothing left to read. Thus you get an empty DataSet, which you write to HDFS again. Any further iterations will then just write nothing.

You can circumvent this problem by prefixing every output file with a counter that you increment in your loop. Alternatively, if you only want to keep the latest output, you can use two files and let them alternate between being input and output.
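A minimal plain-Java sketch of the alternating-two-files idea (the path names are hypothetical, and in the real job these strings would feed readCsvFile(...) and writeAsCsv(...)): on each iteration, swap which of the two paths serves as input and which as output, so the sink never deletes the path the source is reading from.

```java
// Sketch of the alternating-two-paths pattern (hypothetical path names).
class AlternatingPaths {
    static final String PATH_A = "hdfs://path/to/tempgraph-A";
    static final String PATH_B = "hdfs://path/to/tempgraph-B";

    // Even iterations read A and write B; odd iterations read B and write A.
    static String inputPath(int iteration) {
        return iteration % 2 == 0 ? PATH_A : PATH_B;
    }

    static String outputPath(int iteration) {
        return iteration % 2 == 0 ? PATH_B : PATH_A;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("iteration " + i + ": read " + inputPath(i)
                    + " -> write " + outputPath(i));
        }
    }
}
```

Because each iteration's output path is different from its input path, WriteMode.OVERWRITE only ever deletes last-but-one results, never the data currently being read.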

Let me know if you have any further questions.

Kind regards,
Max

On Thu, Jul 2, 2015 at 10:20 AM, Maximilian Michels <[hidden email]> wrote:


Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Mihail Vieru
I've implemented the alternating-two-files solution, and everything works now.

Thanks a lot! You saved my day :)

Cheers,
Mihail

On 02.07.2015 12:37, Maximilian Michels wrote:



Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

Maximilian Michels
You're welcome. I'm glad I could help out :)

Cheers,
Max

On Thu, Jul 2, 2015 at 9:17 PM, Mihail Vieru <[hidden email]> wrote: