Issue with writeAsText() to S3 bucket

Issue with writeAsText() to S3 bucket

Nguyen, Michael

Hello all,

I am running into an issue trying to write my DataStreams to an S3 bucket using writeAsText("s3://bucket/result.json") in my Flink job. When I call print() on the same DataStream, I see the output I expect in standard output. I first confirm that the stream has data by checking standard output, and then I cancel the Flink job. After cancelling, result.json is created in my S3 bucket only some of the time; it does not always get created, even though I have confirmed the data in standard output.

I understand from Flink's documentation that writeAsText() is intended for debugging only, but I am curious why it does not produce the file every time I cancel my job.

Thank you for your help,

Michael
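
For reference, a minimal sketch of the kind of job described above. The source and element type here are hypothetical stand-ins; only the print() and writeAsText() calls and the S3 path come from the message.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WriteAsTextJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical source; the real job reads from whatever produces the JSON records.
        DataStream<String> records = env.socketTextStream("localhost", 9999);

        records.print();                                // reliably shows up in stdout
        records.writeAsText("s3://bucket/result.json"); // only sometimes appears in S3

        env.execute("writeAsText debug job");
    }
}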


Re: Issue with writeAsText() to S3 bucket

Fabian Hueske-2
Hi Michael,

One reason might be that S3's LIST operation is only eventually consistent.
It can take some time after an object is written before it shows up in a listing.

Best, Fabian
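
A sketch of what "wait until it is listed" can look like in practice, assuming the AWS SDK for Java v1 on the classpath and default credentials; the bucket and key are taken from the path in the original message.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class WaitForResultJson {
    public static void main(String[] args) throws InterruptedException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Poll for up to ~60 seconds; an eventually consistent listing can lag
        // behind the moment the object was actually written.
        for (int attempt = 0; attempt < 12; attempt++) {
            if (s3.listObjectsV2("bucket", "result.json").getKeyCount() > 0) {
                System.out.println("result.json is visible in the listing now");
                return;
            }
            Thread.sleep(5_000);
        }
        System.out.println("result.json still not listed after 60 seconds");
    }
}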



Re: Issue with writeAsText() to S3 bucket

Nguyen, Michael

Hi Fabian,

Thank you for the response. I am currently using writeAsText() to write out nine DataStreams in one Flink job: the original DataStream plus the same stream with various filters applied to it. When I cancel the job, I usually see only 6-7 of the streams' JSON files listed in my S3 bucket.

Even in my situation, would this still be explained by S3's eventually consistent listing?

Thanks,

Michael
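
A sketch of the fan-out being described: one source stream, the original plus filtered variants, each with its own writeAsText() sink. The filter predicates and output paths here are hypothetical; the pattern is the point.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FilteredFanOutJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical source producing JSON strings.
        DataStream<String> records = env.socketTextStream("localhost", 9999);

        // The original stream plus filtered variants, each written to its own file.
        records.writeAsText("s3://bucket/all.json");
        records.filter(r -> r.contains("\"type\":\"A\"")).writeAsText("s3://bucket/typeA.json");
        records.filter(r -> r.contains("\"type\":\"B\"")).writeAsText("s3://bucket/typeB.json");
        // ...plus six more filtered streams in the real job, nine sinks in total.

        env.execute("filtered fan-out job");
    }
}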

 



Re: Issue with writeAsText() to S3 bucket

Fabian Hueske-2
Hi Michael,

I'm not super familiar with S3, but my understanding is that files might not be visible to other services (such as a directory browser) immediately after they have been created.
Did you wait for some time after cancelling the job before checking for the files?

Best, Fabian
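
Since writeAsText() is documented as debugging-only and does not participate in Flink's checkpointing, the documented route for reliable writes to S3 in this Flink version is StreamingFileSink. A minimal row-format sketch, assuming Flink 1.9-era APIs with an S3 filesystem (such as flink-s3-fs-hadoop) configured; note that part files are only committed on checkpoints, so checkpointing must be enabled.

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class StreamingFileSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // part files are committed on checkpoints

        // Hypothetical source producing JSON strings.
        DataStream<String> records = env.socketTextStream("localhost", 9999);

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3://bucket/output"), new SimpleStringEncoder<String>("UTF-8"))
                .build();

        records.addSink(sink);
        env.execute("StreamingFileSink job");
    }
}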
