DataStream API in Batch Execution mode


DataStream API in Batch Execution mode

Marco Villalobos-2
How do I use a hierarchical directory structure as a file source in S3 when using the DataStream API in Batch Execution mode?

I have been trying to find out whether the API supports that, because currently our data is organized by years, halves, quarters, and months, but before I launch the job I flatten the file structure just to process the right set of files.



Re: DataStream API in Batch Execution mode

Guowei Ma
Hi, Marco

I think you could try the `FileSource`; you can find an example in [1]. The `FileSource` scans the files under the given directory recursively.
Would you mind opening an issue about the missing documentation?

Best,
Guowei
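
For reference, a minimal sketch of what this looks like in Java (not from the thread; the bucket path, the plain-text line format, and the Flink 1.12+ APIs are assumptions):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NestedS3BatchJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Run the DataStream job in batch execution mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // FileSource enumerates files under the given path recursively,
        // so a years/halves/quarters/months hierarchy needs no flattening.
        FileSource<String> source = FileSource
                .forRecordStreamFormat(
                        new TextLineFormat(),               // assumed record format
                        new Path("s3://my-bucket/data/"))   // hypothetical S3 path
                .build();

        DataStream<String> lines = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "s3-file-source");

        lines.print();
        env.execute("nested-s3-batch");
    }
}
```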





Re: DataStream API in Batch Execution mode

Marco Villalobos-2
That worked.  Thank you very much.
