Streaming file source?

Niels Basjes
Hi,

For testing and optimizing a streaming application I want to have a "100% accurate repeatable" substitute for a Kafka source.
I was thinking of creating a streaming source class that simply reads the records from a static, unchanging set of files.
Each file would then produce the data that, in the live situation, comes from a single Kafka partition.
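
To make the idea concrete, here is a minimal sketch in plain Java (no Flink or Kafka dependencies). The class name FilePartitionReplay, the Record type, and the rule "sort files by name, file i becomes partition i" are assumptions made for illustration only; they are not part of any existing library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: replays a static directory, one file per simulated partition. */
final class FilePartitionReplay {

    /** One replayed record: simulated partition, line index within the file, and the line itself. */
    static final class Record {
        final int partition;
        final long offset;
        final String value;

        Record(int partition, long offset, String value) {
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    /** Lists the regular files in dir sorted by file name, so the numbering is identical on every run. */
    static List<Path> partitionFiles(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .sorted((a, b) -> a.getFileName().toString()
                            .compareTo(b.getFileName().toString()))
                    .collect(Collectors.toList());
        }
    }

    /** Reads every file completely, in sorted order; the i-th file is treated as partition i. */
    static List<Record> replay(Path dir) throws IOException {
        List<Path> files = partitionFiles(dir);
        List<Record> records = new ArrayList<>();
        for (int p = 0; p < files.size(); p++) {
            List<String> lines = Files.readAllLines(files.get(p), StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                records.add(new Record(p, i, lines.get(i)));
            }
        }
        return records;
    }
}
```

Note that this sketch emits the partitions one after another instead of interleaving them the way a live Kafka consumer would, and it loads everything into memory. It only shows the part I care about most: the same files always yield the same records in the same order.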

I hate reinventing the wheel, so I'm wondering whether someone has already built something like this.
If so, where can I find it?

--
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Streaming file source?

Stephan Ewen
Hi Niels!

There is the Continuous File Monitoring Source, used via

StreamExecutionEnvironment.readFile(FileInputFormat<OUT> inputFormat, String filePath, FileProcessingMode watchType, long interval);

This can be used either to ingest files continuously or to read them just once.

Kostas can probably comment more about whether and how you can make the file order deterministic.

Stephan


(In reply to Niels Basjes's message of Fri, Jan 20, 2017 at 11:20 AM.)


Re: Streaming file source?

Niels Basjes
Thanks!

This sounds really close to what I had in mind. 
I'll use this first and see how far I get.

Niels

(In reply to Stephan Ewen's message of Fri, Jan 20, 2017 at 11:27 AM.)

