Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Marco Villalobos-2

Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

I'd rather just store my file on the class path and load it with java.lang.ClassLoader#getResourceAsStream(String).

If there is a way, I'd appreciate an example.

Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Aljoscha Krettek
Hi Marco,

this is not possible since Flink is designed mostly to read files from a
distributed filesystem, where paths are used to refer to those files. If
you read from files on the classpath you could just use plain old Java
code and won't need a distributed processing system such as Flink.

Best,
Aljoscha
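
For the "plain old Java" route mentioned above, a minimal sketch might look like the following. It assumes a small UTF-8 text file that fits in memory; the class name ClasspathCsv and the default resource name timeseries.csv are illustrative only and not part of Flink.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for demos: loads a small text/CSV file into memory.
class ClasspathCsv {

    // Reads all lines from the stream as UTF-8 and closes the stream afterwards.
    static List<String> readLines(InputStream in) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Looks the resource up on the classpath; getResourceAsStream returns null if it is missing.
    static List<String> readResource(String name) throws IOException {
        InputStream in = ClasspathCsv.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        return readLines(in);
    }

    public static void main(String[] args) {
        // Assumed resource name; pass a different one as the first argument.
        String name = args.length > 0 ? args[0] : "timeseries.csv";
        try {
            readResource(name).forEach(System.out::println);
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

For a demo, the resulting list could then be handed to env.fromCollection(lines), which is suggested later in this thread. This only makes sense for small inputs, since every line is held in memory on the client.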

On 16.06.20 06:46, Marco Villalobos wrote:
>
> Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?
>
> I'd rather just store my file on the class path and load it with java.lang.ClassLoader#getResourceAsStream(String).
>
> If there is a way, I'd appreciate an example.
>


Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Marco Villalobos-2
Okay, it is not supported.  

I thought about this more and I disagree that this would break "distributability".

Currently, the API accepts a String which is a path, whether it be a path to a remote URL or a local file.
However, after the URL is parsed, ultimately what ends up happening is that an InputStream will serve as the abstraction that reads input from some source.

An InputStream can be backed by a remote resource, a local file, a connection to a server, or another client, and in each of those situations the system remains distributed.

Also, such an enhancement promotes "interoperability" because the user can then decide the source of the data, rather than being forced to supply a URL or physical file path.

I think this feature would make testing and demos more portable. I was writing a demo and wanted it to run without command-line arguments, so this would have been very handy. I want the user to simply check out the code and run it without having to supply a command-line parameter declaring where the input file resides.
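
One possible workaround for a demo like this, sketched under assumptions: copy the classpath resource to a temporary file at startup and pass that file's path to whatever path-based reader is used. The class and method names below (ResourceToTempFile, copyToTempFile, extractResource) are hypothetical and not part of Flink.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: turns an InputStream into a local file path.
class ResourceToTempFile {

    // Copies the stream into a new temp file and returns its path.
    // The file is marked for deletion when the JVM exits.
    static Path copyToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("flink-demo-", suffix);
        tmp.toFile().deleteOnExit();
        try (InputStream source = in) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    // Looks the resource up on the classpath and extracts it to a temp file.
    static Path extractResource(String name) throws IOException {
        InputStream in = ResourceToTempFile.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        return copyToTempFile(in, ".csv");
    }
}
```

The returned path can be converted with toString() and passed to a path-based source such as env.readTextFile(...). The temp file exists only on the machine that created it, so this suits local execution and demos, not a job whose readers run on other cluster nodes.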

Thank you.

> On Jun 16, 2020, at 4:57 AM, Aljoscha Krettek <[hidden email]> wrote:
>
> Hi Marco,
>
> this is not possible since Flink is designed mostly to read files from a distributed filesystem, where paths are used to refer to those files. If you read from files on the classpath you could just use plain old Java code and won't need a distributed processing system such as Flink.
>
> Best,
> Aljoscha


Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Marco Villalobos-2
In reply to this post by Aljoscha Krettek
While I still think it would be great for Flink to accept an InputStream and let the programmer decide whether it is backed by a remote TCP connection or a local file, for the sake of my demo I simply resolved the file path within Gradle and supplied it to the run task of the Gradle application plugin like this:

run {
    args = ["--input-file", file('timeseries.csv')]
}

and that launched my application with minimal configuration.

> On Jun 17, 2020, at 7:11 AM, Aljoscha Krettek <[hidden email]> wrote:
>
> Hi,
>
> for simple demos you can also use env.fromElements() or env.fromCollection() to create a source from some data that you have already available.
>
> Does that help?
>
> Best,
> Aljoscha
>
> On 16.06.20 15:35, Marco Villalobos wrote:
>> Okay, it is not supported.
>> I understand such a feature is not needed in production systems, but it could make testing and demos more portable. I was writing a demo, and I wanted it to run without command-line arguments, which would have been very handy. I want the user to simply checkout the code and run it without having to supply a command line parameter declaring where the input file resides.
>> Thank you.
>