Flink 1.0.0 reading files from multiple directory with wildcards

Sourigna Phetsarath
All,

Do any of the Flink Data Sources support comma-separated directories with wildcards?

For example:
env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")

Thanks in advance for any help that you can provide.
--

Gna Phetsarath
System Architect // AOL Platforms // Data Services // Applied Research Chapter
770 Broadway, 5th Floor, New York, NY 10003
o: 212.402.4871 // m: 917.373.7363
vvmr: 8890237 
aim: sphetsarath20 t: @sourigna


Re: Flink 1.0.0 reading files from multiple directory with wildcards

Fabian Hueske-2
Hi,

No, this is currently not supported. However, I agree this would be a very valuable addition to FileInputFormat.
Would you mind opening a JIRA issue with your suggestions?

Until this is added to Flink, it can be implemented as a custom InputFormat based on FileInputFormat by overriding the createInputSplits() method.

Best, Fabian
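
The thread does not include the custom format itself. As a rough illustration of the path-expansion step that an overridden createInputSplits() would need, here is a minimal plain-JDK sketch. It uses no Flink classes. The class name MultiGlobExpander, the expand() method, and the choice to match patterns relative to a root directory are all assumptions made for this example. Splitting on commas also means brace groups such as {a,b} inside a pattern are not supported here.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Flink): expands "glob1,glob2,..." into concrete files. */
final class MultiGlobExpander {

    /**
     * Returns, in sorted order, the regular files under root whose path
     * relative to root matches at least one of the comma-separated globs.
     */
    static List<Path> expand(Path root, String commaSeparatedGlobs) throws IOException {
        // Build one matcher per non-empty pattern.
        List<PathMatcher> matchers = new ArrayList<>();
        for (String glob : commaSeparatedGlobs.split(",")) {
            String trimmed = glob.trim();
            if (!trimmed.isEmpty()) {
                matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + trimmed));
            }
        }
        // Walk the tree once and keep files accepted by any matcher.
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(Files::isRegularFile)
                .filter(p -> {
                    Path relative = root.relativize(p);
                    return matchers.stream().anyMatch(m -> m.matches(relative));
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: MultiGlobExpander <root> <glob1,glob2,...>");
            return;
        }
        for (Path p : expand(Paths.get(args[0]), args[1])) {
            System.out.println(p);
        }
    }
}
```

For the layout in the original question, this could be called with root /data and the patterns "2016/01/01/*/*,2016/01/02/*/*,2016/01/03/*/*". A real Flink format would resolve paths through Flink's own FileSystem abstraction rather than java.nio, so treat this only as a model of the matching logic.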

Re: Flink 1.0.0 reading files from multiple directory with wildcards

Ufuk Celebi

Re: Flink 1.0.0 reading files from multiple directory with wildcards

Sourigna Phetsarath
Thanks Ufuk, I'm already using the recursive traversal feature.

Re: Flink 1.0.0 reading files from multiple directory with wildcards

Sourigna Phetsarath
In reply to this post by Fabian Hueske-2
Fabian,

I'll try extending InputFormat as you suggested and will create a JIRA issue as well.

I also have an AvroGenericRecordInput format class that I would like to contribute once I have time to clean it up and get it into your code base.

-Gna

Re: Flink 1.0.0 reading files from multiple directory with wildcards

Sourigna Phetsarath
Ufuk & Fabian,

FYI, I was able to extend FileInputFormat and override createInputSplits() to handle multiple Paths. The job now uses fewer resources and runs faster.
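
The implementation is not shown in the thread. As a loose model of what "one createInputSplits() pass over several paths" might look like, the plain-Java sketch below cuts each file into chunks and numbers the chunks consecutively across all files. Everything in it is hypothetical: MultiPathSplitPlanner, Split, and plan() are names invented for this example, file sizes are passed in directly instead of being read from a file system, and no Flink types are used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (not Flink classes) of building one split list from several files. */
final class MultiPathSplitPlanner {

    /** Hypothetical stand-in for a file split: which file, start offset, and length. */
    static final class Split {
        final int number;
        final String file;
        final long start;
        final long length;

        Split(int number, String file, long start, long length) {
            this.number = number;
            this.file = file;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return number + ":" + file + "[" + start + "+" + length + "]";
        }
    }

    /**
     * Cuts every file into chunks of at most maxSplitSize bytes and numbers
     * the chunks consecutively across all files. Empty files yield no splits.
     */
    static List<Split> plan(Map<String, Long> fileSizes, long maxSplitSize) {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (Map.Entry<String, Long> entry : fileSizes.entrySet()) {
            long size = entry.getValue();
            for (long start = 0; start < size; start += maxSplitSize) {
                long length = Math.min(maxSplitSize, size - start);
                // The split number continues from the previous file's last split.
                splits.add(new Split(splits.size(), entry.getKey(), start, length));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Illustrative paths and sizes only; iteration order follows insertion order.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("/data/2016/01/01/a/part-0", 250L);
        sizes.put("/data/2016/01/02/a/part-0", 100L);
        for (Split split : plan(sizes, 100L)) {
            System.out.println(split);
        }
    }
}
```

The point of the model is that all input files end up in a single numbered split list, so one source can read them instead of one source per directory.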


Re: Flink 1.0.0 reading files from multiple directory with wildcards

Fabian Hueske-2
Hi Gna,

Thanks for sharing the good news and opening the JIRA!

Cheers, Fabian

Re: Flink 1.0.0 reading files from multiple directory with wildcards

Ufuk Celebi
Nice! Would you like to contribute this to Flink via a pull request? Some resources about the contribution process can be found here:


Re: Flink 1.0.0 reading files from multiple directory with wildcards

Sourigna Phetsarath
Great!  I will, once I clear it with the legal team here.
