Hadoop compatibility and HBase bulk loading

Hadoop compatibility and HBase bulk loading

Flavio Pompermaier
Hi guys,

I have a question about Hadoop compatibility.
In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html you say that existing MapReduce programs can be reused.
Would it also be possible to handle complex MapReduce programs, such as the HBase bulk import, that use for example a custom partitioner (org.apache.hadoop.mapreduce.Partitioner)?

In the bulk-import examples, the call to HFileOutputFormat2.configureIncrementalLoadMap sets a series of job parameters (partitioner, mapper, reducer, etc.): http://pastebin.com/8VXjYAEf.
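
For reference, here is a minimal sketch of the kind of driver behind that call, written against the closely related HFileOutputFormat2.configureIncrementalLoad variant (HBase 1.x-era signatures assumed; MyImportMapper, the column family, and the table name are made-up placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class BulkLoadDriver {

    // Placeholder mapper: parses "rowkey,value" lines into Put mutations.
    public static class MyImportMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split(",", 2);
            byte[] row = Bytes.toBytes(parts[0]);
            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
            ctx.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(MyImportMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // This one call is what makes the job "complex": it installs HBase's
        // TotalOrderPartitioner (keyed on region boundaries), the cell-sorting
        // reducer, and HFileOutputFormat2 as the output format -- exactly the
        // job-level settings a Flink translation would have to reproduce.
        TableName tableName = TableName.valueOf("my_table");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName)) {
            HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
        }
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}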

Do you think there's any chance to make it run in Flink?

Best,
Flavio
Re: Hadoop compatibility and HBase bulk loading

Fabian Hueske
We had an effort to execute any Hadoop MR program by simply specifying its JobConf and executing it (even embedded in regular Flink programs).
We got quite far but did not complete it (counters and custom grouping/sorting functions for combiners are missing, if I remember correctly).
I don't think that anybody is working on that right now, but it would definitely be a cool feature.
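
What did get merged is the function-level part of that effort: unmodified Hadoop mapred functions can already run as operators inside a Flink DataSet program. A minimal sketch (wrapper class names from the flink-hadoop-compatibility module; the stock Hadoop TokenCountMapper/LongSumReducer and the input path are just for illustration):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapred.HadoopMapFunction;
import org.apache.flink.hadoopcompatibility.mapred.HadoopReduceCombineFunction;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

public class HadoopWordCountOnFlink {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the input with an unmodified Hadoop InputFormat.
        JobConf jobConf = new JobConf();
        FileInputFormat.addInputPath(jobConf, new Path(args[0]));
        DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
                new HadoopInputFormat<LongWritable, Text>(
                        new TextInputFormat(), LongWritable.class, Text.class, jobConf));

        // Run an unmodified Hadoop Mapper and Reducer (the Reducer doubles as
        // combiner). This is the level the current support works at:
        // per-function wrapping, not whole-JobConf translation.
        DataSet<Tuple2<Text, LongWritable>> counts = input
                .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(
                        new TokenCountMapper<LongWritable>()))
                .groupBy(0)
                .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(
                        new LongSumReducer<Text>(), new LongSumReducer<Text>()));

        counts.print();
    }
}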

Re: Hadoop compatibility and HBase bulk loading

Flavio Pompermaier
I think I could also take care of it, if somebody can help me and guide me a little bit.
How long do you think it would take to complete such a task?

Re: Hadoop compatibility and HBase bulk loading

Fabian Hueske
Hmm, that's a tricky question ;-) I would need to have a closer look. Getting custom comparators for sorting and grouping into the combiner is not trivial, because it touches API, optimizer, and runtime code. However, I did that before for the reducer, and with the recent addition of groupCombine the reducer changes might carry over to the combiner.

I'll be gone next week, but if you want to, we can have a closer look at the problem after that.
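
To make the groupCombine idea concrete, here is a minimal sketch of the DataSet-API hook a wrapped Hadoop combiner would map onto (toy word-count data; wiring custom grouping/sorting comparators into this path is exactly the missing piece):

import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class GroupCombineSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<String, Integer>> words = env.fromElements(
                new Tuple2<>("a", 1), new Tuple2<>("b", 1), new Tuple2<>("a", 1));

        // Pre-aggregation that runs before the shuffle, like a Hadoop Combiner.
        DataSet<Tuple2<String, Integer>> preAggregated = words
                .groupBy(0)
                .combineGroup(new GroupCombineFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
                    @Override
                    public void combine(Iterable<Tuple2<String, Integer>> values,
                                        Collector<Tuple2<String, Integer>> out) {
                        String key = null;
                        int sum = 0;
                        for (Tuple2<String, Integer> v : values) {
                            key = v.f0;
                            sum += v.f1;
                        }
                        out.collect(new Tuple2<>(key, sum));
                    }
                });

        preAggregated.print();
    }
}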

Re: Hadoop compatibility and HBase bulk loading

Flavio Pompermaier
Great! That will be awesome.
Thank you, Fabian

Re: Hadoop compatibility and HBase bulk loading

Flavio Pompermaier
Any progress on this, Fabian? HBase bulk loading is a common task for us, and it's very inconvenient to have to run a separate YARN job to accomplish it...
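
To be precise about what that separate job does: only the HFile generation needs a YARN job; the final handoff of the written HFiles is a plain client-side call. A minimal sketch of that handoff step (HBase 1.x class names assumed; the staging directory and table name are placeholder arguments):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class CompleteBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf(args[1]);
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName)) {
            // Move the HFiles staged under args[0] into the table's regions.
            new LoadIncrementalHFiles(conf).doBulkLoad(
                    new Path(args[0]), admin, table, locator);
        }
    }
}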

Re: Hadoop compatibility and HBase bulk loading

Fabian Hueske
No, I'm not aware of anybody working on extending the Hadoop compatibility support.
I also won't have time to work on this any time soon :-(

Re: Hadoop compatibility and HBase bulk loading

Flavio Pompermaier
Do you think it is that complex to support? We could try to implement it ourselves if someone could give us some guidance (at least the big picture).

--
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809
Re: Hadoop compatibility and HBase bulk loading

Fabian Hueske
Looking at my previous mail, which mentions changes to the API, optimizer, and runtime code of the DataSet API: this would be a major, non-trivial effort, and it would also require that a committer spend a good amount of time on it.

