Hi guys,
I have a question about Hadoop compatibility. In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html you say that existing MapReduce programs can be reused. Would it also be possible to run more complex MapReduce programs, such as the HBase bulk import, which uses a custom partitioner (org.apache.hadoop.mapreduce.Partitioner)? The full code can be seen at https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java. Do you think there's any chance of making it run in Flink? Best, Flavio
We had an effort to execute any Hadoop MR program by simply specifying the JobConf and executing it (even embedded in regular Flink programs). We got quite far but did not complete it (counters and custom grouping/sorting functions for combiners are missing, if I remember correctly). I don't think that anybody is working on that right now, but it would definitely be a cool feature. 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <[hidden email]>:
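For context on what the HFileOutputFormat2 job needs from its partitioner: each row key must be routed to the reducer responsible for exactly one HBase region, i.e. total-order partitioning over the sorted region boundary keys. Here is a self-contained, simplified sketch of that routing logic (all class and method names are illustrative; this is not the actual Hadoop TotalOrderPartitioner, only a stand-in mirroring the `getPartition(key, numPartitions)` contract):

```java
import java.util.Arrays;

// Simplified sketch of total-order partitioning as used for HBase bulk loads:
// each partition covers one contiguous key range (one HBase region).
// Hypothetical stand-in for illustration, not the Hadoop/HBase implementation.
public class TotalOrderPartitionSketch {

    // Sorted region start keys, excluding the first region's implicit start.
    private final byte[][] splitKeys;

    public TotalOrderPartitionSketch(byte[][] sortedSplitKeys) {
        this.splitKeys = sortedSplitKeys;
    }

    /** Mirrors the org.apache.hadoop.mapreduce.Partitioner#getPartition contract. */
    public int getPartition(byte[] rowKey, int numPartitions) {
        // Binary search over the sorted split points: keys below the first
        // split go to partition 0, keys >= the last split to the last one.
        int idx = Arrays.binarySearch(splitKeys, rowKey,
                TotalOrderPartitionSketch::compareBytes);
        int partition = (idx >= 0) ? idx + 1 : -(idx + 1);
        return partition % numPartitions;
    }

    // Lexicographic unsigned byte comparison, like HBase row key ordering.
    private static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Byte.compareUnsigned(a[i], b[i]);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Three regions: (-inf, "g"), ["g", "p"), ["p", +inf)
        byte[][] splits = { "g".getBytes(), "p".getBytes() };
        TotalOrderPartitionSketch p = new TotalOrderPartitionSketch(splits);
        System.out.println(p.getPartition("apple".getBytes(), 3)); // 0
        System.out.println(p.getPartition("melon".getBytes(), 3)); // 1
        System.out.println(p.getPartition("zebra".getBytes(), 3)); // 2
    }
}
```

Whatever a Flink integration ends up looking like, it would have to honor this key-range routing (plus total sorting of keys within each partition) for the generated HFiles to be valid for bulk loading.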
I think I could take care of it myself if somebody could help and guide me a little bit.
How long do you think such a task would take to complete? On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <[hidden email]> wrote:
Hmm, that's a tricky question ;-) I would need to have a closer look. But getting custom comparators for sorting and grouping into the combiner is not that trivial, because it touches API, optimizer, and runtime code. However, I did that before for the Reducer, and with the recent addition of groupCombine the Reducer changes might simply be applied to combine. I'll be gone next week, but if you want to, we can have a closer look at the problem after that. 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <[hidden email]>:
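For reference, the groupCombine operator mentioned above is exposed on the DataSet API as combineGroup. A minimal sketch of pre-aggregation with it (assumes the Flink DataSet API; the data and field names are illustrative):

```java
import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Sketch: pre-aggregate counts before the full shuffle, analogous to a
// Hadoop Combiner. Note the combiner's grouping here follows groupBy();
// plugging in a *custom* Hadoop grouping/sorting comparator at this point
// is exactly the missing piece discussed in this thread.
public class CombineSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> words = env.fromElements(
                Tuple2.of("flink", 1), Tuple2.of("hbase", 1), Tuple2.of("flink", 1));

        DataSet<Tuple2<String, Integer>> preAggregated = words
                .groupBy(0)
                .combineGroup(new GroupCombineFunction<Tuple2<String, Integer>,
                                                       Tuple2<String, Integer>>() {
                    @Override
                    public void combine(Iterable<Tuple2<String, Integer>> values,
                                        Collector<Tuple2<String, Integer>> out) {
                        String key = null;
                        int sum = 0;
                        for (Tuple2<String, Integer> v : values) {
                            key = v.f0;
                            sum += v.f1;
                        }
                        out.collect(Tuple2.of(key, sum));
                    }
                });

        preAggregated.print();
    }
}
```

Unlike a Hadoop Combiner with custom comparators, the grouping here is fixed by the groupBy key, which is why supporting HFileOutputFormat2-style jobs would require the API/optimizer/runtime changes described above.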
Great! That will be awesome.
Thank you Fabian On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <[hidden email]> wrote:
Any progress on this, Fabian? HBase bulk loading is a common task for us, and having to run a separate YARN job to accomplish it is very inconvenient... On 10 Apr 2015 12:26, "Flavio Pompermaier" <[hidden email]> wrote:
No, I'm not aware of anybody working on extending the Hadoop compatibility support. I also won't have time to work on this any time soon :-( 2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <[hidden email]>:
Do you think it would be that complex to support? I think we could try to implement it if someone gave us some guidance (at least the big picture).
On Tue, Jan 16, 2018 at 10:02 AM, Fabian Hueske <[hidden email]> wrote:
Flavio Pompermaier, Development Department, OKKAM S.r.l., Tel. +(39) 0461 041809
Looking at my previous mail, which mentions changes to the API, optimizer, and runtime code of the DataSet API, this would be a major, non-trivial effort and would also require a committer to spend a good amount of time on it. 2018-01-16 10:07 GMT+01:00 Flavio Pompermaier <[hidden email]>: