Re: Usage of Hadoop 2.2.0

Posted by Stephan Ewen on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Usage-of-Hadoop-2-2-0-tp2579p2689.html

I am good with that as well. Mind that we are not only dropping a binary distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0.



Let's also reconfigure Travis to test:

 - Hadoop 1
 - Hadoop 2.3
 - Hadoop 2.4
 - Hadoop 2.6
 - Hadoop 2.7


On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park <[hidden email]> wrote:
+1 for dropping Hadoop 2.2.0

Regards,
Chiwan Park

> On Sep 4, 2015, at 5:58 AM, Ufuk Celebi <[hidden email]> wrote:
>
> +1 to what Robert said.
>
> On Thursday, September 3, 2015, Robert Metzger <[hidden email]> wrote:
> I think most cloud providers have moved beyond Hadoop 2.2.0.
> Google's Click-To-Deploy is on 2.4.1
> AWS EMR is on 2.6.0
>
> The situation for the distributions seems to be the following:
> MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
> CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
>
> HDP 2.0 (October 2013) uses 2.2.0
> HDP 2.1 (April 2014) uses 2.4.0 already
>
> So both vendors and cloud providers are multiple releases away from Hadoop 2.2.0.
>
> Spark does not offer a binary distribution lower than 2.3.0.
>
> In addition to that, I don't think that the HDFS client in 2.2.0 is really usable in production environments. Users were reporting ArrayIndexOutOfBounds exceptions for some jobs, and I also ran into these exceptions sometimes.
>
> The easiest approach to resolve this issue would be (a) dropping support for Hadoop 2.2.0.
> An alternative approach (b) would be:
>  - ship a binary version for Hadoop 2.3.0
>  - keep the Flink source compatible with 2.2.0, so that users can compile a Hadoop 2.2.0 version if needed.
>
> I would vote for approach (a).
>
>
> On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann <[hidden email]> wrote:
> While working on high availability (HA) for Flink's YARN execution, I stumbled across some limitations with Hadoop 2.2.0. Between versions 2.2.0 and 2.3.0, Hadoop introduced new functionality that is required for an efficient HA implementation. Therefore, I was wondering whether there is actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively used by anyone?
>
> Cheers,
> Till
>