While working on high availability (HA) for Flink's YARN execution I stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0 to 2.3.0, Hadoop introduced new functionality which is required for an efficient HA implementation. Therefore, I was wondering whether there is actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively used by someone?
Cheers,
Till

I think most cloud providers moved beyond Hadoop 2.2.0.
Google's Click-To-Deploy is on 2.4.1
AWS EMR is on 2.6.0

The situation for the distributions seems to be the following:
MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)

HDP 2.0 (October 2013) is using 2.2.0
HDP 2.1 (April 2014) uses 2.4.0 already

So both vendors and cloud providers are multiple releases away from Hadoop 2.2.0.

Spark does not offer a binary distribution lower than 2.3.0.

In addition to that, I don't think that the HDFS client in 2.2.0 is really usable in production environments. Users were reporting ArrayIndexOutOfBounds exceptions for some jobs, I also had these exceptions sometimes.

The easiest approach to resolve this issue would be (a) dropping the support for Hadoop 2.2.0
An alternative approach (b) would be:
- ship a binary version for Hadoop 2.3.0
- make the source of Flink still compatible with 2.2.0, so that users can compile a Hadoop 2.2.0 version if needed.

I would vote for approach (a).

On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann <[hidden email]> wrote:

+1 to what Robert said.
On Thursday, September 3, 2015, Robert Metzger <[hidden email]> wrote:

+1 for dropping Hadoop 2.2.0
Regards,
Chiwan Park

On Sep 4, 2015, at 5:58 AM, Ufuk Celebi <[hidden email]> wrote:

I am good with that as well. Mind that we are not only dropping a binary distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0.

Let's also reconfigure Travis to test

- Hadoop1
- Hadoop 2.3
- Hadoop 2.4
- Hadoop 2.6
- Hadoop 2.7

On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park <[hidden email]> wrote:
> +1 for dropping Hadoop 2.2.0

+1 for dropping Hadoop 2.2.0 binary and source-compatibility. The release is hardly used and complicates the important high-availability changes in Flink.

On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen <[hidden email]> wrote:

+1 for dropping
On 09/04/2015 11:04 AM, Maximilian Michels wrote:

I created a Jira for this: https://issues.apache.org/jira/browse/FLINK-2643

On Fri, 4 Sep 2015 at 13:01 Matthias J. Sax <[hidden email]> wrote:
> +1 for dropping