Usage of Hadoop 2.2.0

Usage of Hadoop 2.2.0

Till Rohrmann
While working on high availability (HA) for Flink's YARN execution, I ran into some limitations with Hadoop 2.2.0. Between versions 2.2.0 and 2.3.0, Hadoop introduced new functionality that is required for an efficient HA implementation. I was therefore wondering whether we actually need to keep supporting Hadoop 2.2.0. Is anyone still actively using Hadoop 2.2.0?

Cheers,
Till

Re: Usage of Hadoop 2.2.0

rmetzger0
I think most cloud providers have moved beyond Hadoop 2.2.0:
 - Google's Click-To-Deploy is on 2.4.1
 - AWS EMR is on 2.6.0

The situation for the distributions seems to be the following:
 - MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
 - CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
 - HDP 2.0 (October 2013) uses 2.2.0
 - HDP 2.1 (April 2014) already uses 2.4.0

So both vendors and cloud providers are multiple releases away from Hadoop 2.2.0.
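
For illustration, the versions listed above can be compared against a candidate minimum of 2.3.0 with a small sketch. The dictionary only restates the figures from this message; the helper name `parse_version` and the `MINIMUM` threshold are hypothetical and not part of Flink.

```python
# Hadoop versions as reported in this message (not independently verified).
reported = {
    "Google Click-To-Deploy": "2.4.1",
    "AWS EMR": "2.6.0",
    "MapR 4": "2.4.0",
    "CDH 5.0": "2.3.0",
    "HDP 2.0": "2.2.0",
    "HDP 2.1": "2.4.0",
}


def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Assumed new minimum if support for Hadoop 2.2.0 is dropped.
MINIMUM = parse_version("2.3.0")

# Names of the listed platforms whose Hadoop version is older than the minimum.
below_minimum = sorted(
    name for name, version in reported.items()
    if parse_version(version) < MINIMUM
)
print(below_minimum)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "2.10.0" sorting before "2.9.0".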

Spark does not offer a binary distribution lower than 2.3.0.

In addition, I don't think the HDFS client in 2.2.0 is really usable in production environments. Users have reported ArrayIndexOutOfBounds exceptions for some jobs, and I have occasionally hit these exceptions myself.

The easiest approach to resolve this issue would be (a) dropping support for Hadoop 2.2.0.
An alternative approach (b) would be:
 - ship a binary version for Hadoop 2.3.0
 - keep the Flink source compatible with 2.2.0, so that users can compile a Hadoop 2.2.0 version if needed.

I would vote for approach (a).




Re: Usage of Hadoop 2.2.0

Ufuk Celebi
+1 to what Robert said.



Re: Usage of Hadoop 2.2.0

Chiwan Park-2
+1 for dropping Hadoop 2.2.0

Regards,
Chiwan Park


Re: Usage of Hadoop 2.2.0

Stephan Ewen
I am good with that as well. Keep in mind that we are not only dropping the binary distribution for Hadoop 2.2.0, but also source compatibility with 2.2.0.

Let's also reconfigure Travis to test:

 - Hadoop 1
 - Hadoop 2.3
 - Hadoop 2.4
 - Hadoop 2.6
 - Hadoop 2.7
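
As a rough sketch, the test matrix above could be expanded into one Maven invocation per Hadoop version. The property names `-Dhadoop.profile` and `-Dhadoop.version`, the `verify` goal, and the `.0` patch releases are assumptions for illustration; check them against the actual Flink build and Travis configuration. The helper `maven_command` is hypothetical.

```python
# Build matrix proposed in this message. The exact patch releases (".0")
# and the Maven property names are assumptions, not taken from the thread.
MATRIX = [
    ("Hadoop 1", ["-Dhadoop.profile=1"]),
    ("Hadoop 2.3", ["-Dhadoop.version=2.3.0"]),
    ("Hadoop 2.4", ["-Dhadoop.version=2.4.0"]),
    ("Hadoop 2.6", ["-Dhadoop.version=2.6.0"]),
    ("Hadoop 2.7", ["-Dhadoop.version=2.7.0"]),
]


def maven_command(extra_args):
    """Build the argument list for one matrix entry (hypothetical helper)."""
    return ["mvn", "clean", "verify"] + list(extra_args)


# Print one command line per matrix entry; nothing is executed here.
for label, args in MATRIX:
    print(label + ": " + " ".join(maven_command(args)))
```

The sketch only prints the commands, so it can be reviewed before anything is wired into the CI configuration.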



Re: Usage of Hadoop 2.2.0

Maximilian Michels
+1 for dropping Hadoop 2.2.0 binary and source compatibility. The
release is hardly used and complicates the important high-availability
changes in Flink.


Re: Usage of Hadoop 2.2.0

Matthias J. Sax-2
+1 for dropping



Re: Usage of Hadoop 2.2.0

Aljoscha Krettek

On Fri, 4 Sep 2015 at 13:01 Matthias J. Sax <[hidden email]> wrote:
+1 for dropping
