(DEPRECATED) Apache Flink User Mailing List archive.

Usage of Hadoop 2.2.0

Till Rohrmann

Usage of Hadoop 2.2.0

While working on high availability (HA) for Flink's YARN execution, I ran into some limitations with Hadoop 2.2.0. Between versions 2.2.0 and 2.3.0, Hadoop introduced new functionality that is required for an efficient HA implementation. I was therefore wondering whether there is actually a need to support Hadoop 2.2.0. Is anyone still actively using Hadoop 2.2.0?

Cheers,
Till

rmetzger0

Re: Usage of Hadoop 2.2.0

I think most cloud providers have moved beyond Hadoop 2.2.0.

Google's Click-To-Deploy is on 2.4.1

AWS EMR is on 2.6.0

The situation for the distributions seems to be the following:

MapR 4 uses Hadoop 2.4.0 (current is MapR 5)

CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)

HDP 2.0 (October 2013) is using 2.2.0

HDP 2.1 (April 2014) uses 2.4.0 already

So both vendors and cloud providers are multiple releases away from Hadoop 2.2.0.

Spark does not offer a binary distribution built against a Hadoop version lower than 2.3.0.

In addition to that, I don't think the HDFS client in 2.2.0 is really usable in production environments. Users reported ArrayIndexOutOfBounds exceptions for some jobs, and I occasionally hit these exceptions myself.

The easiest way to resolve this issue would be (a) to drop support for Hadoop 2.2.0.

An alternative approach (b) would be:

- ship a binary version for Hadoop 2.3.0

- make the source of Flink still compatible with 2.2.0, so that users can compile a Hadoop 2.2.0 version if needed.

I would vote for approach (a).

Ufuk Celebi

Re: Usage of Hadoop 2.2.0

+1 to what Robert said.

Chiwan Park-2

Re: Usage of Hadoop 2.2.0

+1 for dropping Hadoop 2.2.0

Regards,
Chiwan Park

Stephan Ewen

Re: Usage of Hadoop 2.2.0

I am good with that as well. Keep in mind that we are dropping not only the binary distribution for Hadoop 2.2.0, but also source compatibility with 2.2.0.

Let's also reconfigure Travis to test:

- Hadoop 1

- Hadoop 2.3

- Hadoop 2.4

- Hadoop 2.6

- Hadoop 2.7

Maximilian Michels

Re: Usage of Hadoop 2.2.0

+1 for dropping Hadoop 2.2.0 binary and source-compatibility. The
release is hardly used and complicates the important high-availability
changes in Flink.

Matthias J. Sax-2

Re: Usage of Hadoop 2.2.0

+1 for dropping

Aljoscha Krettek

Re: Usage of Hadoop 2.2.0

I created a Jira for this: https://issues.apache.org/jira/browse/FLINK-2643
