(DEPRECATED) Apache Flink User Mailing List archive.

Flink on EMR Question

8 messages

Chiwan Park-2

Flink on EMR Question

Hi All,

I have some problems using Flink on an Amazon EMR cluster.

Q1. Sometimes the JobManager container keeps running after I destroy the YARN session by pressing Ctrl+C. In that case, the Flink YARN application appears to have exited correctly in the YARN ResourceManager dashboard, but the dashboard still shows a running container. From the container's logs, I can tell that it is the JobManager.

I cannot kill the container because I do not have permission to restart the YARN ResourceManager on Amazon EMR. On my own small Hadoop cluster (3 nodes), the problem does not occur.

Q2. I tried to use the S3 file system with Flink on EMR, but I cannot use it because of an Apache HttpClient version conflict. By default, the S3 file system implementation on EMR is `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against a different version of Apache HttpClient.

As I wrote above, I do not have permission to restart the Hadoop cluster after modifying core-site.xml. How can I solve this problem?
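A version conflict like this usually comes down to which jar the JVM actually loaded a class from. As a hedged diagnostic sketch (plain JDK reflection, not part of Flink or EMR; the class name `ClassOrigin` is hypothetical), the following reports the location a given class was loaded from, e.g. to check whether Flink's bundled Apache HttpClient or EMR's copy wins on the classpath:

```java
import java.security.CodeSource;
import java.util.Optional;

/** Hypothetical helper: reports where the JVM loaded a class from. */
final class ClassOrigin {

    private ClassOrigin() {}

    /**
     * Returns the location (usually a jar URL) the named class was loaded from.
     * Empty if the class cannot be found or has no code source
     * (classes from the JDK's bootstrap loader report none).
     */
    static Optional<String> locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> clazz = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return Optional.empty();
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Defaults to the HttpClient interface, which exists in both 4.2.x and 4.3.x
        String name = args.length > 0 ? args[0] : "org.apache.http.client.HttpClient";
        System.out.println(name + " -> " + locate(name).orElse("not found / no code source"));
    }
}
```

Run it with the same classpath as the failing job; if the reported jar is not the one you expect, that copy is the one shadowing the other.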

Regards,
Chiwan Park

Maximilian Bode

Re: Flink on EMR Question

Hi everyone,

Regarding Q1, I believe I have witnessed a comparable phenomenon on a (3-node, non-EMR) YARN cluster. After shutting down the YARN session via `stop`, one container seems to linger. `yarn application -list` is empty, whereas `bin/yarn-session.sh -q` lists the left-over container. Also, Ambari’s YARN pane still shows one application as 'running' under current applications. Then, after some time (on the order of a few minutes), it disappears and the resources become available again.

I have not tested this behavior extensively so far. Notably, I was not able to reproduce it by just starting a session and then ending it right away without looking at the JobManager web interface. Maybe viewing the web interface produces some kind of lag as far as the YARN containers are concerned?

Cheers,
Max

> On 04.01.2016 at 12:52, Chiwan Park <[hidden email]> wrote:

Stephan Ewen

Re: Flink on EMR Question

Hi!

Concerning (1): We have seen that a few times. The JVMs / threads sometimes do not exit gracefully, and YARN is not always able to kill the process (a YARN bug). I am currently working on a refactoring of the YARN resource manager (to allow easy addition of other frameworks) and have addressed this as part of that. It will be in master shortly.
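As a generic illustration of the failure mode described above (not the actual change to Flink's YARN code), a JVM only exits normally once all non-daemon threads have finished, so a thread pool that never terminates can keep a process, and with it its container, alive. The sketch below uses the standard two-phase `ExecutorService` shutdown pattern with a time budget; the class name `BoundedShutdown` is hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: stops a thread pool within a bounded time. */
final class BoundedShutdown {

    private BoundedShutdown() {}

    /**
     * Phase 1 lets running tasks finish; phase 2 interrupts them.
     * Returns true if all pool threads terminated within the budget.
     */
    static boolean shutdown(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // stop accepting new tasks, let running ones finish
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // interrupt tasks that are still running
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        // Executors' default threads are non-daemon, so this pool would block JVM exit
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // A task that would block for an hour unless interrupted
        pool.submit(() -> {
            try {
                Thread.sleep(TimeUnit.HOURS.toMillis(1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("terminated: " + shutdown(pool, 500, TimeUnit.MILLISECONDS));
    }
}
```

A task that ignores interruption will still make the second `awaitTermination` return `false`; what to do then (for example, a harder process stop) is outside what this sketch covers.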

Concerning (2): Do you know which component in Flink uses the HTTP client?

Greetings,

Stephan

On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode <[hidden email]> wrote:

Chiwan Park-2

Re: Flink on EMR Question

Hi,

Thanks for answering me!

I am happy to hear the problem will be addressed. :)

About question 2: flink-runtime uses Apache HttpClient 4.2.6, while Amazon's S3 file system implementation uses 4.3.x. Some APIs changed between these versions, so a NoSuchMethodError is thrown.
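When code compiled against HttpClient 4.3.x runs with 4.2.6 on the classpath, the `NoSuchMethodError` typically surfaces only when the affected call site is first executed. A hedged sketch of probing for an API up front via reflection follows; `ApiProbe` is a hypothetical helper, and `HttpClientBuilder.create()` is used only because that builder class was introduced in HttpClient 4.3, not because it is the specific method that fails on EMR:

```java
/** Hypothetical helper: checks for a public method before relying on it. */
final class ApiProbe {

    private ApiProbe() {}

    /**
     * Returns true if the named class is loadable and has a public method
     * (declared or inherited) with exactly these parameter types.
     */
    static boolean hasPublicMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize=false: inspect the class without running static initializers
            Class<?> clazz = Class.forName(className, false, ApiProbe.class.getClassLoader());
            clazz.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // HttpClientBuilder (and its static create()) exists from HttpClient 4.3 onward,
        // so this is false when only a 4.2.x jar is visible
        boolean has43Api = hasPublicMethod("org.apache.http.impl.client.HttpClientBuilder", "create");
        System.out.println("HttpClient 4.3+ builder API available: " + has43Api);
    }
}
```

Such a check only detects the conflict earlier; it does not resolve which HttpClient version ends up on the classpath.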

> On Jan 5, 2016, at 11:59 PM, Stephan Ewen <[hidden email]> wrote:

Regards,
Chiwan Park

Stephan Ewen

Re: Flink on EMR Question

At first glance, I think that "flink-runtime" does not need Apache HttpClient at all. I'll try simply removing that dependency...

On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park <[hidden email]> wrote:

Chiwan Park-2

Re: Flink on EMR Question

Great! Thanks for addressing it!

> On Jan 6, 2016, at 5:51 PM, Stephan Ewen <[hidden email]> wrote:

Regards,
Chiwan Park

Ufuk Celebi

Re: Flink on EMR Question

@Stephan: It was added to the dependency management section to enforce a higher version for the S3 client, because it had caused problems earlier.

> On 06 Jan 2016, at 11:14, Chiwan Park <[hidden email]> wrote:

Stephan Ewen

Re: Flink on EMR Question

Would it cause problems if I remove it from the "flink-runtime" pom?

Seems strange to have a dependency there that we do not even use...

On Wed, Jan 6, 2016 at 12:07 PM, Ufuk Celebi <[hidden email]> wrote:
