Hi All,
I have some problems using Flink on an Amazon EMR cluster.

Q1. Sometimes the JobManager container still exists after I destroy the YARN session by pressing Ctrl+C. In that case, the Flink YARN application appears to have exited correctly in the YARN RM dashboard, but the dashboard still shows one running container. From the container's logs, I can tell that it is the JobManager.

I cannot kill the container because I have no permission to restart the YARN RM on Amazon EMR. On my small Hadoop cluster (3 nodes), the problem doesn't appear.

Q2. I tried to use the S3 file system in Flink on EMR, but I can't because of a version conflict with Apache Httpclient. By default, the S3 file system implementation on EMR is `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against a different version of Apache Httpclient.

As I wrote above, I cannot restart the Hadoop cluster after modifying conf-site.xml because I lack permission. How can I solve this problem?

Regards,
Chiwan Park
Hi everyone,
Regarding Q1, I believe I have witnessed a comparable phenomenon in a (3-node, non-EMR) YARN cluster. After shutting down the YARN session via `stop`, one container seems to linger. `yarn application -list` is empty, whereas `bin/yarn-session.sh -q` lists the left-over container. Also, there is still one application shown as 'running' in Ambari's YARN pane under current applications. Then, after some time (on the order of a few minutes), it disappears and the resources are available again.

I have not tested this behavior extensively so far. Notably, I was not able to reproduce it by just starting a session and then ending it again right away without looking at the JobManager web interface. Maybe looking at the web interface produces some kind of lag as far as YARN containers are concerned?

Cheers,
Max

> On 04.01.2016 at 12:52, Chiwan Park <[hidden email]> wrote:
> Hi All,
Hi!

Concerning (1): We have seen that a few times. The JVMs / threads sometimes do not exit gracefully, and YARN is not always able to kill the process (YARN bug). I am currently working on a refactoring of the YARN resource manager (to allow easy addition of other frameworks) and have addressed this as part of that. It will be in the master in a bit.

Concerning (2): Do you know which component in Flink uses the HTTP client?

Greetings,
Stephan

On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode <[hidden email]> wrote:
> Hi everyone,
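One general reason a JVM does not exit on its own is that non-daemon threads are still running. The following is a minimal sketch, not Flink's actual fix, for listing such threads from inside a process. `LingeringThreads` and `nonDaemonThreadNames` are hypothetical names used only for illustration, and whether this is what kept the JobManager container alive in the cases above is not known from this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Diagnostic sketch: list live non-daemon threads that can keep a JVM from exiting. */
final class LingeringThreads {

    /** Returns the names of live non-daemon threads other than the calling thread. */
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a map keyed by every live thread in the JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical worker standing in for a thread that was never shut down.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
                // Interrupted: just let the thread finish.
            }
        }, "example-worker");
        worker.start();

        System.out.println("Non-daemon threads: " + nonDaemonThreadNames());
        worker.join();
    }
}
```

Any thread that shows up here after shutdown was requested is a candidate for what keeps the process alive.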
Hi,
Thanks for your answer! I am happy to hear the problem will be addressed. :)

About question 2: flink-runtime uses Apache Httpclient 4.2.6, while the S3 file system API implemented by Amazon uses 4.3.x. There are some API changes between the two versions, so a NoSuchMethodError occurs.

Regards,
Chiwan Park

> On Jan 5, 2016, at 11:59 PM, Stephan Ewen <[hidden email]> wrote:
> Hi!
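A NoSuchMethodError of this kind usually means that a class was loaded from a different jar version than the one the calling code was compiled against. As a minimal diagnostic sketch (not part of Flink or EMR; the class name `ClassOrigin` and its `locate` method are made up for illustration), one could print where the JVM actually loaded an Httpclient class from. `org.apache.http.client.HttpClient` is assumed here as a representative class of the library.

```java
import java.security.CodeSource;

/** Diagnostic sketch: report which jar or directory a class was loaded from. */
final class ClassOrigin {

    /** Returns "className -> location", or a note if the class or its source is unavailable. */
    static String locate(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Typical for JDK classes loaded by the bootstrap class loader.
                return className + " -> (bootstrap or unknown source)";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumption: this class stands in for the Httpclient library; pass another name to override.
        String name = args.length > 0 ? args[0] : "org.apache.http.client.HttpClient";
        System.out.println(locate(name));
    }
}
```

Running this with the same classpath as the failing job should show which Httpclient jar takes precedence.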
At first glance, I think that "flink-runtime" does not need Apache Httpclient at all. I'll try to simply remove that dependency...

On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park <[hidden email]> wrote:
> Hi,
Great! Thanks for addressing this!

Regards,
Chiwan Park

> On Jan 6, 2016, at 5:51 PM, Stephan Ewen <[hidden email]> wrote:
> At a first look, I think that "flink-runtime" does not need Apache Httpclient at all. I'll try to simply remove that dependency...
@Stephan: It was added to the dependency management section in order to enforce a higher version for the S3 client, because it was causing problems earlier.
> On 06 Jan 2016, at 11:14, Chiwan Park <[hidden email]> wrote:
> Great! Thanks for addressing!
Would it cause problems if I removed it from the "flink-runtime" pom? It seems strange to have a dependency there that we do not even use...

On Wed, Jan 6, 2016 at 12:07 PM, Ufuk Celebi <[hidden email]> wrote:
> @Stephan: It was added to the dependency management section in order to enforce a higher version for S3 client, because it was causing problems earlier.