EMR Logging Woes

Rex Fenley
Hello,

After lots of testing in local environments, we're now trying to get our cluster running on AWS EMR. We followed much of the documentation from both AWS and Flink and have gotten to the point of creating a YARN session and submitting jobs. We successfully get back a Job ID, and the YARN Timeline Server UI says our application is running. However, we are having a hard time with logging.

2 main issues:
1. Logs for the jobmanager and taskmanager take a long time to show up in the YARN / Hadoop UI, and in some cases never show up at all, even though we can see them just fine when SSHing into the cluster's nodes. Is there anything we can do to speed this up?

2. We can't see anything except WARN and ERROR logs for the jobmanager and taskmanager; we need at least INFO right now to confirm things are working as expected. We have been going through a multitude of configuration files, including log4j-session.properties and log4j.properties, and setting the level to DEBUG, but it has not helped. Are these the correct configuration files?

Thanks!

--

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com

Re: EMR Logging Woes

rmetzger0
Hi Rex,

1. You can also use the Flink UI for retrieving logs. That usually works quite fast (unless your logs are huge).

2. These are the correct configuration files for setting the log level. Are you running on a vanilla EMR cluster, or are there modifications? The "problem" is that Flink on YARN adds jar files (and other files) provided by the environment (YARN) to its classpath. The vanilla EMR configuration should not interfere with Flink's logging, but maybe some changes in your environment are causing problems?
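
To make point 2 a bit more concrete, here is a rough sketch of how one might check which root log level a log4j properties file actually sets. It is only an illustration under assumptions: `root_log_level` is a hypothetical helper, and it assumes the file uses either the log4j 1.x form (`log4j.rootLogger=INFO, file`) or the log4j 2.x properties form (`rootLogger.level = INFO`). Which one applies depends on the Flink version in use.

```python
import re

# log4j 1.x properties style, e.g. "log4j.rootLogger=INFO, file"
_LOG4J1_ROOT = re.compile(r"^\s*log4j\.rootLogger\s*=\s*([A-Za-z]+)")
# log4j 2.x properties style, e.g. "rootLogger.level = INFO"
_LOG4J2_ROOT = re.compile(r"^\s*rootLogger\.level\s*=\s*([A-Za-z]+)")


def root_log_level(properties_text):
    """Return the root logger level found in log4j properties text, or None.

    Hypothetical helper: it only looks at the two root-logger forms above
    and ignores per-logger overrides.
    """
    level = None
    for line in properties_text.splitlines():
        # Skip comment lines ('#' and '!' both start comments in .properties files).
        if line.lstrip().startswith(("#", "!")):
            continue
        for pattern in (_LOG4J1_ROOT, _LOG4J2_ROOT):
            match = pattern.match(line)
            if match:
                # A later definition of the same key overrides an earlier one.
                level = match.group(1).upper()
    return level
```

If this returns WARN or ERROR for the file the process really loads, that would explain the missing INFO lines. If it returns INFO or DEBUG, something else on the classpath is probably overriding the configuration.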

Since you are SSHing into the machines already: Flink logs the location of the log4j configuration file at the top of each log file (search for "-Dlog4j.configuration="). Open that file to verify what's in there.
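
As a small sketch of that search, the snippet below pulls the configuration path out of the text of a log file. `log4j_config_paths` is a hypothetical helper name, and stripping a leading `file:` prefix is an assumption about how the value is written. The path used in the usage note is made up for illustration.

```python
import re

# The JVM option mentioned above; the value runs until the next whitespace.
_CONFIG_FLAG = re.compile(r"-Dlog4j\.configuration=(\S+)")


def log4j_config_paths(log_text):
    """Return the distinct log4j configuration locations named in log_text."""
    paths = []
    for match in _CONFIG_FLAG.finditer(log_text):
        value = match.group(1)
        # Assumption: the value may be written as a "file:" URL; drop the prefix.
        if value.startswith("file:"):
            value = value[len("file:"):]
        if value not in paths:
            paths.append(value)
    return paths
```

For example, on a log line containing `-Dlog4j.configuration=file:/tmp/example/log4j.properties` (a hypothetical path), the function returns `["/tmp/example/log4j.properties"]`. You can then open that file on the node and check what it sets.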

Hope this helps!

Robert


On Tue, Oct 27, 2020 at 12:03 AM Rex Fenley <[hidden email]> wrote:


Re: EMR Logging Woes

Rex Fenley
Thanks! I'll check these out.

On Tue, Oct 27, 2020 at 12:58 AM Robert Metzger <[hidden email]> wrote: