Logback doesn't receive logs from job

Guilherme Nobre
Hi all,

I have a Flink cluster (1.4.0) built from Flink's Docker image, with 1 job manager and 2 task managers. I'm trying to use logback instead of log4j and, as far as the cluster configuration goes, everything seems alright.

Following https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/best_practices.html I have removed log4j-1.2.xx.jar and slf4j-log4j12-xxx.jar from lib/ and added logback-classic.jar, logback-core.jar and log4j-over-slf4j.jar. This is a snippet from the logs:

2018-06-26 11:02:47.760 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - hadoop login
2018-06-26 11:02:47.761 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - hadoop login commit
2018-06-26 11:02:47.764 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - using local user:UnixPrincipal: flink
2018-06-26 11:02:47.764 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - Using user: "UnixPrincipal: flink" with name flink
2018-06-26 11:02:47.765 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - User entry: "flink"
2018-06-26 11:02:47.765 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - Assuming keytab is managed externally since logged in from subject.
2018-06-26 11:02:47.766 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - UGI loginUser:flink (auth:SIMPLE)
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Current Hadoop/Kerberos user: flink
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.151-b12
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Maximum heap size: 981 MiBytes
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  JAVA_HOME: /docker-java-home/jre
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Hadoop version: 2.8.0
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  JVM Options:
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Xms1024m
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Xmx1024m
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Program Arguments:
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     --configDir
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     /opt/flink/conf
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     --executionMode
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     cluster
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Classpath: /opt/flink/lib/flink-python_2.11-1.4.0.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.4.0.jar:/opt/flink/lib/log4j-over-slf4j.jar:/opt/flink/lib/logback-classic.jar:/opt/flink/lib/logback-core.jar:/opt/flink/lib/flink-dist_2.11-1.4.0.jar:::

Above you can see that the logback jars are properly on the classpath and that logback-console.xml is being used as the configuration file. This is the content of logback-console.xml:

<configuration>
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n</pattern>
        </encoder>
    </appender>

    <root level="DEBUG">
        <appender-ref ref="console"/>
    </root>
</configuration>

Which explains the DEBUG logs seen above as well; setting the level to DEBUG is working fine. Now comes the weirdest part. This is part of the pipeline:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Pipeline {

    private static final Logger LOGGER = LoggerFactory.getLogger(Pipeline.class);

    public static void main(String[] args) throws Exception {
        LOGGER.info("**************************************************** INFO");
        LOGGER.warn("**************************************************** WARN");
        LOGGER.error("**************************************************** ERROR");
        LOGGER.debug("**************************************************** DEBUG");
        LOGGER.trace("**************************************************** TRACE");
        // ... rest of the pipeline
    }
}

As you can see, I'm importing the slf4j Logger and LoggerFactory and instantiating the logger. Then I just log a message at all 5 levels.

I can't see ANY of these logs in the job manager / task manager logs. They work fine locally though:

13:16:11.030 [main] INFO Pipeline - **************************************************** INFO
13:16:11.035 [main] WARN Pipeline - **************************************************** WARN
13:16:16.239 [main] ERROR Pipeline - **************************************************** ERROR
13:16:16.239 [main] DEBUG Pipeline - **************************************************** DEBUG

It's not the log level, as I can clearly see DEBUG and INFO logs from Flink itself, and the logback-console.xml is alright. The logs from my job are just not showing up on the cluster for some reason.
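One sanity check that can run inside the job itself is probing whether the cluster's classloader can even see the logging classes. This is just a sketch using only the JDK (the class names below are my assumption of what the jars listed earlier ship; `ClasspathCheck` and `probe` are hypothetical names):

```java
public class ClasspathCheck {

    // Returns "present" or "missing" depending on whether the current
    // classloader can resolve the given class name.
    static String probe(String className) {
        try {
            Class.forName(className);
            return "present";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Classes that should be visible if the logback setup from lib/ is in effect.
        String[] loggingClasses = {
            "ch.qos.logback.classic.LoggerContext",  // from logback-classic.jar
            "ch.qos.logback.core.ConsoleAppender",   // from logback-core.jar
            "org.apache.log4j.Logger"                // from log4j-over-slf4j.jar (bridge)
        };
        for (String cls : loggingClasses) {
            System.out.println(cls + " -> " + probe(cls));
        }
    }
}
```

If any of these come back "missing" from inside the job, the user-code classloader is not seeing the same jars the job manager's startup log shows.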
Then I tried this:
ch.qos.logback.classic.Logger testLog = (ch.qos.logback.classic.Logger) LOGGER;
testLog.info("**************************************************** INFO");
testLog.warn("**************************************************** WARN");
testLog.error("**************************************************** ERROR");
testLog.debug("**************************************************** DEBUG");
testLog.trace("**************************************************** TRACE");

if (LOGGER == null) {
    throw new Exception("************* logger is NULL");
} else {
    throw new Exception("************* logger is not NULL " + LOGGER.getName() + " " + LOGGER.getClass());
}
and got this:

2018-06-26 11:24:27.268 [jobmanager-future-thread-5] DEBUG org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler  - Error while handling request.
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not run the jar.
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:90)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not run the jar.
... 9 common frames omitted
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:542)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:334)
at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:76)
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:69)
... 8 common frames omitted
Caused by: java.lang.Exception: *************  logger is not NULL DEBUG 1656864061
at io.intellisense.MongoPipeline.main(MongoPipeline.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
... 13 common frames omitted

This means the logger was successfully set up and picked up the DEBUG level from conf/logback-console.xml. And again, I was only able to see this error because it came through org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler, not from my pipeline's own logging.
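One more probe along the same lines (again just a sketch on the bare JDK; `CodeSourceCheck` and `locate` are hypothetical names, and in the real job one would pass the slf4j or logback classes to it) is printing where each logging class was actually loaded from, to rule out a second copy of logback hiding inside the job jar:

```java
public class CodeSourceCheck {

    // Returns the jar or directory a class was loaded from,
    // or "(bootstrap)" for core JDK classes.
    static String locate(Class<?> cls) {
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // In the real job this would be e.g. org.slf4j.LoggerFactory.class;
        // String.class is used here only so the sketch runs without any logging jars.
        System.out.println("java.lang.String loaded from: " + locate(String.class));
        System.out.println("CodeSourceCheck loaded from:  " + locate(CodeSourceCheck.class));
    }
}
```

If the logging classes resolve to the submitted job jar rather than /opt/flink/lib/, the job would be talking to a different LoggerContext than the one configured by logback-console.xml.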
I'm running out of options; does anyone have any idea?