Logback doesn't receive logs from job

Guilherme Nobre
Hi all,

I have a Flink cluster (1.4.0) built from Flink's Docker image, with 1 job manager and 2 task managers. I'm trying to use logback instead of log4j and, as far as the cluster configuration goes, everything seems alright.

Following https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/best_practices.html I have removed log4j-1.2.xx.jar and slf4j-log4j12-xxx.jar from lib/ and added logback-classic.jar, logback-core.jar and log4j-over-slf4j.jar. This is a snippet from the logs:

2018-06-26 11:02:47.760 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - hadoop login
2018-06-26 11:02:47.761 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - hadoop login commit
2018-06-26 11:02:47.764 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - using local user:UnixPrincipal: flink
2018-06-26 11:02:47.764 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - Using user: "UnixPrincipal: flink" with name flink
2018-06-26 11:02:47.765 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - User entry: "flink"
2018-06-26 11:02:47.765 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - Assuming keytab is managed externally since logged in from subject.
2018-06-26 11:02:47.766 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - UGI loginUser:flink (auth:SIMPLE)
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Current Hadoop/Kerberos user: flink
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.151-b12
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Maximum heap size: 981 MiBytes
2018-06-26 11:02:47.766 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  JAVA_HOME: /docker-java-home/jre
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Hadoop version: 2.8.0
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  JVM Options:
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Xms1024m
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Xmx1024m
2018-06-26 11:02:47.771 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Program Arguments:
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     --configDir
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     /opt/flink/conf
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     --executionMode
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -     cluster
2018-06-26 11:02:47.772 [main] INFO  org.apache.flink.runtime.jobmanager.JobManager  -  Classpath: /opt/flink/lib/flink-python_2.11-1.4.0.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.4.0.jar:/opt/flink/lib/log4j-over-slf4j.jar:/opt/flink/lib/logback-classic.jar:/opt/flink/lib/logback-core.jar:/opt/flink/lib/flink-dist_2.11-1.4.0.jar:::

Above you can see that the logback jars are properly on the classpath and that logback-console.xml is being used as the configuration file. This is the content of logback-console.xml:

<configuration>
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n</pattern>
        </encoder>
    </appender>

    <root level="DEBUG">
        <appender-ref ref="console"/>
    </root>
</configuration>

Which explains the DEBUG logs seen above as well; setting the level to DEBUG is working fine. Now comes the weirdest part. This is part of the pipeline:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Pipeline {

    private static final Logger LOGGER = LoggerFactory.getLogger(Pipeline.class);

    public static void main(String[] args) throws Exception {
        LOGGER.info("**************************************************** INFO");
        LOGGER.warn("**************************************************** WARN");
        LOGGER.error("**************************************************** ERROR");
        LOGGER.debug("**************************************************** DEBUG");
        LOGGER.trace("**************************************************** TRACE");
        // ... rest of the pipeline
    }
}

As you can see, I'm importing the slf4j Logger and LoggerFactory and instantiating the logger. Then I just log a message at all 5 levels.

I can't see ANY of these logs in the job manager / task manager logs. They work fine locally though:

13:16:11.030 [main] INFO Pipeline - **************************************************** INFO
13:16:11.035 [main] WARN Pipeline - **************************************************** WARN
13:16:16.239 [main] ERROR Pipeline - **************************************************** ERROR
13:16:16.239 [main] DEBUG Pipeline - **************************************************** DEBUG

It's not the log level, as I can clearly see DEBUG and INFO logs from Flink itself, and the logback-console.xml is alright. The logs from my job are just not showing up on the cluster for some reason.
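One sanity check that can run inside the job itself is probing whether the cluster's classloader can even see the logging classes. This is just a sketch using only the JDK (the class names below are my assumption of what the jars listed earlier ship; `ClasspathCheck` and `probe` are hypothetical names):

```java
public class ClasspathCheck {

    // Returns "present" or "missing" depending on whether the current
    // classloader can resolve the given class name.
    static String probe(String className) {
        try {
            Class.forName(className);
            return "present";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Classes that should be visible if the logback setup from lib/ is in effect.
        String[] loggingClasses = {
            "ch.qos.logback.classic.LoggerContext",  // from logback-classic.jar
            "ch.qos.logback.core.ConsoleAppender",   // from logback-core.jar
            "org.apache.log4j.Logger"                // from log4j-over-slf4j.jar (bridge)
        };
        for (String cls : loggingClasses) {
            System.out.println(cls + " -> " + probe(cls));
        }
    }
}
```

If any of these come back "missing" from inside the job, the user-code classloader is not seeing the same jars the job manager's startup log shows.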
Then I tried this:
ch.qos.logback.classic.Logger testLog = (ch.qos.logback.classic.Logger) LOGGER;
testLog.info("**************************************************** INFO");
testLog.warn("**************************************************** WARN");
testLog.error("**************************************************** ERROR");
testLog.debug("**************************************************** DEBUG");
testLog.trace("**************************************************** TRACE");

if (LOGGER == null) {
    throw new Exception("************* logger is NULL");
} else {
    throw new Exception("************* logger is not NULL " + LOGGER.getName() + " " + LOGGER.getClass());
}
and got this:

2018-06-26 11:24:27.268 [jobmanager-future-thread-5] DEBUG org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler  - Error while handling request.
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not run the jar.
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:90)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not run the jar.
... 9 common frames omitted
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:542)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:334)
at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:76)
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:69)
... 8 common frames omitted
Caused by: java.lang.Exception: *************  logger is not NULL DEBUG 1656864061
at io.intellisense.MongoPipeline.main(MongoPipeline.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
... 13 common frames omitted

This means the logger was successfully set up and picked up the DEBUG level from conf/logback-console.xml. And again, I was only able to see this error because it came through org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler, not from my pipeline's own logging.
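One more probe along the same lines (again just a sketch on the bare JDK; `CodeSourceCheck` and `locate` are hypothetical names, and in the real job one would pass the slf4j or logback classes to it) is printing where each logging class was actually loaded from, to rule out a second copy of logback hiding inside the job jar:

```java
public class CodeSourceCheck {

    // Returns the jar or directory a class was loaded from,
    // or "(bootstrap)" for core JDK classes.
    static String locate(Class<?> cls) {
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // In the real job this would be e.g. org.slf4j.LoggerFactory.class;
        // String.class is used here only so the sketch runs without any logging jars.
        System.out.println("java.lang.String loaded from: " + locate(String.class));
        System.out.println("CodeSourceCheck loaded from:  " + locate(CodeSourceCheck.class));
    }
}
```

If the logging classes resolve to the submitted job jar rather than /opt/flink/lib/, the job would be talking to a different LoggerContext than the one configured by logback-console.xml.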
I'm running out of options; does anyone have any idea?