Hi,
Does anyone have any idea about the following error message? It flooded my TaskManager log. I do have Datadog metrics showing up, so this probably only happens for some metrics.

2020-06-24 03:27:15,362 WARN org.apache.flink.metrics.datadog.DatadogHttpClient - Failed sending request to Datadog
java.net.SocketTimeoutException: timeout
    at org.apache.flink.shaded.okio.Okio$4.newTimeoutException(Okio.java:227)
    at org.apache.flink.shaded.okio.AsyncTimeout.exit(AsyncTimeout.java:284)
    at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:240)
    at org.apache.flink.shaded.okio.RealBufferedSource.indexOf(RealBufferedSource.java:344)
    at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:216)
    at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210)
    at org.apache.flink.shaded.okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
    at org.apache.flink.shaded.okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at org.apache.flink.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at org.apache.flink.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at org.apache.flink.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at org.apache.flink.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
    at org.apache.flink.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Socket closed
    at java.net.SocketInputStream.read(SocketInputStream.java:204)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at org.apache.flink.shaded.okio.Okio$2.read(Okio.java:138)
    at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:236)
    ... 23 more
Original message
From: seeksst <[hidden email]>
To: Fanbin Bu <[hidden email]>
Sent: Friday, June 26, 2020 23:36
Subject: Re: datadog failed to send report

Hi,

I'm sorry for not explaining it clearly, and for misreading the exception. Use:

log4j.logger.org.apache.flink.metrics.datadog.DatadogHttpClient=ERROR

log4j.logger.org.apache.flink.runtime.metrics will not work here. It only affects loggers under org.apache.flink.runtime.metrics, and DatadogHttpClient is under org.apache.flink.metrics instead.

If it still doesn't work, note that there are several logging configuration files in the /conf folder. You control the log output by modifying the right one. If the change has no effect, log4j.properties may not be the file in use. This article may answer that [1]. If you're still not sure, you can change all of them. A more granular configuration is still recommended.

I'm not familiar with Datadog (I use InfluxDB to collect metrics). If it can collect metrics and the network is not a problem, I think the bottleneck may be request processing, though I'm not sure. A SocketTimeoutException can occur in several situations:

1. The network is down. You think the network is OK.
2. Server processing is slow. Datadog may be handling many requests and can't answer fast enough, so check the CPU usage of the Datadog machine. Sometimes it depends on the program, for example whether it uses a single thread to handle all requests (I don't know how Datadog handles this). If CPU usage is high, that may be the reason. If not, you need to find out more about Datadog.
3. Network transmission is slow. Check whether the network traffic is saturated or the machines are physically far apart. You can also look for ways to adjust the timeout.
4. Your job frequently triggers full GC. Check the GC log. This requires editing flink-conf.yaml with something like:

env.java.opts.taskmanager: -Xloggc:<LOG_DIR>/taskmanager-gc.log
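The two log4j.properties keys above behave differently because logger levels are inherited by dotted-name prefix. A level set on a name applies only to loggers whose names extend that name. The sketch below shows this using java.util.logging from the JDK, purely as an analogy. It assumes only that the name-based inheritance mirrors log4j's; it does not use log4j's API, and it runs with no dependencies. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggerHierarchyDemo {
    // Keep strong references: java.util.logging may garbage-collect loggers
    // that nothing refers to, and their configured levels would be lost.
    static final Logger ROOT = Logger.getLogger("");
    static final Logger RUNTIME_METRICS = Logger.getLogger("org.apache.flink.runtime.metrics");
    static final Logger DATADOG_PKG = Logger.getLogger("org.apache.flink.metrics.datadog");
    static final Logger DATADOG_CLIENT =
            Logger.getLogger("org.apache.flink.metrics.datadog.DatadogHttpClient");

    /** Returns {WARNING loggable after silencing runtime.metrics, after silencing metrics.datadog}. */
    static boolean[] demo() {
        ROOT.setLevel(Level.INFO);
        RUNTIME_METRICS.setLevel(null); // null means "inherit from the parent"
        DATADOG_PKG.setLevel(null);

        // Analogous to log4j.logger.org.apache.flink.runtime.metrics=ERROR:
        // DatadogHttpClient's name does not start with this prefix, so it is unaffected.
        RUNTIME_METRICS.setLevel(Level.SEVERE);
        boolean afterRuntime = DATADOG_CLIENT.isLoggable(Level.WARNING);

        // Raising the level on an actual ancestor of the client's logger name takes effect.
        DATADOG_PKG.setLevel(Level.SEVERE);
        boolean afterDatadogPkg = DATADOG_CLIENT.isLoggable(Level.WARNING);

        return new boolean[] {afterRuntime, afterDatadogPkg};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("WARNING still logged after silencing runtime.metrics: " + r[0]);
        System.out.println("WARNING still logged after silencing metrics.datadog: " + r[1]);
    }
}
```

Running it prints `true` for the first line and `false` for the second. This matches the advice to target org.apache.flink.metrics.datadog.DatadogHttpClient (or an ancestor of it) rather than org.apache.flink.runtime.metrics.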
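In the trace above, the exception was thrown while waiting for the response headers (Http1Codec.readResponseHeaders), so the connection itself had been established. The self-contained sketch below uses only java.net to show why a successful connection does not rule out this error. A local ServerSocket stands in for an endpoint that accepts connections but answers too slowly. That stand-in is an illustrative assumption; it is neither Datadog nor Flink's reporter code. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SlowServerTimeoutDemo {

    /** Connects to a local server that never replies, then waits for a reply. */
    static String probe(int connectTimeoutMillis, int readTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Stand-in for an overloaded endpoint: the OS completes the TCP handshake
        // from the listen backlog, but no application code ever writes a response.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Step 1: the equivalent of "telnet host port" succeeding.
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMillis);
            // Step 2: wait for the response, bounded by a read timeout.
            client.setSoTimeout(readTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                int b = in.read();
                return b == -1 ? "connected, then closed" : "connected, got data";
            } catch (SocketTimeoutException e) {
                return "connected, then read timed out";
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(probe(1_000, 500));
    }
}
```

The connect step succeeds and the read then fails with SocketTimeoutException. A telnet check from every machine therefore rules out cause 1, but not slow processing or slow transmission on the way back.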
Best wishes,

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/logging.html

Original message
From: Fanbin Bu <[hidden email]>
To: seeksst <[hidden email]>
Sent: Friday, June 26, 2020 05:38
Subject: Re: datadog failed to send report

This does not help:
log4j.logger.org.apache.flink.runtime.metrics=ERROR
I believe all machines can telnet to the Datadog port, since other metrics are reported correctly. How do I check the request capacity?

On Tue, Jun 23, 2020 at 11:32 PM seeksst <[hidden email]> wrote:
Hi, could this be another symptom of this issue: https://issues.apache.org/jira/browse/FLINK-16611 ? I guess you'll have to ask Datadog to check on their end. Maybe you are running into a rate limit there?

On Fri, Jun 26, 2020 at 5:42 PM seeksst <[hidden email]> wrote: