Re: Negative values using latency marker

Posted by Nico Kruber on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Negative-values-using-latency-marker-tp16522p16560.html

Ok, digging into it a bit further:

The LatencyMarker is scheduled at a certain period with some initialDelay. Its
initial time is `System.currentTimeMillis() + initialDelay` (when it should
first be run). Depending on your system's load, this run may actually be
delayed (but then the marker's time will fall behind, not explaining a
negative value) but from the Executor's documentation, I don't think, it
should execute it too early. For future markers, their time will simply be
increased by the period (which may fall behind for the same reason).

Before emitting the metric, the difference to `System.currentTimeMillis()`
will be used which is based on system time and may decrease if the clock is
adjusted, e.g. via NTP. Also, this is probably called from a different thread
and `System.currentTimeMillis()` apparently may jump backwards there as well
[1].


Nico

[1] https://stackoverflow.com/questions/2978598/will-system-currenttimemillis-always-return-a-value-previous-calls


On Sunday, 5 November 2017 09:22:05 CET Sofer, Tovi  wrote:

> Hi Nico,
>
> Actually the run below is on my local machine, and both Kafka and flink run
> on it.
>
> Thanks and regards,
> Tovi
> -----Original Message-----
> From: Nico Kruber [mailto:[hidden email]]
> Sent: יום ו 03 נובמבר 2017 15:22
> To: [hidden email]
> Cc: Sofer, Tovi [ICG-IT] <[hidden email]>
> Subject: Re: Negative values using latency marker
>
> Hi Tovi,
> if I see this correctly, the LatencyMarker gets its initial timstamp during
> creation at the source and the latency is reported as a metric at a sink by
> comparing the initial timestamp with the current time. If the clocks
> between the two machines involved diverge, e.g. the sinks clock falling
> behind, the difference may be negative.
>
>
> Nico
>
> On Thursday, 2 November 2017 17:58:51 CET Sofer, Tovi  wrote:
> > Hi group,
> >
> > Can someone maybe elaborate how can latency gauge shown by latency
> > marker be negative?
> >
> > 2017-11-02 18:54:56,842 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.Sink: FinalSink.0.latency: {LatencySourceDescriptor{vertexID=1,
> > subtaskIndex=0}={p99=-5.0, p50=-5.0, min=-5.0, max=-5.0, p95=-5.0,
> > mean=-5.0}, LatencySourceDescriptor{vertexID=1,
> > subtaskIndex=1}={p99=-5.0, p50=-5.0, min=-5.0, max=-5.0, p95=-5.0,
> > mean=-5.0}, LatencySourceDescriptor{vertexID=1,
> > subtaskIndex=2}={p99=-6.0, p50=-6.0, min=-6.0, max=-6.0, p95=-6.0,
> > mean=-6.0}, LatencySourceDescriptor{vertexID=1,
> > subtaskIndex=3}={p99=-6.0, p50=-6.0, min=-6.0, max=-6.0, p95=-6.0,
> > mean=-6.0}} 2017-11-02 18:54:56,843 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.AverageE2ELatencyChecker.0.60SecWarmUpRecordsCounter: 2858446
> > 2017-11-02 18:54:56,843 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.Source: fixTopicConsumerSource.3.numRecordsOut: 1954784 2017-11-02
> > 18:54:56,843 INFO  com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.AverageE2ELatencyChecker.0.ActualRecordsCounter: 4962675
> > 2017-11-02
> > 18:54:56,843 INFO  com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.AverageE2ELatencyChecker.0.AverageLatencyMs: 0.0753785 2017-11-02
> > 18:54:56,843 INFO  com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.AverageE2ELatencyChecker.0.HighLatencyMsgPercentage: 0.5918576
> > 2017-11-02 18:54:56,843 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.Source: fixTopicConsumerSource.0.numRecordsOutPerSecond:
> > 12943.116666666667 2017-11-02 18:54:56,843 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.AverageE2ELatencyChecker.0.numRecordsInPerSecond: 51751.4
> > 2017-11-02
> > 18:54:56,843 INFO  com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.Source: fixTopicConsumerSource.3.numRecordsOutPerSecond: 12935.05
> > 2017-11-02 18:54:56,843 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.Source: fixTopicConsumerSource.2.numRecordsOutPerSecond:
> > 12946.916666666666 2017-11-02 18:54:56,843 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.Source: fixTopicConsumerSource.1.numRecordsOutPerSecond:
> > 12926.316666666668 2017-11-02 18:54:56,844 INFO
> > com.citi.artemis.flink.reporters.ArtemisReporter -
> > [Flink-MetricRegistry-1]
> > 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming
> > Job.AverageE2ELatencyChecker.0.LatencyHistogram: count:10000 min:24753
> > max:19199891 mean:77637.6484 stddev:341333.9414842662 p50:40752.0
> > p75:49809.0 p95:190480.95 p98:539110.8199999994 p99:749224.8899999995
> > p999:3817927.9259998496
> >
> > Regards,
> > Tovi


signature.asc (201 bytes) Download Attachment