End to End Latency Tracking in flink

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

End to End Latency Tracking in flink

Lu Niu
Hi,

I am looking for end to end latency monitoring of link job. Based on my study, I have two options:

1. flink provide a latency tracking feature. However, the documentation says it cannot show actual latency of business logic as it will bypass all operators. https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#latency-tracking Also, the feature can significantly impact the performance so I assume it's not for usage in production. What are users use the latency tracking for? Sounds like only back pressure could affect the latency. 

2. I found another stackoverflow question on this. https://stackoverflow.com/questions/56578919/latency-monitoring-in-flink-application . The answer suggestion to expose (current processing - the event time) after source and before sink for end to end latency monitoring. Is this a good solution? If not, What’s the official solution for end to end latency tracking? 

Thank you! 

Best
Lu

Reply | Threaded
Open this post in threaded view
|

Re: End to End Latency Tracking in flink

Congxian Qiu
Hi
As far as I know, the latency-tracking feature is for debugging usages, you can use it to debug, and disable it when running the job on production.
From my side, use $current_processing - $event_time is something ok, but keep the things in mind: the event time may not be the time ingested in Flink.

Best,
Congxian


Lu Niu <[hidden email]> 于2020年3月28日周六 上午6:25写道:
Hi,

I am looking for end to end latency monitoring of link job. Based on my study, I have two options:

1. flink provide a latency tracking feature. However, the documentation says it cannot show actual latency of business logic as it will bypass all operators. https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#latency-tracking Also, the feature can significantly impact the performance so I assume it's not for usage in production. What are users use the latency tracking for? Sounds like only back pressure could affect the latency. 

2. I found another stackoverflow question on this. https://stackoverflow.com/questions/56578919/latency-monitoring-in-flink-application . The answer suggestion to expose (current processing - the event time) after source and before sink for end to end latency monitoring. Is this a good solution? If not, What’s the official solution for end to end latency tracking? 

Thank you! 

Best
Lu

Reply | Threaded
Open this post in threaded view
|

Re: End to End Latency Tracking in flink

Zhijiang(wangzhijiang999)
Hi Lu,

Besides Congxian's replies, you can also get some further explanations from "https://flink.apache.org/2019/07/23/flink-network-stack-2.html#latency-tracking".

Best,
Zhijiang

------------------------------------------------------------------
From:Congxian Qiu <[hidden email]>
Send Time:2020 Mar. 28 (Sat.) 11:49
To:Lu Niu <[hidden email]>
Cc:user <[hidden email]>
Subject:Re: End to End Latency Tracking in flink

Hi
As far as I know, the latency-tracking feature is for debugging usages, you can use it to debug, and disable it when running the job on production.
From my side, use $current_processing - $event_time is something ok, but keep the things in mind: the event time may not be the time ingested in Flink.

Best,
Congxian


Lu Niu <[hidden email]> 于2020年3月28日周六 上午6:25写道:
Hi,

I am looking for end to end latency monitoring of link job. Based on my study, I have two options:

1. flink provide a latency tracking feature. However, the documentation says it cannot show actual latency of business logic as it will bypass all operators. https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#latency-tracking Also, the feature can significantly impact the performance so I assume it's not for usage in production. What are users use the latency tracking for? Sounds like only back pressure could affect the latency. 

2. I found another stackoverflow question on this. https://stackoverflow.com/questions/56578919/latency-monitoring-in-flink-application . The answer suggestion to expose (current processing - the event time) after source and before sink for end to end latency monitoring. Is this a good solution? If not, What’s the official solution for end to end latency tracking? 

Thank you! 

Best
Lu


Reply | Threaded
Open this post in threaded view
|

Re: End to End Latency Tracking in flink

Lu Niu
Thanks for reply, @Zhijiang, @Congxian! 

@Congxian
$current_processing - $event_time works for event time. How about processing time? Is there a good way to measure the latency? 

Best
Lu

On Sun, Mar 29, 2020 at 6:21 AM Zhijiang <[hidden email]> wrote:
Hi Lu,

Besides Congxian's replies, you can also get some further explanations from "https://flink.apache.org/2019/07/23/flink-network-stack-2.html#latency-tracking".

Best,
Zhijiang

------------------------------------------------------------------
From:Congxian Qiu <[hidden email]>
Send Time:2020 Mar. 28 (Sat.) 11:49
To:Lu Niu <[hidden email]>
Cc:user <[hidden email]>
Subject:Re: End to End Latency Tracking in flink

Hi
As far as I know, the latency-tracking feature is for debugging usages, you can use it to debug, and disable it when running the job on production.
From my side, use $current_processing - $event_time is something ok, but keep the things in mind: the event time may not be the time ingested in Flink.

Best,
Congxian


Lu Niu <[hidden email]> 于2020年3月28日周六 上午6:25写道:
Hi,

I am looking for end to end latency monitoring of link job. Based on my study, I have two options:

1. flink provide a latency tracking feature. However, the documentation says it cannot show actual latency of business logic as it will bypass all operators. https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#latency-tracking Also, the feature can significantly impact the performance so I assume it's not for usage in production. What are users use the latency tracking for? Sounds like only back pressure could affect the latency. 

2. I found another stackoverflow question on this. https://stackoverflow.com/questions/56578919/latency-monitoring-in-flink-application . The answer suggestion to expose (current processing - the event time) after source and before sink for end to end latency monitoring. Is this a good solution? If not, What’s the official solution for end to end latency tracking? 

Thank you! 

Best
Lu


Reply | Threaded
Open this post in threaded view
|

Re: End to End Latency Tracking in flink

Oscar Westra van Holthe - Kind
On Mon, 30 Mar 2020 at 05:08, Lu Niu <[hidden email]> wrote:
$current_processing - $event_time works for event time. How about processing time? Is there a good way to measure the latency?

To measure latency you'll need some way to determine the time spent between the start and end of your pipeline.

To measure latency when using processing time, you'll need to partially use ingestion time. That is, you'll need to add the 'current' processing time as soon as messages are ingested.

With it, you can then use the $current_processing - $ingest_time solution that was already mentioned.

Kind regards,
Oscar

--
Oscar Westra van Holthe - Kind
Reply | Threaded
Open this post in threaded view
|

Re: End to End Latency Tracking in flink

张光辉
Hi.
At flink source connector, you can send $source_current_time - $event_time metric.
In the meantime, at flink sink connector, you can send $sink_current_time - $event_time metric.
Then you use  $sink_current_time - $event_time - ($source_current_time - $event_time) = $sink_current_time - $source_current_time as the latency of end to end。

Oscar Westra van Holthe - Kind <[hidden email]> 于2020年3月30日周一 下午5:15写道:
On Mon, 30 Mar 2020 at 05:08, Lu Niu <[hidden email]> wrote:
$current_processing - $event_time works for event time. How about processing time? Is there a good way to measure the latency?

To measure latency you'll need some way to determine the time spent between the start and end of your pipeline.

To measure latency when using processing time, you'll need to partially use ingestion time. That is, you'll need to add the 'current' processing time as soon as messages are ingested.

With it, you can then use the $current_processing - $ingest_time solution that was already mentioned.

Kind regards,
Oscar

--
Oscar Westra van Holthe - Kind
Reply | Threaded
Open this post in threaded view
|

Re: End to End Latency Tracking in flink

zoudan
Hi,
I think we may add latency metric for each operator, which can reflect consumption ability of each operator.

Best,
Dan Zou


在 2020年3月30日,18:19,Guanghui Zhang <[hidden email]> 写道:

Hi.
At flink source connector, you can send $source_current_time - $event_time metric.
In the meantime, at flink sink connector, you can send $sink_current_time - $event_time metric.
Then you use  $sink_current_time - $event_time - ($source_current_time - $event_time) = $sink_current_time - $source_current_time as the latency of end to end。

Oscar Westra van Holthe - Kind <[hidden email]> 于2020年3月30日周一 下午5:15写道:
On Mon, 30 Mar 2020 at 05:08, Lu Niu <[hidden email]> wrote:
$current_processing - $event_time works for event time. How about processing time? Is there a good way to measure the latency?

To measure latency you'll need some way to determine the time spent between the start and end of your pipeline.

To measure latency when using processing time, you'll need to partially use ingestion time. That is, you'll need to add the 'current' processing time as soon as messages are ingested.

With it, you can then use the $current_processing - $ingest_time solution that was already mentioned.

Kind regards,
Oscar

--
Oscar Westra van Holthe - Kind

Reply | Threaded
Open this post in threaded view
|

Re: End to End Latency Tracking in flink

Lu Niu
An Operator like below will expose lag between current time and event time passing the operator. I add that after the source and before the sink, and calculate sink_delay - source_delay in grafana. would that be a generic solution to solve the problem?
```
public class EmitLagOperator<T> extends AbstractStreamOperator<T>
    implements OneInputStreamOperator<T, T> {

  private transient long delay;

  public EmitLagOperator() {
    chainingStrategy = ChainingStrategy.ALWAYS;
  }

  @Override
  public void processElement(StreamRecord<T> element) throws Exception {
    long now = getProcessingTimeService().getCurrentProcessingTime();
    delay = now - element.getTimestamp();
    output.collect(element);
  }

  @Override
  public void open() throws Exception {
    super.open();
    getRuntimeContext()
        .getMetricGroup()
        .gauge("delay", new Gauge<Long>() {
          @Override
          public Long getValue() {
            return delay;
          }
        });
  }
}
```

On Wed, Apr 1, 2020 at 7:59 PM zoudan <[hidden email]> wrote:
Hi,
I think we may add latency metric for each operator, which can reflect consumption ability of each operator.

Best,
Dan Zou


在 2020年3月30日,18:19,Guanghui Zhang <[hidden email]> 写道:

Hi.
At flink source connector, you can send $source_current_time - $event_time metric.
In the meantime, at flink sink connector, you can send $sink_current_time - $event_time metric.
Then you use  $sink_current_time - $event_time - ($source_current_time - $event_time) = $sink_current_time - $source_current_time as the latency of end to end。

Oscar Westra van Holthe - Kind <[hidden email]> 于2020年3月30日周一 下午5:15写道:
On Mon, 30 Mar 2020 at 05:08, Lu Niu <[hidden email]> wrote:
$current_processing - $event_time works for event time. How about processing time? Is there a good way to measure the latency?

To measure latency you'll need some way to determine the time spent between the start and end of your pipeline.

To measure latency when using processing time, you'll need to partially use ingestion time. That is, you'll need to add the 'current' processing time as soon as messages are ingested.

With it, you can then use the $current_processing - $ingest_time solution that was already mentioned.

Kind regards,
Oscar

--
Oscar Westra van Holthe - Kind