(DEPRECATED) Apache Flink User Mailing List archive.

Measure End-to-End latency/delay for each record

Dhruv Kumar

Measure End-to-End latency/delay for each record

I am trying to compute the end-to-end latency for each record processed by Flink. By end-to-end latency, I mean the difference between the time at which the record entered the Flink system (arrived at the source) and the time at which the record is finally emitted to the sink. What is the best way to measure this? I was thinking of doing the following:

1. Add the current system timestamp to the record when the record arrives at Flink.

2. Add the current system timestamp to the record when the record is finally being emitted into the sink.

3. Compute the difference between the timestamps from steps 2 and 1 offline, once all the records have been written to the sink.
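
The three steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not Flink code: the `Record` class, the field names `ingest_ms` and `emit_ms`, and the helper functions are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
import time


@dataclass
class Record:
    payload: str
    ingest_ms: int = 0  # step 1: set when the record arrives at the source
    emit_ms: int = 0    # step 2: set just before the record goes to the sink


def now_ms() -> int:
    """Wall-clock time of the local machine, in milliseconds."""
    return time.time_ns() // 1_000_000


def stamp_ingest(record, clock=now_ms):
    """Step 1: attach the arrival timestamp."""
    record.ingest_ms = clock()
    return record


def stamp_emit(record, clock=now_ms):
    """Step 2: attach the emission timestamp."""
    record.emit_ms = clock()
    return record


def latencies_ms(records):
    """Step 3: offline difference between the two timestamps."""
    return [r.emit_ms - r.ingest_ms for r in records]
```

The clock is passed in as a parameter only so that the functions can be exercised with fixed values instead of the real system time.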

Does this sound ok?

Also, will it be fine if I use the processing-time characteristic for this end-to-end latency measurement?

Thanks

--------------------------------------------------

Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

Michael Latta

Re: Measure End-to-End latency/delay for each record

In a single-machine system this may work OK. In a multi-machine system it is less reliable, because the clock skew between one machine (the source) and another (the sink) can distort the measurements. It also does not account for back pressure on the source. We use an external process that reads the source and the output of the sink in parallel and measures the latency on a single system clock. That does account for those issues, but of course it does not account for delivery delays in the messaging system (Kafka in our case). It does, however, measure real-world latency as seen by the rest of the system, which is ultimately what matters to us.
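
A minimal sketch of the single-clock idea, assuming every record carries a unique id that appears on both the input and the output stream. The `LatencyObserver` class and its method names are hypothetical; how the observer actually consumes the two streams (for example, two Kafka consumers) is left out.

```python
import time


class LatencyObserver:
    """Matches records seen on the input and output streams by id and
    times both observations on the observer's own clock."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._seen_in = {}    # record id -> time first seen on the input
        self.latencies = {}   # record id -> time between input and output

    def on_input(self, record_id):
        # Keep only the first sighting, so a replayed record does not
        # shorten the measured latency.
        self._seen_in.setdefault(record_id, self._clock())

    def on_output(self, record_id):
        started = self._seen_in.pop(record_id, None)
        if started is not None:  # ignore outputs with no matching input
            self.latencies[record_id] = self._clock() - started
```

Because both observations are taken by one process on one clock, differences between the source and sink machines' clocks do not enter the result.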

Michael

In reply to Dhruv Kumar's message of Apr 26, 2018, 12:01 PM.

Dhruv Kumar

Re: Measure End-to-End latency/delay for each record

What do you mean by the time skew from one machine (source) to another (sink)? Do you mean that the system clocks of the source and sink may not be in sync? If I regularly use NTP to keep the system clocks in sync, will time skew still happen?

Could you also elaborate on what you mean by back pressure on the source, and how it will impact the latency calculations?

Sorry if these are trivial questions. I am a bit new to real-world streaming systems.

In reply to Michael Latta's (TechnoMage) message of Apr 26, 2018, 13:26.

Michael Latta

Re: Measure End-to-End latency/delay for each record

Yes, NTP can still have skew. It may be measured in fractions of a second, but with Flink that can be significant if you care about sub-second latency accuracy. Since I have a 20-stage stream with 0.002 second latency, it can matter.
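
To see why a small skew matters at this scale, here is a short worked example. The 2 ms latency is the figure quoted above; the 5 ms clock offsets are purely illustrative assumptions, and `measured_latency_ms` is a hypothetical helper.

```python
def measured_latency_ms(true_latency_ms, source_offset_ms, sink_offset_ms):
    """Latency computed from two wall clocks that are each off from true time.

    The ingest stamp is t + source_offset and the emit stamp is
    t + true_latency + sink_offset, so the offsets only cancel if equal.
    """
    return true_latency_ms + (sink_offset_ms - source_offset_ms)


print(measured_latency_ms(2, 0, 0))  # 2: clocks agree
print(measured_latency_ms(2, 0, 5))  # 7: sink clock runs 5 ms ahead
print(measured_latency_ms(2, 5, 0))  # -3: source clock runs 5 ms ahead
```

With a true latency of 2 ms, a 5 ms disagreement between the clocks is larger than the quantity being measured and can even produce a negative value.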

Back pressure is the limiting of input due to the inability of downstream tasks to accept input. For example, if you have a map that reads from a database to enhance an element, that may limit the performance of earlier steps, as they cannot push elements to it faster than it can read from the database. This can flow all the way back to the source and slow records coming into the system.
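
The effect on a latency measurement can be illustrated with a toy model. This is a deliberately simplified sketch, not how Flink schedules work: it assumes a pipeline with no buffering that handles one record at a time, so the source can only take the next record once the previous one has been emitted. The function name and parameters are hypothetical.

```python
def simulate(num_records, arrival_interval_ms, service_ms):
    """Return (latency from ingest, latency from creation) per record."""
    rows = []
    pipeline_free_at = 0
    for i in range(num_records):
        created = i * arrival_interval_ms          # record exists upstream
        ingested = max(created, pipeline_free_at)  # source waits if blocked
        emitted = ingested + service_ms            # slow downstream step
        pipeline_free_at = emitted
        rows.append((emitted - ingested, emitted - created))
    return rows


# Records arrive every 1 ms, but the slow step needs 5 ms per record.
for in_flink, since_creation in simulate(4, 1, 5):
    print(in_flink, since_creation)
# 5 5
# 5 9
# 5 13
# 5 17
```

In this model a timestamp taken when the record enters the source reports a constant 5 ms, while the delay since the record was created keeps growing, which is the part that source-side stamping does not capture.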

Michael

In reply to Dhruv Kumar's message of Apr 26, 2018, 12:38 PM.

Dhruv Kumar

Re: Measure End-to-End latency/delay for each record

Ok that answers my questions.

What are you using as the source and sink? Is it Kafka for both?

In reply to Michael Latta's (TechnoMage) message of Apr 26, 2018, 16:37.

Michael Latta

Re: Measure End-to-End latency/delay for each record

Yes, Kafka for both source and sink, which makes monitoring Flink's input and output easy.

Michael

In reply to Dhruv Kumar's message of Apr 26, 2018, 5:27 PM.

Dhruv Kumar

Re: Measure End-to-End latency/delay for each record

OK, thanks Michael for all your help!

In reply to Michael Latta's (TechnoMage) message of Apr 26, 2018, 19:24.