How to measure Flink performance


How to measure Flink performance

prateekarora
Hi

I am new to Apache Flink and am using Flink 1.0.1.

I have a streaming program that fetches data from Kafka, performs some computation, and sends the result back to Kafka.

I want to compare results between Flink and Spark.

I have the information below from Spark. Can I get similar information from Flink as well? If so, how?

 - Scheduler Delay
 - Processing time of every batch
 - Task Deserialization/Serialization Time
 - Shuffle Read Time
 - Executor  Computing Time
 - Shuffle Write Time
 - GC Time

Regards
Prateek





Re: How to measure Flink performance

Ufuk Celebi
Hey Prateek,

On Fri, May 6, 2016 at 6:40 PM, prateekarora <[hidden email]> wrote:
> I have below information from spark . do i can get similar information from
> Flink also ? if yes then how can i get that.

You can get GC time via the task manager overview.

The other metrics don't necessarily translate to Flink, because Flink
does not execute your streaming program as mini-batches; your program
is executed with continuous (long-lived) operators.

This means, for example, that shuffles are continuously exchanging data
and you can't easily look at "how long the shuffle took". Also, the
scheduler delay and serialization times are not that interesting for
Flink, as their cost is amortized over one long-running job (i.e. it
becomes effectively zero if your job runs long enough ;)) and you don't
schedule and serialize the tasks multiple times.

– Ufuk
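
For comparison outside the web dashboard, GC time can also be read from inside any JVM through the standard management API. The following is only a minimal sketch: the class name GcTimeProbe is made up for illustration, and the thread does not say how the task manager overview obtains its own figure, so treat this as an independent way to get a similar number for the JVM the code runs in.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: sums the GC counters the current JVM exposes via JMX. */
final class GcTimeProbe {

    /** Accumulated GC time in milliseconds across all collectors of this JVM. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "undefined" for this collector
            if (t >= 0) {
                total += t;
            }
        }
        return total;
    }

    /** Accumulated number of collections across all collectors of this JVM. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (c >= 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("GC count: " + totalGcCount()
                + ", GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

Both values are cumulative since JVM start, so a benchmark would sample them before and after the run and use the difference.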

Re: How to measure Flink performance

prateekarora
Hi,
Thanks for the answer. Then how can I measure the performance of Flink?

I want to run my application with both Spark and Flink and measure the performance, so I can check how fast Flink processes my data compared to Spark.

Regards
prateek


Re: How to measure Flink performance

prateekarora
Hi

How can I measure the throughput and latency of my application in Flink 1.0.2?

Regards
Prateek

Re: How to measure Flink performance

snntr
Hi Prateek,

Regarding throughput, what about simply filling the input Kafka topic
with a lot of messages and monitoring (e.g. with
http://quantifind.github.io/KafkaOffsetMonitor/) how quickly Flink can
work the lag off? The messages should be representative of your use
case, of course.
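
The arithmetic behind "working the lag off" can be sketched as follows. This is only an illustration: the class and method names are hypothetical, the lag values would have to be read from whatever offset monitor you use, and the result is only meaningful if no new messages are produced to the topic between the two samples.

```java
/** Hypothetical helper: derives throughput from two consumer-lag samples. */
final class LagDrainRate {

    /**
     * Records consumed per second between two lag samples.
     * Assumes the producer was idle in between, so the lag only shrinks.
     */
    static double recordsPerSecond(long lagStart, long lagEnd,
                                   long startMillis, long endMillis) {
        if (endMillis <= startMillis) {
            throw new IllegalArgumentException("end must be after start");
        }
        return (lagStart - lagEnd) * 1000.0 / (endMillis - startMillis);
    }

    public static void main(String[] args) {
        // Made-up numbers: lag drops from 1,000,000 to 400,000 within 30 seconds.
        double rate = recordsPerSecond(1_000_000L, 400_000L, 0L, 30_000L);
        System.out.println(rate + " records/s");
    }
}
```

Taking several samples over the run and comparing the rates helps to spot warm-up effects or stalls that a single start/end pair would hide.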

Latency is harder, I think, and I would also be interested in the
approaches of others to measure latency in Flink.

To some extent, you could do it by adding some logging inside Flink, but
this affects latency and only measures latency within Flink (excluding
reading from the source and writing to the sink).
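
One possible way to include source and sink in the measurement, which is not proposed in this thread, is to stamp each message with its creation time before it is written to the input topic and compute the difference when it is read back from the output topic. The sketch below assumes a made-up "timestamp|payload" string format and hypothetical names; it also assumes the job passes the stamp through unchanged and that producer and consumer clocks are synchronized.

```java
/** Hypothetical helper: carries a creation timestamp inside the message. */
final class LatencyStamp {

    /** Prefixes the payload with its creation time (assumed format: "millis|payload"). */
    static String stamp(String payload, long createdAtMillis) {
        return createdAtMillis + "|" + payload;
    }

    /** End-to-end latency of a stamped message as observed at time nowMillis. */
    static long latencyMillis(String stamped, long nowMillis) {
        int sep = stamped.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("message carries no timestamp");
        }
        return nowMillis - Long.parseLong(stamped.substring(0, sep));
    }

    public static void main(String[] args) {
        // Producer side: stamp before writing to the input topic.
        String message = stamp("some-event", System.currentTimeMillis());
        // Consumer side: compute latency after reading from the output topic.
        System.out.println("Latency " + latencyMillis(message, System.currentTimeMillis()) + " ms");
    }
}
```

If producer and consumer run on different machines, clock skew goes directly into the result, so this is best run from a single host or with well-synchronized clocks.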

Cheers,

Konstantin


--
Konstantin Knauf * [hidden email] * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082

Re: How to measure Flink performance

Dhruv Gohil
Hi Prateek,

These two classes from the yahoo-streaming-benchmark repository can help you measure both throughput and latency from within the topology:

https://github.com/dataArtisans/yahoo-streaming-benchmark/blob/master/flink-benchmarks/src/main/java/flink/benchmark/utils/ThroughputLogger.java
https://github.com/dataArtisans/yahoo-streaming-benchmark/blob/master/flink-benchmarks/src/main/java/flink/benchmark/utils/AnalyzeTool.java
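
The general idea of counting throughput from within the topology can be sketched in plain Java. This is not the code of the linked ThroughputLogger; ThroughputCounter and its methods are hypothetical names, and the assumption is that record() would be called once per element from inside one of your operators (for example a pass-through map step).

```java
/** Hypothetical helper: reports elements per second every N elements. */
final class ThroughputCounter {
    private final long logInterval;       // elements per measurement window
    private long seen = 0;                // elements counted in the current window
    private long windowStartMillis = -1;  // -1 until the first element arrives

    ThroughputCounter(long logInterval) {
        if (logInterval <= 0) {
            throw new IllegalArgumentException("logInterval must be positive");
        }
        this.logInterval = logInterval;
    }

    /**
     * Call once per element. Returns the elements/s of the window that just
     * closed, or -1 if no window closed on this call.
     */
    double record(long nowMillis) {
        if (windowStartMillis < 0) {
            windowStartMillis = nowMillis; // the first element only starts the clock
            return -1;
        }
        seen++;
        if (seen < logInterval) {
            return -1;
        }
        long elapsed = nowMillis - windowStartMillis;
        seen = 0;
        windowStartMillis = nowMillis;
        return elapsed > 0 ? logInterval * 1000.0 / elapsed : -1;
    }

    public static void main(String[] args) {
        ThroughputCounter counter = new ThroughputCounter(3);
        // Simulated arrival times: one element every 100 ms.
        for (long t = 0; t <= 600; t += 100) {
            double rate = counter.record(t);
            if (rate >= 0) {
                System.out.println("Throughput " + rate + " elements/s");
            }
        }
    }
}
```

Each parallel operator instance would hold its own counter, so the per-instance rates have to be summed to get the throughput of the whole job.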


Re: How to measure Flink performance

Ken Krugler
Hi Dhruv,

On May 12, 2016, at 11:07pm, Dhruv Gohil <[hidden email]> wrote:


The AnalyzeTool is processing a file that has lines which match the pattern:

Pattern latencyPattern = Pattern.compile(".*Latency ([0-9]+) ms.*");

This isn’t something created by the ThroughputLogger.

Who generates these files, and how do they calculate the latency?

Thanks,

— Ken
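
To make the question concrete, here is a minimal sketch of how lines matching that pattern could be summarized. The class name and the sample log lines are invented for illustration; which component writes such lines, and how it computes the latency value, is exactly what is asked above and is not answered in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts and averages "Latency N ms" values from log lines. */
final class LatencyLogSummary {
    // Same expression as quoted from the AnalyzeTool above.
    private static final Pattern LATENCY = Pattern.compile(".*Latency ([0-9]+) ms.*");

    /** Returns the latency values of all lines that match the pattern, in input order. */
    static List<Long> extract(List<String> lines) {
        List<Long> values = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LATENCY.matcher(line);
            if (m.matches()) {
                values.add(Long.parseLong(m.group(1)));
            }
        }
        return values;
    }

    /** Arithmetic mean of the values, or NaN if the list is empty. */
    static double mean(List<Long> values) {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.size();
    }

    public static void main(String[] args) {
        // Invented sample lines; the real log format is not shown in this thread.
        List<String> lines = Arrays.asList(
                "12:00:01 INFO Latency 120 ms",
                "12:00:01 INFO something unrelated",
                "12:00:02 INFO Latency 80 ms");
        List<Long> values = extract(lines);
        System.out.println(values.size() + " samples, mean " + mean(values) + " ms");
    }
}
```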





    


--------------------------
Ken Krugler
+1 530-210-6378
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr