Key factors for Flink's performance

Key factors for Flink's performance

leon_mclare
Hello Flink team,

I am currently playing around with Storm and Flink in the context of a smart home. The primary functional requirement is to react quickly to certain properties in stream tuples.

I was looking at some benchmarks of the two systems, and generally Flink has the upper hand in both throughput and latency. I do not really understand how Flink achieves better latency than Storm, which is driven by one-at-a-time tuple processing.

From what I understood in the documentation, Flink performs micro-batching when transferring data across the network to downstream operators located on other nodes. Perhaps this achieves a better average latency.

Surely the bigger factor, however, is that Flink can completely bypass internal operator queues through operator chaining, which Storm cannot do.
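
For context, the kind of job I am experimenting with looks roughly like the sketch below; the SensorReading type, the sample data and the 30-degree threshold are made up purely for illustration.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SmartHomeAlerts {

    // Illustrative sensor tuple: a device id plus a measured temperature.
    public static class SensorReading {
        public String deviceId;
        public double temperature;

        public SensorReading() {}

        public SensorReading(String deviceId, double temperature) {
            this.deviceId = deviceId;
            this.temperature = temperature;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In the real setup this would be an MQTT or Kafka source; the sample
        // data only keeps the sketch self-contained.
        DataStream<SensorReading> readings = env.fromElements(
                new SensorReading("livingroom", 21.5),
                new SensorReading("boiler", 34.2));

        // React to a property of the tuple. The filter and the print sink are
        // chained to the source by default, so records are handed over by
        // direct method calls rather than through queues.
        readings
                .filter(r -> r.temperature > 30.0)
                .print();

        env.execute("smart-home-alerts");
    }
}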

Kind regards
Leon

Re: Key factors for Flink's performance

Aljoscha Krettek
Hi,
latency for Flink and Storm is pretty similar. The only reason I could see for Flink having the slight upper hand there is the fact that Storm tracks the progress of every tuple throughout the topology and requires ACKs that have to travel back to the sources (spouts).

As for throughput, you are right that Flink sends elements in batches. The size of these batches can be controlled and even reduced to 1, which yields the best latency. These batches are not visible anywhere in the programming model, so calling them micro-batches is problematic, since that term already refers to a very different concept in Spark Streaming.
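
For reference, here is a minimal sketch of tuning this through the buffer timeout on the StreamExecutionEnvironment (the concrete values are only illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Flush partially filled network buffers after at most 5 ms instead of
        // the 100 ms default, trading a bit of throughput for lower latency.
        env.setBufferTimeout(5);

        // env.setBufferTimeout(0);  // flush after every record: lowest latency, lowest throughput

        env.fromElements("a", "b", "c").print();

        env.execute("buffer-timeout-example");
    }
}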

Cheers,
Aljoscha


Re: Key factors for Flink's performance

Stephan Ewen
Hi Leon!

I agree with Aljoscha that the term "micro-batches" is confusing in that context. Flink's network layer is "buffer" oriented rather than "record" oriented. Buffering is a best effort to gather some elements in cases where they arrive fast enough that it would not add much latency anyway.

Concerning latency: chaining has a positive effect on latency. Some of the benchmarks show that Flink needs to communicate less with external systems (like Redis), which is another source of reduced latency.
For very simple programs that have no external communication and no chaining, I would not expect Flink and Storm to differ much in latency.
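
For anyone who wants to isolate the effect of chaining, here is a minimal sketch of switching chaining off per operator or for the whole job; the pipeline itself is just a placeholder:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingComparison {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Variant A: turn chaining off for the whole job, so every operator runs
        // as its own task and records go through the local handover path.
        // env.disableOperatorChaining();

        // Variant B: control chaining per operator.
        env.fromElements(1, 2, 3, 4)
                .filter(v -> v > 0)
                .startNewChain()         // this filter starts its own chain instead of chaining to the source
                .filter(v -> v % 2 == 0)
                .disableChaining()       // this filter is never chained to its neighbours
                .print();

        env.execute("chaining-comparison");
    }
}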

Greetings,
Stephan

