Hello, Pipelining Question


Philip Lee
Hi,

I found some interesting results when comparing Spark SQL and Flink. Just for your information, Spark SQL here runs HiveQL on Spark.


As far as we know, when we run a Flink job, the functions can overlap through pipelining, as in this picture.

Inline image 1

Likewise, Spark supports pipelining, according to the Spark slides I read. Its functions can overlap as well, but there seems to be some boundary.

For example, in Flink, the functions that read multiple inputs can run together with the join function, as in the picture above. In Spark, the reads of multiple inputs can run together, but the join function appears to be separated from the reading functions. (You can see this from the start times and durations, which indicate that the join step is separate.)

Inline image 2

Is this why Spark is described as in-memory batch processing, whereas Flink is described as in-memory stream processing?

Best,
Phil



Re: Hello, Pipelining Question

Fabian Hueske-2
Yes, Flink is a pipelined system because it is able to ship data over the network while it is being produced (pipelined network communication).
In contrast, Spark produces a result completely before it is sent over the network in a batch fashion.

However, Flink also supports batched data exchange, similar to Spark.

Best, Fabian
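
To make the difference between pipelined and batched data exchange concrete, here is a minimal Python sketch. It is only an illustration and not Flink or Spark code: `producer`, `consumer`, `run_pipelined`, and `run_batched` are hypothetical names, and a plain generator stands in for the network channel between two operators (for example a reader and a join). In the pipelined run the consumer sees each record as soon as it is produced; in the batched run the full result is materialized first.

```python
from typing import Iterable, Iterator, List


def producer(n: int, log: List[str]) -> Iterator[int]:
    """Hypothetical upstream operator (e.g. a reader) that emits n records."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def consumer(records: Iterable[int], log: List[str]) -> int:
    """Hypothetical downstream operator (e.g. a join) that consumes records."""
    total = 0
    for r in records:
        log.append(f"consume {r}")
        total += r
    return total


def run_pipelined(n: int) -> List[str]:
    """Records flow to the consumer one by one while the producer is still running."""
    log: List[str] = []
    consumer(producer(n, log), log)
    return log


def run_batched(n: int) -> List[str]:
    """The producer's whole result is materialized before the consumer starts."""
    log: List[str] = []
    materialized = list(producer(n, log))  # assumption: full result buffered first
    consumer(materialized, log)
    return log


if __name__ == "__main__":
    print(run_pipelined(3))
    print(run_batched(3))
```

In the pipelined log the produce and consume events interleave, which corresponds to the overlapping operators in the first picture. In the batched log all produce events come before any consume event, which matches the separated join step in the second picture. Real engines exchange buffers of records over the network rather than single records, so this only shows the ordering idea.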

2016-02-15 23:17 GMT+01:00 Philip Lee <[hidden email]>: