Batch Flink Job S3 write performance vs Spark

sri hari kali charan Tummala
Hi All, 

I have a question: has anyone compared the performance of a Flink batch job writing to S3 with that of Spark writing to S3?

--
Thanks & Regards
Sri Tummala

Re: Batch Flink Job S3 write performance vs Spark

Arvid Heise-3
Fair benchmarks are notoriously difficult to set up.

Usually, it's easy to find a workload where one system shines, and as its vendor you report that. Then the competitor benchmarks a different use case where their system outperforms yours. In the end, customers are more confused than before.

You should do your own benchmarks for your own workloads. That is the only reliable way.

In the end, both systems use similar setups, and improvements in one system are often incorporated into the other with some delay, so there should be no groundbreaking differences between the two systems running on Java and using the same set of libraries.
Of course, if one system has a very specific optimization for your use case, it could be much faster.
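
As a rough illustration of "do your own benchmarks", here is a minimal timing harness in Python. It is only a sketch under stated assumptions: the names `time_runs`, `summarize`, and `flink_job` are made up for this example, the jar path is a placeholder, and wall-clock time of the whole job is assumed to be the metric you care about.

```python
import statistics
import subprocess
import time
from typing import Callable, Dict, List


def time_runs(job: Callable[[], None], warmup: int = 1, runs: int = 5) -> List[float]:
    """Call `job` `warmup` times untimed, then `runs` times timed.

    Returns the wall-clock duration of each timed run in seconds.
    """
    for _ in range(warmup):
        job()
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Report median, min, and max; the median is less sensitive to outliers than the mean."""
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


def flink_job() -> None:
    # Hypothetical job wrapper: the jar path is a placeholder, not a real artifact.
    # A Spark counterpart would call `spark-submit` with the equivalent job.
    subprocess.run(["flink", "run", "path/to/your-job.jar"], check=True)
```

To compare the two systems, you would pass one wrapper per system to `time_runs` (for example `summarize(time_runs(flink_job, warmup=1, runs=5))`), using the same input data, the same target bucket, and comparable cluster resources for both.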

Re: Batch Flink Job S3 write performance vs Spark

sri hari kali charan Tummala
Thank you. Regarding "the two systems running on Java and using the same set of libraries": from my understanding, then, Flink uses the AWS SDK behind the scenes, the same as Spark.

Re: Batch Flink Job S3 write performance vs Spark

Arvid Heise-3
Exactly. We use hadoop-fs as an indirection layer on top of that, but Spark probably does the same.

Re: Batch Flink Job S3 write performance vs Spark

sri hari kali charan Tummala
Ok, thanks for the clarification. 

Re: Batch Flink Job S3 write performance vs Spark

sri hari kali charan Tummala
Sorry for being lazy; I should have gone through the Flink source code.