Sort Benchmark infrastructure

hawin

Hi Michael and George,

First of all, congratulations on winning the sort benchmark again. We are from the Flink community.

Would it be possible for us to use your test environment, free of charge, to test Flink? We saw that Apache Spark did a good job as well.

We want to challenge your records, but we don't have that many servers for testing.

Please let me know whether you can help us.

Thank you very much.

Best regards,
Hawin


Re: Sort Benchmark infrastructure

hawin
Hi George and Mike,

Thanks for the information. Did you use 186 i2.8xlarge servers for testing?
Total cost for one hour = 186 * $6.82 = $1,268.52.
Do you know of any person or company that could sponsor this?

For our test approach, I looked at the industry-standard benchmarks listed by BigDataBench (http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/).
Maybe we can run TeraSort to see whether our performance is better than your record.

Please let me know if you have any comments.
Thanks for the support.

Best regards,
Hawin



On Tue, Jul 14, 2015 at 9:42 AM, Mike Conley <[hidden email]> wrote:
George is correct.  We used i2.8xlarge with placement groups on Amazon EC2.  We ran Amazon Linux, which if I recall correctly is based on Red Hat, but optimized for EC2.  OS was essentially unmodified with some packages installed for our dependencies.

Thanks,
Mike

On Tue, Jul 14, 2015 at 9:15 AM, George Porter <[hidden email]> wrote:
Hello Hawin,

Thanks for reaching out.  We wrote a paper on our efforts, which we'll be posting to our website in a couple of weeks.

In summary, though, we used a cluster of i2.8xlarge instances from Amazon, and we made use of the placement group feature to ensure that we'd get good bandwidth between them.  Mike can correct me if I'm wrong, but I believe we used the stock AWS version of Linux (Ubuntu maybe?)

So our environment was pretty stock--we didn't get any special support or features from AWS.

Best of luck with your profiling and benchmarking.  Do let us know how you perform.  Flink looks like a pretty interesting project, and so let us know if we can help y'all out in some way.

Thanks, George


On Sun, Jul 12, 2015 at 11:12 PM, Hawin Jiang <[hidden email]> wrote:

[quoted text trimmed; see the first message in this thread]





Re: Sort Benchmark infrastructure

hawin
Hi George,

Thanks for the details. It looks like I have a long way to go.
For the big data benchmark, I would like to use its test cases, test data, and test methodology to evaluate different big data technologies.
BTW, I agree with you that no one system will necessarily be optimal for all cases and all workloads.
I hope I can find a good one for our enterprise application. I will let you know if I can move forward with this.
Good night.

Best regards,
Hawin

On Wed, Jul 15, 2015 at 9:30 AM, George Porter <[hidden email]> wrote:
Hi Hawin,

We used varying numbers of the i2.8xlarge servers, depending on the sort record category.  http://sortbenchmark.org/ is really your best source for what we did--all the details (should) be on our write-ups.  Note that we pro-rated the cost, meaning that if we ran for 15 minutes, we took the hourly rate and divided by 4.
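
As a rough illustration of the pro-rating described above, here is a minimal Python sketch. The instance count (186) and the hourly rate ($6.82) are taken from Hawin's question and are not confirmed figures for any record run; the function name `prorated_cost` is hypothetical.

```python
def prorated_cost(instances: int, hourly_rate_usd: float, minutes: float) -> float:
    """Cost of running `instances` servers for `minutes`,
    charged as a fraction of the hourly rate (assumed pro-rating rule)."""
    return instances * hourly_rate_usd * (minutes / 60.0)


# Figures from the question in this thread, used only as an example.
full_hour = prorated_cost(186, 6.82, 60)     # the one-hour estimate
quarter_hour = prorated_cost(186, 6.82, 15)  # hourly rate divided by 4

print(f"One hour:   ${full_hour:,.2f}")
print(f"15 minutes: ${quarter_hour:,.2f}")
```

With these example numbers, a 15-minute run costs a quarter of the one-hour estimate.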

In terms of sponsorship, we used a combination of credits donated by Amazon, as well as funding from the National Science Foundation.  You can submit a grant proposal to Amazon and ask them for credits if you're an academic or researcher.  Not sure if being part of an open-source project counts, but you might as well try.

In terms of the sort record, that webpage I provided above has all the details on the challenge.  Not sure about Big Data benchmark--that term is pretty vague.  Often when people say big data, they mean different things.  Our system is designed for lots of bytes, but not really lots of compute over those bytes.  Others pick different design points.  I think you'll find that the needs of different users vary quite a bit, and no one system will necessarily be optimal for all cases for all workloads.

Good luck on your attempts.  
-George

----
George Porter
Assistant Professor, Dept. of Computer Science and Engineering
Associate Director, UCSD Center for Networked Systems
UC San Diego, La Jolla CA
http://www.cs.ucsd.edu/~gmporter/



On Wed, Jul 15, 2015 at 1:44 AM, Hawin Jiang <[hidden email]> wrote:
[quoted text trimmed; see the earlier messages in this thread]