Hi everyone,
I'm using Flink 0.10.2 for some benchmarks and had to make some small changes to Flink, which led me to compile and run it myself. This is when I noticed a performance difference between the pre-packaged Flink version that I downloaded from the web (http://archive.apache.org/dist/flink/flink-0.10.2/flink-0.10.2-bin-hadoop27.tgz) and the release-0.10 branch I built myself (mvn -Dhadoop.version=2.7.1 -Dscala-2.11 -DskipTests -Drat.skip=true clean install // mvn version 3.0.4).

I ran a version of TeraSort (https://github.com/eastcirclek/terasort) and noticed that the pre-packaged version of Flink performs 10-20% better than the one I built myself. The only tweaks I made are in the CliFrontend and take effect after the job has finished running, so I would rule out bad programming on my side.

Has anyone come across this before? Or could you provide me with clearer build instructions so that I can reproduce the downloadable archive as closely as possible?

Thanks in advance!
Robert

My GPG Key ID: 336E2680
Hi Robert,

check out the tools/create_release_files.sh file in the source tree. There you can see how we are building the release binaries. It would be quite interesting to find out what caused the performance difference.

On Wed, Apr 13, 2016 at 5:03 PM, Robert Schmidtke <[hidden email]> wrote:
Hi Robert,

thanks for the hint! Looks like something I could have figured out myself -.-" I'll let you know if I find something.

Robert

On Thu, Apr 14, 2016 at 1:06 PM, Robert Metzger <[hidden email]> wrote:
I have tried multiple Maven and Scala versions, but to no avail. I can't seem to match the performance of the downloaded archive. I am stumped by this and will need to do more experiments when I have more time.

Robert

On Thu, Apr 14, 2016 at 1:13 PM, Robert Schmidtke <[hidden email]> wrote:
Hi,
Your assumption may be incorrect with respect to the TeraSort use case in eastcirclek's implementation. How many times did you run your program? It would help to have more details about your experiment, such as the configuration and the dataset size.

Best,
Ovidiu
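To make the "how many times did you run it" question concrete, here is a minimal Python sketch that summarizes repeated run times for two builds and reports the relative gap between their means. The function names and any timings you feed it are hypothetical; nothing here comes from Flink or from the measurements in this thread.

```python
from statistics import mean, stdev


def summarize(run_times_s):
    """Return (mean, sample standard deviation) of run times in seconds.

    Needs at least two runs; a single run says nothing about variance.
    """
    return mean(run_times_s), stdev(run_times_s)


def relative_gap(baseline_s, candidate_s):
    """Relative difference of the candidate's mean run time vs. the baseline's.

    A value of 0.15 means the candidate is 15% slower on average.
    """
    base = mean(baseline_s)
    return (mean(candidate_s) - base) / base
```

If the gap is of the same order as the run-to-run standard deviation, more runs are needed before blaming the build. If it is consistently larger, the next things to compare are the configuration and the dataset.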
You're obviously right, the configs were different. In the downloaded version I had set off-heap memory to true, whereas in the version I compiled myself this one-time change to flink-conf.yaml was overwritten by recompiling. I have fixed it now and performance is the same.

For the record, I had 30 GiB of TeraGen'd data and ran with:

    -m yarn-cluster \
    -yn 10 \
    -ys 4 \
    -p 40 \
    -yjm 3072 \
    -ytm 4096

Each of the nodes has 64 GiB of RAM, and the job ran in 27s, repeatedly.

Thanks and sorry for not having checked the obvious ...

Robert

On Thu, Apr 14, 2016 at 10:23 PM, Ovidiu-Cristian MARCU <[hidden email]> wrote:
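Since the cause turned out to be a flink-conf.yaml edit that was lost on rebuild, a quick way to catch this kind of drift is to diff the two installations' config files before benchmarking. Below is a minimal Python sketch, assuming the file contains only flat "key: value" lines with "#" comments. The helper names are hypothetical, and the off-heap key mentioned afterwards is my assumption about which setting was meant.

```python
from pathlib import Path


def parse_flat_conf(text):
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments.

    Simplification: everything after a '#' is treated as a comment, so
    values that legitimately contain '#' would be truncated.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        conf[key.strip()] = value.strip()
    return conf


def diff_conf(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs.

    A key missing on one side shows up with None for that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def diff_conf_files(path_a, path_b):
    """Convenience wrapper: diff two config files given their paths."""
    a = parse_flat_conf(Path(path_a).read_text())
    b = parse_flat_conf(Path(path_b).read_text())
    return diff_conf(a, b)
```

Calling diff_conf_files on the conf/flink-conf.yaml of the downloaded and the self-built installation would presumably have shown a single entry here, for the off-heap setting (taskmanager.memory.off-heap in Flink 0.10, if I recall the key correctly).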