Hi everyone,
I compared Flink and Spark using PageRank. I expected Flink to beat Spark or at least match it, but Spark is up to 4x faster than Flink.
I hope I made a mistake, so please help me improve the performance of my cluster and configuration.

The cluster has 4 computers:
One JobManager (quad core with Hyper-Threading -> 8 cores, and 16 GB (jobmanager.heap.mb))
Three TaskManagers (each quad core with Hyper-Threading -> 8 cores, and 16 GB (taskmanager.heap.mb))
In total 24 cores / task slots.

I ran PageRank (PR) as vertex-centric, scatter-gather, gather-sum-apply, and with bulk iteration. The parallelism was 24.
Runtime in ms:
Pregel: 90,000 ms
SG: 64,000 ms
GSA: 80,000 ms
Bulk: 53,000 ms
Spark with Pregel ran in 23,000 ms.

The input file was: https://snap.stanford.edu/data/wiki-topcats.html

Thanks for helping!

Marc
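To put the "up to 4x" figure next to each Flink variant, here is a small sketch of the arithmetic. It assumes the posted runtimes are milliseconds written with a thousands separator (so the Pregel run took 90 seconds); the dictionary labels and the function name are made up for illustration and are not from the original post.

```python
# Runtimes in milliseconds, as reported in the post above
# (assumption: the dot in the posted figures is a thousands separator).
flink_runtimes_ms = {
    "Pregel (vertex-centric)": 90_000,
    "Scatter-gather": 64_000,
    "Gather-sum-apply": 80_000,
    "Bulk iteration": 53_000,
}
spark_pregel_ms = 23_000


def slowdown_vs_spark(flink_ms, spark_ms=spark_pregel_ms):
    """Return how many times longer the Flink run took than the Spark run."""
    return flink_ms / spark_ms


slowdowns = {name: slowdown_vs_spark(ms) for name, ms in flink_runtimes_ms.items()}

# Print the variants from smallest to largest gap.
for name, factor in sorted(slowdowns.items(), key=lambda kv: kv[1]):
    print(f"{name}: {factor:.2f}x the Spark runtime")
```

Under that reading, the gap ranges from roughly 2.3x (bulk iteration) to roughly 3.9x (vertex-centric), which matches the "up to 4x" in the post.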
Does anyone have a current performance test based on PageRank, or an idea why Flink lost the comparison?
> On 18.08.2017 at 19:51, Kaepke, Marc <[hidden email]> wrote:
> [...]
You could enable object reuse [0] if your application allows it. Adjusting the managed memory size [1] can also help.

Are you using Flink's graph library Gelly?

[0] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html#object-reuse-enabled
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#managed-memory

Regards,
Timo

On 23.08.17 at 17:11, Kaepke, Marc wrote:
> Does anyone have a current performance test based on PageRank, or an idea why Flink lost the comparison?
>
>> On 18.08.2017 at 19:51, Kaepke, Marc <[hidden email]> wrote:
>> [...]
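As a rough aid for the managed memory suggestion, the sketch below estimates how much managed memory each task slot would get on the cluster described in the thread (3 TaskManagers, 24 slots, 16 GB heap each). It is only back-of-the-envelope arithmetic: the fraction of 0.7 is an assumed value for illustration, the function name is hypothetical, network buffers and other overhead are ignored, and an even split across slots is assumed. Check the configuration page linked as [1] for the actual settings and their behavior.

```python
def per_slot_managed_memory_gb(heap_gb, slots_per_taskmanager, managed_fraction):
    """Estimate managed memory per slot, assuming an even split across slots.

    Simplification: ignores network buffers and any other heap overhead.
    """
    managed_gb = heap_gb * managed_fraction
    return managed_gb / slots_per_taskmanager


# Cluster figures taken from the thread.
TASKMANAGERS = 3
TOTAL_SLOTS = 24
HEAP_GB = 16

# Assumption for illustration only; not a value stated in the thread.
ASSUMED_MANAGED_FRACTION = 0.7

slots_per_tm = TOTAL_SLOTS // TASKMANAGERS
per_slot_gb = per_slot_managed_memory_gb(HEAP_GB, slots_per_tm, ASSUMED_MANAGED_FRACTION)

print(f"Slots per TaskManager: {slots_per_tm}")
print(f"Estimated managed memory per slot: {per_slot_gb:.2f} GB")
```

Rerunning the function with a different fraction or fewer slots per TaskManager shows how the per-slot budget shifts, which may help when deciding what to try first.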