Hi all,

This query is regarding Flink performance improvement.

Flink configuration:
Using Flink in cluster mode with 3 slaves and a master
Slots used: 30 (as the system has 30 cores)
TaskManager memory: 30 GB
Parallelism used: 30
jobmanager.heap.mb: 20480
taskmanager.heap.mb: 20480
taskmanager.numberOfTaskSlots: 30
taskmanager.network.numberOfBuffers: 20000

Input info:
Input file: 1 ROP (5 min) of data with 3333 nodes and 665K eps (events per second)
Total number of events: 199498294

Observation:
Total time taken to complete the task: 6m24s

Can you please suggest what else I need to modify to get higher performance, in terms of lower execution time?

Thanks in advance.

Regards,
Samim Ahmed
Mumbai
09004259232
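As a rough sanity check, the throughput can be worked out from the event count and the runtime given above. This is a small Python sketch that assumes the 6m24s covers only the processing of the single 5-minute ROP and that the work is spread evenly over the 30 parallel instances.

```python
# Back-of-the-envelope throughput check using the figures from the question.
# Assumption: the whole 6m24s is spent processing one 5-minute ROP.

TOTAL_EVENTS = 199_498_294      # "Total number of events"
INPUT_RATE_EPS = 665_000        # "665K eps" (events per second)
ROP_SECONDS = 5 * 60            # one ROP = 5 minutes
RUNTIME_SECONDS = 6 * 60 + 24   # observed runtime: 6m24s
PARALLELISM = 30                # "parallelism used: 30"


def throughput(events: int, seconds: int) -> float:
    """Average number of events processed per second."""
    return events / seconds


processed_eps = throughput(TOTAL_EVENTS, RUNTIME_SECONDS)
# Ratio >= 1.0 means the job keeps up with the input rate; < 1.0 means it falls behind.
realtime_ratio = processed_eps / INPUT_RATE_EPS
# Assumes perfectly even load across parallel instances.
per_instance_eps = processed_eps / PARALLELISM

print(f"events expected per ROP: {INPUT_RATE_EPS * ROP_SECONDS:,}")
print(f"processed: {processed_eps:,.0f} events/s")
print(f"real-time ratio: {realtime_ratio:.2f}")
print(f"per parallel instance: {per_instance_eps:,.0f} events/s")
```

With these figures the job handles roughly 519,500 events/s, about 0.78 of the 665K eps input rate, so one 5-minute ROP takes longer than 5 minutes to process. A number like this gives a concrete target to measure any tuning against.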
Hi,
the answer highly depends on what your job is doing, and there is no information about that. Also, what is your performance target? Are you using batch or streaming?

If you feel that the performance is lower than expected, I suggest that you do some profiling to figure out the hotspots. For example, you might find that your job spends most of its time in type serialization, which is a common bottleneck. In that case, maybe you can write a faster custom serializer. Or rewriting the job (e.g. using early aggregation where possible) can yield much more performance improvement than tuning magic numbers without further knowledge about your job.

Best,
Stefan
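To illustrate why early aggregation can pay off more than parameter tuning, here is a plain-Python simulation (not Flink code) that counts how many records would cross the network when summing values per key, with and without a local pre-aggregation step in each partition, which is the idea behind a combiner. The partition count, key space and event values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event: (key, value), e.g. (node_id, counter increment).
Event = Tuple[int, int]
Partition = List[Event]


def naive_sum(partitions: List[Partition]) -> Tuple[Dict[int, int], int]:
    """Ship every raw event to the reducer, then sum per key there."""
    totals: Dict[int, int] = defaultdict(int)
    shipped = 0
    for part in partitions:
        for key, value in part:
            shipped += 1  # one record per event crosses the "network"
            totals[key] += value
    return dict(totals), shipped


def pre_aggregated_sum(partitions: List[Partition]) -> Tuple[Dict[int, int], int]:
    """Sum locally within each partition first, then ship one partial per key."""
    totals: Dict[int, int] = defaultdict(int)
    shipped = 0
    for part in partitions:
        local: Dict[int, int] = defaultdict(int)
        for key, value in part:
            local[key] += value  # local combine, no network traffic
        for key, partial in local.items():
            shipped += 1  # one record per (partition, key) pair
            totals[key] += partial
    return dict(totals), shipped


def make_partitions(num_partitions: int, events_per_partition: int,
                    num_keys: int) -> List[Partition]:
    """Deterministic synthetic data: every partition cycles through all keys."""
    return [
        [((i * 7 + p) % num_keys, 1) for i in range(events_per_partition)]
        for p in range(num_partitions)
    ]


parts = make_partitions(num_partitions=4, events_per_partition=10_000, num_keys=100)
naive_totals, naive_shipped = naive_sum(parts)
combined_totals, combined_shipped = pre_aggregated_sum(parts)
assert naive_totals == combined_totals  # same result either way
print(f"naive: {naive_shipped} records shipped")
print(f"pre-aggregated: {combined_shipped} records shipped")
```

Both variants produce identical totals, but with 4 partitions of 10,000 events over 100 keys the naive version ships 40,000 records while the pre-aggregated one ships 400. The fewer distinct keys per partition relative to the number of events, the larger the saving.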
Here is some documentation on application profiling with Flink 1.3 (the settings can be manually inserted into the scripts for Flink 1.2):
Just a quick remark about memory and the number of slots: with your configuration of 30 slots but only ~20 GB of RAM, each processing slot does not have a lot of memory to work with. For batch programs this can be a problem. I would suggest using fewer but bigger slots, even if the number of cores on your machines is very high.
Best, Aljoscha
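To put numbers on the slot-size point, the sketch below divides the TaskManager heap from the question (taskmanager.heap.mb: 20480) evenly across different slot counts. This is a simplification: it assumes the whole heap is available to the slots, whereas Flink only divides its managed memory (a part of the heap) among the slots, so the real per-slot budget is smaller.

```python
# Rough per-slot heap budget for a single TaskManager.
# Assumption: the whole heap is split evenly across slots; in practice only the
# managed-memory part is divided among slots, so real numbers are lower.

HEAP_MB = 20480  # taskmanager.heap.mb from the question


def mb_per_slot(heap_mb: int, slots: int) -> float:
    """Heap megabytes available to each slot if split evenly."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return heap_mb / slots


for slots in (30, 15, 10, 5):
    print(f"{slots:>2} slots -> {mb_per_slot(HEAP_MB, slots):7.1f} MB per slot")
```

At 30 slots each slot gets well under 1 GB, while 10 slots would give each about 2 GB. The trade-off is fewer parallel instances per TaskManager, so the right slot count depends on whether the job is limited by CPU or by memory.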