Author: Roth, Mark
Over the past 10 years we have seen the transition from single-core to multicore computing, with high-end consumer computers marketed with up to 12 cores. However, taking advantage of these cores is non-trivial: simply using twice as many cores does not immediately yield twice the performance. Moreover, performance debugging of parallel programs can be extremely difficult. Our experience in tuning parallel applications led us to discover that performance tuning can be considerably simplified, and even to some degree automated, if profiling measurements are organized according to several intuitive performance factors common to most parallel programs. In this work we present these factors and propose a hierarchical framework composing them. We present various case studies in which analyzing profiling data according to the proposed principle led us to improve the performance of parallel programs by significant factors (up to 20x). This work lays the foundation for new ways of organizing and visualizing profiling data in performance tuning tools.
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Fedorova, Alexandra