Use of Profiling in Code Performance Tuning
Typical profilers take periodic samples, recording which subroutine is executing at each sample time. Profilers such as gprof also count how many times each subroutine is invoked by each calling subroutine. They are indispensable for locating the sections of a program where performance might be improved, and for assessing the effects of changes.
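The per-caller call counts described above can be illustrated with a minimal sketch. It uses Python's standard cProfile and pstats modules rather than gprof, and cProfile is a deterministic tracing profiler, not a sampling one, so only the call-count side is shown. The functions inner, caller_a, caller_b and workload, and the helper calls_by_caller, are hypothetical names made up for this illustration.

```python
import cProfile
import pstats


def inner(n):
    """Do a small amount of arithmetic work."""
    return sum(i * i for i in range(n))


def caller_a():
    # Invokes inner() three times.
    results = []
    for _ in range(3):
        results.append(inner(1000))
    return results


def caller_b():
    # Invokes inner() once.
    return inner(1000)


def workload():
    caller_a()
    caller_b()


def calls_by_caller(func, target_name):
    """Profile func and return {caller name: number of calls made to target_name}."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)

    counts = {}
    # stats.stats maps (file, line, name) to a tuple whose last item is a
    # dictionary of callers; each caller entry starts with its call count.
    for (_file, _line, name), entry in stats.stats.items():
        if name != target_name:
            continue
        callers = entry[4]
        for (_cfile, _cline, caller_name), caller_entry in callers.items():
            counts[caller_name] = counts.get(caller_name, 0) + caller_entry[0]
    return counts


if __name__ == "__main__":
    print(calls_by_caller(workload, "inner"))
```

Running the profile before and after a change, and comparing the counts and times, is one way of assessing the effect of that change.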
Processors with hardware event counters may allow various counters to be chosen to trigger sampling. It may then be possible to estimate how many events, such as cache or TLB misses, occur in each subroutine. If one "knows" how much time should be spent in each subroutine, timing profiles will reveal the locations where performance falls short of expectation.
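Comparing a timing profile against expected times can be sketched as follows. This is only an illustration: the subroutine names, the timings, the helper find_shortfalls and the tolerance factor of 1.25 are all assumptions, not output from any real profiler.

```python
def find_shortfalls(measured, expected, tolerance=1.25):
    """Return (name, ratio) pairs where measured time exceeds expected time.

    A subroutine is flagged when its measured time is more than
    `tolerance` times its expected time. Results are sorted with the
    worst ratio first.
    """
    flagged = []
    for name, expected_seconds in expected.items():
        actual_seconds = measured.get(name, 0.0)
        if expected_seconds > 0 and actual_seconds > tolerance * expected_seconds:
            flagged.append((name, actual_seconds / expected_seconds))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Hypothetical per-subroutine times in seconds.
measured_times = {"solve": 4.0, "assemble": 1.0, "output": 0.5}
expected_times = {"solve": 2.0, "assemble": 1.0, "output": 0.5}

if __name__ == "__main__":
    for name, ratio in find_shortfalls(measured_times, expected_times):
        print(f"{name}: {ratio:.2f}x the expected time")
```

With these made-up figures only solve is flagged, since it takes twice its expected time while the other subroutines match their estimates.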
Clarity of expression and verifiable results can usefully be given priority throughout the code. Performance is in fact enhanced by keeping seldom-executed code compact, while local speed matters only where measurable time is spent in execution.
Static profiling data are available on a few systems. One of the more
useful types is one which provides the following information about the
code generated for each inner loop: