The parallel profiler measures a parallel application's performance by capturing the amount of time the program spends in different code regions or execution activities. This section provides a basic understanding of the performance issues that arise when HPF programs are run under PSE. This knowledge helps you develop a strategy for applying the parallel profiler effectively.
Four major factors have a direct impact on the performance of HPF applications running under PSE:
You can use the parallel profiler to analyze program design issues, determine optimal resource selection, and identify system parameters to modify, as shown in Figure 10-1.
Refer to Part IV for information on PSE system configuration and its effect on performance.
pprof provides two methods for analyzing parallel program execution:
PC sampling: a nonintrusive method that samples the program counter (PC) address at regular intervals. At each clock tick the profiler tallies the current PC value, building a statistical picture of where the program spends its time. Timing is derived from the number of samples that fall in each routine and is exclusive: samples taken while a called routine is executing are charged to that routine, not to its caller.
Interval profiling: an intrusive method that performs source-level hierarchical timing. The timing is hierarchical in that the time recorded for a subroutine includes the execution time of the routines it calls. Interval profiling collects interval timings (the elapsed time between the start and the end of an event) and counts of how often each event occurs. An event is a construct, such as a routine or DO loop, that occurs at a specific location in the code. Each event is distinguished from other events by the trace pattern that originates it; a trace pattern can include a line number or file name specific to the event.
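The statistical PC sampling approach can be illustrated with a short Python sketch. It is not pprof's implementation: the symbol table, the addresses, the routine names, and the `routine_for` and `pc_profile` helpers are all hypothetical. Each sampled PC value is mapped to the routine whose address range contains it, and the estimated time for a routine is its sample count multiplied by the sampling interval. Because a sample is charged only to the routine actually executing, the result is exclusive time.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, routine name), sorted by address.
# Each routine is assumed to extend up to the next entry's start address;
# the final "<end>" entry only marks the end of the last routine.
SYMBOLS = [(0x1000, "main"), (0x1400, "solve"), (0x2000, "exchange_halo"), (0x2800, "<end>")]
STARTS = [addr for addr, _ in SYMBOLS]

def routine_for(pc):
    """Return the routine whose address range contains pc, or None."""
    i = bisect_right(STARTS, pc) - 1
    if i < 0 or i >= len(SYMBOLS) - 1:  # before the first routine or past the last
        return None
    return SYMBOLS[i][1]

def pc_profile(samples, tick_us):
    """Tally sampled PC values per routine and estimate exclusive time in microseconds."""
    counts = Counter(routine_for(pc) for pc in samples)
    counts.pop(None, None)  # discard samples that fall outside known routines
    return {name: (n, n * tick_us) for name, n in counts.items()}

# Usage: seven hypothetical samples taken every 1000 microseconds.
samples = [0x1010, 0x1420, 0x1500, 0x2010, 0x2100, 0x2200, 0x3000]
for name, (n, t) in sorted(pc_profile(samples, tick_us=1000).items()):
    print(f"{name:15s} {n:3d} samples  ~{t} us")
```

The accuracy of such an estimate depends on the number of samples: a routine that receives only a few samples has a large relative error, which is why this kind of data is best treated as a statistical overview.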
Interval profiling also provides communication profiles related to array distribution in HPF programs. Time is measured as real (wall-clock) time, and interval profiling statistics are inclusive: the time recorded for an event contains the time of any lower-level events called from within it. For example, the time spent in a DO loop includes the time spent in routine calls, nested DO loops, and any other events occurring within the loop.
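Inclusive interval timing can be sketched in Python as follows. This is an illustration only, not how pprof instruments HPF code: the `IntervalProfiler` class, its `event` method, and the use of a `(name, file, line)` tuple as a stand-in for a trace pattern are hypothetical. Each event records its elapsed real time and an occurrence count; because nested events run on the same clock while the outer event is still open, the outer event's time includes theirs.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class IntervalProfiler:
    """Hypothetical sketch of inclusive interval timing with event counts."""

    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock                   # returns elapsed real time in ns
        self.total_ns = defaultdict(int)     # event key -> inclusive time
        self.count = defaultdict(int)        # event key -> number of occurrences

    @contextmanager
    def event(self, name, file=None, line=None):
        key = (name, file, line)             # stand-in for a trace pattern
        start = self.clock()
        try:
            yield
        finally:
            # Inclusive: events entered inside this block finish before it
            # does, so their time is already part of this interval.
            self.total_ns[key] += self.clock() - start
            self.count[key] += 1

# Usage: the DO-loop-like inner event is counted inside "main".
prof = IntervalProfiler()
with prof.event("main", "prog.f90", 1):
    with prof.event("do_loop", "prog.f90", 10):
        sum(i * i for i in range(100_000))
for key, ns in prof.total_ns.items():
    print(key, prof.count[key], "calls,", ns, "ns")
```

Injecting the clock as a parameter keeps the sketch testable with a fake clock; in practice it would be backed by a high-resolution counter.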
Time is measured using the Alpha process cycle counter, which provides nanosecond resolution. If a single event, such as a very long input/output (I/O) operation, exceeds the range of the cycle counter, pprof detects this condition and reports a warning.
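The range limit follows from the counter's finite width. The Python sketch below assumes a 32-bit wrapping cycle count and a hypothetical 200 MHz clock; at that rate one cycle is 5 ns, and the counter wraps after about 21.47 seconds. A single wrap can be corrected with modular subtraction, but multiple wraps cannot be detected from the counter alone. The cross-check against a coarse wall-clock reading used here is an assumed strategy for illustration, not pprof's documented mechanism, and `time_event` and its parameters are hypothetical.

```python
COUNTER_BITS = 32                  # assumed width of the wrapping cycle count
COUNTER_RANGE = 1 << COUNTER_BITS

def elapsed_cycles(start, end):
    """Cycles between two counter readings; correct across at most one wrap."""
    return (end - start) % COUNTER_RANGE

def max_interval_seconds(clock_hz):
    """Longest interval the counter can time before it wraps around."""
    return COUNTER_RANGE / clock_hz

def time_event(start, end, clock_hz, coarse_elapsed_s):
    """Return (seconds, warning).

    coarse_elapsed_s comes from a hypothetical low-resolution wall clock.
    If it reaches the counter's range, the cycle count is ambiguous, so the
    coarse value is returned together with a warning instead.
    """
    if coarse_elapsed_s >= max_interval_seconds(clock_hz):
        return coarse_elapsed_s, "warning: event exceeded cycle counter range"
    return elapsed_cycles(start, end) / clock_hz, None

# Usage with a hypothetical 200 MHz clock.
hz = 200_000_000
print(f"range at 200 MHz: {max_interval_seconds(hz):.2f} s")
print(time_event(0xFFFF_FF00, 0x0000_0100, hz, coarse_elapsed_s=0.001))
print(time_event(0, 0, hz, coarse_elapsed_s=60.0))
```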
Each profiling method provides a default report format. With pprof options, you can customize the reports to focus on particular events or event types, or to use different data display parameters.
Table 10-1 provides an overview of the different types of profiling data.
| Category | Data Type | Method | Description |
|---|---|---|---|
| Program performance | Program counter | PC sampling | Includes exclusive time spent executing routines, source lines, and instructions. This information is useful in gaining an overview of where the program is statistically spending its time. |
| Program performance | Program events | Interval profiling | Includes time spent executing events such as routines and DO loops. |
| Communications performance | Messaging-layer communications statistics | Interval profiling | Describes communication timing and data throughput statistics for events at the PSE message-passing level. This data can be used to determine the performance of routines and peers involved in communications. |
| Communications performance | Socket-level communications statistics | Interval profiling | Describes the time spent communicating and the amount of data transferred for events at the network communication socket level. This data can be used to determine the relative performance of different communication protocols. |
| System performance | Process run-time statistics | Interval profiling and PC sampling | Describes system usage; run-time information includes the number of page faults and swaps. Interval profiling also includes total execution time, total overhead time, computation time, and idle time. |