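Profiling using the parallel profiler involves three distinct phases: compiling the program for profiling, executing it to generate data output files, and analyzing those files with pprof.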
Figure 10-2 depicts the profiling stages required for parallel profiling.
To instrument a program for parallel profiling, you must use the -pprof switch and associated options in combination with the -wsf switch when compiling.
Use the following syntax to compile a Digital Fortran 90 program for parallel profiling:
% f90 -wsf{n} [ -non_shared ] -pprof {i | s} -o foo foo.f90 . . .
When compiling, parallel profiling support is included by specifying the -wsf and -pprof switches. -wsf n is used to indicate that the parallel transform phase of the F90 compiler is to be invoked and that the executable should support execution using n processes. -pprof is used to specify the profiling method that should be used.
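The compile line above can also be assembled in a build script. The following Python sketch is only an illustration: the helper name `build_compile_command` and its parameters are hypothetical, and it uses only the switches described in this section (-wsf, -non_shared, -pprof, and -o).

```python
def build_compile_command(source, output, processes, method="s", non_shared=False):
    """Return an f90 argument list for parallel profiling (hypothetical helper)."""
    if method not in ("i", "s"):
        raise ValueError("method must be 'i' (interval) or 's' (pc sampling)")
    cmd = ["f90", "-wsf", str(processes)]   # -wsf n: support execution on n processes
    if non_shared:
        cmd.append("-non_shared")
    cmd += ["-pprof", method, "-o", output, source]
    return cmd


# Build the command for interval profiling of spike.f90 on two processes.
print(" ".join(build_compile_command("spike.f90", "spike", 2, method="i")))
```

The default of `s` mirrors the documented behavior of -pprof when no value is given.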
Choose one of two available parallel profiling methods when you compile a program (if no value is provided with the -pprof switch, a value of s is assumed for pc sampling):
Interval profiling collects count and cumulative timing information for the occurrences of routines, DO loops, FORALL, and array operations (reports contain inclusive values, reflecting data for lower-level events called from within an event). Interval profiling provides communications profiles which are related to array distribution in HPF programs.
PC sampling interrupts the program periodically to record values for the program counter. It provides a statistical picture of where the program spends its time. With pc sampling, you must profile the whole program.
When linking is done as a separate step from compiling, it is necessary to specify -pprof s only at link time. At compile time, you can specify -c (compile without linking) without using the -pprof option.
Only one profiling method can be selected at a time; therefore, objects built for different profiling methods cannot be mixed.
By default, only user-generated objects, not objects in shared libraries, are profiled. If a program is compiled for pc sampling and the -non_shared switch is used, code in any nonshared libraries that are included is also profiled.
Because interval profiling is an intrusive method and requires instrumentation by the compiler, objects and libraries that are not generated by the Digital Fortran 90 compiler cannot be profiled using the interval method. Objects that have and have not been compiled for interval profiling can be mixed to generate a valid program. To enable interval profiling, the file containing the main program must be compiled for interval profiling.
For pc sampling analysis, the program must not be compiled with the -x switch (in order to preserve the line number information) or stripped with the system strip command.
For more information about -wsf and related Digital Fortran 90 compiler options, refer to Section 8.1.1 and to the Fortran User Manual.
When a program is executing, timing data can be based on real time or on virtual time.
The analysis and reporting of parallel profiling data can be based on either an elapsed or a profiled time base.
To generate profiling data output files, execute an HPF program that has been compiled for parallel profiling. The executable must be run on a system that has PSE installed and configured.
When you run the executable, data output files are generated for each process involved in the computation. By default, files are placed in the directory from which you run the executable. By using the PPROF_DIR environment variable, you can redirect the output files to another directory or disable profiling at execution time, as summarized in Table 10-2.
| PPROF_DIR is: | Result: |
|---|---|
| Not defined | Output is written to the current directory (default). |
| Named directory | Output is written to the named directory. |
| Defined without a value | Profiling is disabled. |
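
The three PPROF_DIR cases in the table can be expressed as a small decision function. This is a sketch only: `resolve_pprof_dir` is a hypothetical name, and the code models the documented behavior rather than the profiler's actual implementation.

```python
import os


def resolve_pprof_dir(environ):
    """Return the data output directory, or None when profiling is disabled."""
    if "PPROF_DIR" not in environ:
        return os.getcwd()      # not defined: current directory (default)
    value = environ["PPROF_DIR"]
    if value == "":
        return None             # defined without a value: profiling disabled
    return value                # named directory


print(resolve_pprof_dir({"PPROF_DIR": "/tmp/prof"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise without changing the real environment.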
The data file naming convention is:
[program]_[method]_[peer#].out
Where:
For example, to perform interval profiling for the program spike running on two processors, compile the program with the following options:
% f90 -wsf 2 -pprof i -o spike spike.f90
Run spike on a system using PSE to obtain the data output files spike_i_0.out and spike_i_1.out.
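As an illustration of the naming convention, the following Python sketch lists the data output files to expect for a run. The function name `data_file_names` is hypothetical, peers are assumed to be numbered from 0 as in the spike example, and only the method code `i` is confirmed by that example.

```python
def data_file_names(program, method, peers):
    """List expected data files named [program]_[method]_[peer#].out."""
    if peers < 1:
        raise ValueError("at least one peer is required")
    # One file per peer; peer numbering is assumed to start at 0.
    return [f"{program}_{method}_{peer}.out" for peer in range(peers)]


print(data_file_names("spike", "i", 2))
```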
After you have generated parallel profiler data output files by compiling and executing the program, use pprof to create reports from the data output files.
When you activate pprof, it searches for data output files corresponding to the program name specified on the command line. By default, pprof searches for the data files in the current directory. However, you can specify an alternative directory with the PPROF_DIR environment variable.
For pc sampling analysis, an intermediate disassembly file ([program]_0.dis) is generated. This may be a very large file containing the instruction information of the program. The file remains on your disk to speed up later analyses. You can delete it if it consumes too much disk space.
You can run pprof on any Alpha system that has the PSE Parallel Programming Environment subset installed.
To run pprof, use the following command syntax:
% pprof [-method interval|sampling] [analysis_opts] [control_opts] program_name
pprof output is reported to standard out. It can then be redirected to a file.
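When reports are generated from a script, the command line and the redirection can be handled as in the following Python sketch. The helper names `build_pprof_command` and `save_report` are hypothetical; the sketch assumes that pprof is on the search path and uses only the syntax shown above.

```python
import subprocess


def build_pprof_command(program, method=None, analysis_opts=(), control_opts=()):
    """Assemble a pprof argument list following the documented syntax."""
    cmd = ["pprof"]
    if method is not None:
        if method not in ("interval", "sampling"):
            raise ValueError("method must be 'interval' or 'sampling'")
        cmd += ["-method", method]
    return cmd + list(analysis_opts) + list(control_opts) + [program]


def save_report(cmd, report_path):
    """Run pprof and redirect its standard output to report_path."""
    with open(report_path, "w") as report:
        subprocess.run(cmd, stdout=report, check=True)


cmd = build_pprof_command("lu_i", "interval", ["-process_summary"],
                          ["-statistics", "average"])
print(" ".join(cmd))
```

Calling `save_report(cmd, "lu_i.report")` would then write the report to a file instead of the terminal; it requires a system on which pprof is installed.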
Options indicated in the pprof command syntax are defined in the following sections.
The -method option designates the profiling method for which the program was compiled, interval or sampling.
Analysis options determine which reports are generated by the parallel profiler, and the types of information presented in the reports.
The default analysis option for both pc sampling and interval profiling is the combination of -routines with -statistics average.
Available analysis options include the following:
Used with interval profiling, this option displays hierarchical call traces, including total time spent by events in each call path. Traced events are instrumented by the Digital Fortran 90 compiler, including loops, routines, FORALL statements, array assignments and array communications information. By default, call traces are displayed with the lowest-level call listed first (backward), but you can reverse this order by using the forward argument. The line number shown in the report is the location in the file where the last event happened in the traced path. A line shown with a row of underscores (_______) is used to indicate the start of a new trace path. The events are listed with sorted line numbers in ascending order for the event that originates the trace.
For example:
-call_trace forward displays /TOP/x/y/z.DO_LOOP and -call_trace backward displays DO_LOOP.z\y\x\TOP.
Here, TOP, x, y, and z are routines, and DO_LOOP is a DO loop in routine z.
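The two display orders can be reproduced with a few lines of Python. This sketch is illustrative only: `format_call_trace` is a hypothetical name, and the default separators are the slash and backslash shown in the example.

```python
def format_call_trace(routines, event, direction="backward",
                      forward_symbol="/", backward_symbol="\\"):
    """Format a call path, given outermost routine first, in either direction."""
    if direction == "forward":
        # Highest-level call first: /TOP/x/y/z.DO_LOOP
        return forward_symbol + forward_symbol.join(routines) + "." + event
    if direction == "backward":
        # Lowest-level call first: DO_LOOP.z\y\x\TOP
        return event + "." + backward_symbol.join(reversed(routines))
    raise ValueError("direction must be 'forward' or 'backward'")


path = ["TOP", "x", "y", "z"]
print(format_call_trace(path, "DO_LOOP", "forward"))
print(format_call_trace(path, "DO_LOOP", "backward"))
```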
Used with interval profiling data, this option displays a list of source code statements involved in communications, sorted by time relative to the selected time base (either the total elapsed time or the total profiled time).
Used with interval profiling data, this option displays overall communications patterns and statistics between all peers. The information includes communications time, idle time, size, and counts of messaging or socket operations. The default is message analysis only.
Used with interval profiling data, this option displays a list of variables involving communications, sorted by percentage of time (active and idle) relative to the selected time base. The distributed variables are shown in a source and target communications pairs format.
This option provides help on selected options. For example:
% pprof -help -method interval
Used with pc sampling data, this option displays instruction statistics sorted by percentage of sample count in decreasing order.
Used with interval profiling data, this option displays a summary of interval profiling events sorted by time relative to the selected timebase. Events can be selected with the -event_list option. If an event of the same type is reported multiple times at the same location in the source code, this indicates that the events have occurred in different traced paths. A full listing of event tracing (generated by the -call_trace option) reveals how these events occurred.
Used in both interval profiling and pc sampling, this option displays a summary of process activities per peer. The displayed information includes system time, user time, memory usage, and file operation statistics. For the interval method, communications statistics such as the number of messages and the size of each message type are also included.
Used in both interval profiling and pc sampling, this option displays routine or procedure min/max/skew statistics sorted in decreasing order by percentage of time (interval profiling) or sample count (pc sampling). This is the default analysis option for both pc sampling and interval profiling.
If no analysis option is specified, the -routines option reports a min/max/skew analysis with respect to average time spent for each routine call in the program for all peers.
Used with pc sampling data, this option displays source line statistics sorted by percentage of sample count in decreasing order. A line can contain multiple statements and a statement can span more than one line.
Used with both interval profiling and pc sampling, this option generates all analysis reports appropriate to the selected profiling method.
Communication analysis is provided only through interval profiling. It understands all of the messaging protocols currently supported under PSE.
Control options control both data input and data output presentation. Options include the following:
Used with interval profiling data, this option specifies a call depth value of n to be used in the display. The default is 5. If the real call depth is m and is greater than n, the call trace display is prefixed by a total of (m-n) . characters. For example, using the default value of 5, the call u/v/w/x/y/z.DO_LOOP is displayed as ./v/w/x/y/z.DO_LOOP.
This option only applies to the -call_trace option.
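The truncation rule can be sketched in Python as follows. The function name `truncate_call_path` is hypothetical, the call depth is assumed to count routines only, and the leading slash on untruncated paths follows the forward call trace example rather than any documented rule.

```python
def truncate_call_path(routines, event, depth=5):
    """Show the last `depth` routines, prefixed by one '.' per hidden level."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    hidden = max(len(routines) - depth, 0)   # (m - n) levels are not displayed
    shown = routines[hidden:]
    return "." * hidden + "/" + "/".join(shown) + "." + event


print(truncate_call_path(["u", "v", "w", "x", "y", "z"], "DO_LOOP"))
```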
Used with interval profiling data, this option restricts the analysis to the specified events. The -event_list option applies only to the -interval_summary and -call_trace options. The default value is hpf. A full listing of event types is found in Table 10-6. Event types are:
Used with interval profiling data and pc sampling data, this option restricts analysis to the specified source files.
Used with interval profiling data and pc sampling data, this option suppresses or enables display of file path information in reports containing file information. The default is nofile_path (displays only the file name).
Used with interval profiling data and pc sampling data, this option truncates the display listing in a report:
The -item #cum% option does not apply to interval profiling because interval measurement is already cumulative.
For example, -item 50% displays half of the total items.
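The effect of a percentage value can be estimated before running a report. The following Python sketch uses a hypothetical helper, `items_to_display`, and assumes that fractional results are rounded down; the rounding rule is not stated in this section. The sample statement report in this chapter is consistent with it: -item 60% lists 18 of 30 sampled lines.

```python
def items_to_display(total_items, percent):
    """Number of items shown for -item N% (rounding down is an assumption)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_items * percent // 100      # integer arithmetic avoids float error


print(items_to_display(30, 60))
```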
Used with interval profiling data and pc sampling data, this option provides a richer analysis report in the long listing format (exceeding 80 characters, if necessary).
Used with both interval profiling and pc sampling, this option specifies a list of peers to be analyzed. If -peer_ids is not specified or no value is specified with -peer_ids, it defaults to all peers.
For example, -peer_ids 1-3,5 shows reports for peers 1, 2, 3, and 5.
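The list syntax combines ranges and single values. A Python sketch of how such a list expands is shown below; `parse_peer_ids` is a hypothetical helper written for illustration and is not part of pprof.

```python
def parse_peer_ids(spec, all_peers):
    """Expand a list such as '1-3,5'; an empty list selects all peers."""
    if not spec:
        return sorted(all_peers)
    peers = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            peers.update(range(int(low), int(high) + 1))   # inclusive range
        else:
            peers.add(int(part))
    return sorted(peers)


print(parse_peer_ids("1-3,5", range(8)))
```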
Used with both interval profiling and pc sampling, this option restricts analysis to the specified routines.
Used with both interval profiling and pc sampling, this option determines how skew is calculated from the minimum and maximum values, and whether the average or the per-peer execution time is reported in pprof reports.
If average is specified, pprof calculates the average time taken to execute a given item by summing the times taken on each peer and dividing by the number of peers. This value is subtracted from the minimum and the maximum time recorded for the event across all peers, to calculate and report skew from the average.
If per_peer is specified, pprof uses the time taken to execute a given item on each peer and calculates and reports the skew from the minimum and maximum time recorded.
The -process_summary option also provides min/max/skew values for runtime statistics.
The per_peer method is the default if any analysis option is selected. For default analysis, the average method is used.
The -statistics average option does not apply to the -comm_peers option.
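The average method can be illustrated with a short Python sketch. The function name `average_skew` is hypothetical, and expressing skew as a percentage of the average is an assumption inferred from the sample reports in this chapter (for example, an average elapsed time of 1.01 seconds with a minimum of 0.99 seconds is reported with a skew of -2.0), not a documented formula.

```python
def average_skew(times_by_peer):
    """Return average, minimum, and maximum with percentage skew from the average."""
    average = sum(times_by_peer.values()) / len(times_by_peer)
    min_peer = min(times_by_peer, key=times_by_peer.get)
    max_peer = max(times_by_peer, key=times_by_peer.get)

    def skew(value):
        # Skew is the signed distance from the average, as a percentage of it.
        return 0.0 if average == 0 else (value - average) / average * 100.0

    return {
        "average": average,
        "minimum": (times_by_peer[min_peer], skew(times_by_peer[min_peer]), min_peer),
        "maximum": (times_by_peer[max_peer], skew(times_by_peer[max_peer]), max_peer),
    }


# Elapsed times in seconds for two peers.
result = average_skew({0: 1.03, 1: 0.99})
print(round(result["average"], 2), result["minimum"], result["maximum"])
```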
Used with interval profiling data, this option offers two values to be used as a basis for the time percentage calculation:
Used in both interval profiling and pc sampling, this option selects the time unit to be used in the report. The default unit is second.
Equivalent environment variable: PPROF_TIME_UNIT.
Used with both interval profiling and pc sampling, this option determines whether routines, files, and array names are truncated in reports.
The name of the executable program used to generate the raw data files.
By default, pprof writes the reports to standard out.
By default, pprof uses the following control options:
If no analysis option is specified, pprof defaults to -routines and -statistics average options.
This section provides hints and insight for data scoping in your profiling analysis reports.
Many control options can be used in conjunction with the profiling analysis options to tailor the contents of the report. These control options primarily serve two purposes:
For example:
The options are -truncate, -nofile_path, -notruncate, -file_path, -long
For example, to list all information without truncation, use these options together: -long, -notruncate, -file_path, -item all
Valid pprof analysis options for each profiling method are listed in Table 10-3.
| Analysis Option | PC sampling | Interval profiling |
|---|---|---|
| -all | Y | Y |
| -call_trace | N | Y |
| -comm_statements | N | Y |
| -comm_peers | N | Y |
| -comm_variables | N | Y |
| -help | Y | Y |
| -instructions | Y | N |
| -interval_summary | N | Y |
| -process_summary | Y | Y |
| -routines | Y | Y |
| -statements | Y | N |
Valid pprof control options for each profiling method are listed in Table 10-4.
| Control Option | PC sampling | Interval profiling |
|---|---|---|
| -call_depth | N | Y |
| -event_list | N | Y |
| -file_list | Y | Y |
| -[no]file_path | Y | Y |
| -item | Y | Y |
| -long | Y | Y |
| -peer_ids | Y | Y |
| -routine_list | Y | Y |
| -statistics | Y | Y |
| -time_base | N | Y |
| -time_unit | Y | Y |
| -[no]truncate | Y | Y |
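
The two tables can be turned into a simple check that flags options not valid for the chosen profiling method before pprof is run. This Python sketch is illustrative: the name `unsupported_options` is hypothetical, and the option sets are transcribed from Tables 10-3 and 10-4.

```python
# Options marked N for one of the two methods in Tables 10-3 and 10-4.
SAMPLING_ONLY = {"-instructions", "-statements"}
INTERVAL_ONLY = {
    "-call_trace", "-comm_statements", "-comm_peers", "-comm_variables",
    "-interval_summary", "-call_depth", "-event_list", "-time_base",
}


def unsupported_options(method, options):
    """Return the options that are not valid for the given profiling method."""
    if method == "sampling":
        blocked = INTERVAL_ONLY
    elif method == "interval":
        blocked = SAMPLING_ONLY
    else:
        raise ValueError("method must be 'interval' or 'sampling'")
    return [opt for opt in options if opt in blocked]


print(unsupported_options("sampling", ["-routines", "-call_trace", "-time_base"]))
```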
The following PSE environment variables are used exclusively with the PSE profiler, pprof.
The profiling process has three phases: compilation, data output (execution), and analysis.
Profiler environment variables are used to set parameters applied during the data output or analysis phases. The following tables describe these variables:
PPROF_CALL_FORWARD_SYMBOL sets a symbol indicating that call traces are displayed in a forward direction.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 0 or 1 | A single character | slash (/) | Analysis Phase and interval profiling only |
PPROF_CALL_BACKWARD_SYMBOL sets a symbol indicating that call traces are displayed in a backward direction.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 0 or 1 | A single character | backslash (\) | Analysis Phase and interval profiling only |
PPROF_DIR sets a profiling directory for raw data output files to be deposited in. This affects both the location where an instrumented executable file places pprof data files, and the location at which the pprof utility searches for raw data output.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 0 or 1 | A valid directory | The directory from which the instrumented executable is run | Output & Analysis Phases and interval profiling and pc sampling |
PPROF_SPACE_FILLER sets a symbol to indicate that no information is available for a given field.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 1 | Six characters or less | two dashes (--) | Analysis Phase and interval profiling only |
PPROF_STACKSIZE sets a limit to the stack memory allocated for the storage of pprof data.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 1 | n (integer) | 5000 call signature traces | Output & Analysis Phases and interval profiling only |
PPROF_TIME_UNIT sets the reporting time unit.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 1 | second or msecond or usecond | second | Analysis Phase and interval profiling & pc sampling only |
PPROF_TIMER selects an interval timer type.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 1 | real_time or virtual_time | real_time | Output Phase and interval profiling only |
Timer types are defined as follows:
PPROF_HOSTINFO enables or disables peer information annotation for pprof informational and error messages.
| Number of Values N | Value | Default Value | Applies To: |
|---|---|---|---|
| 1 | NONE or PEER0 or PEER_ID or PEER_NAME or ALL | NONE | Output Phase and interval profiling & pc sampling only |
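
Settings can be checked against the tables above before a run. The following Python sketch is illustrative only: `check_pprof_environment` is a hypothetical helper, the rules are transcribed from the tables, and PPROF_DIR and PPROF_HOSTINFO are deliberately left unchecked.

```python
ALLOWED_VALUES = {
    "PPROF_TIME_UNIT": {"second", "msecond", "usecond"},
    "PPROF_TIMER": {"real_time", "virtual_time"},
}
SINGLE_CHARACTER = ("PPROF_CALL_FORWARD_SYMBOL", "PPROF_CALL_BACKWARD_SYMBOL")


def check_pprof_environment(environ):
    """Return the names of variables whose values do not match the tables."""
    problems = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in environ and environ[name] not in allowed:
            problems.append(name)
    for name in SINGLE_CHARACTER:
        if len(environ.get(name, "")) > 1:          # zero or one character
            problems.append(name)
    filler = environ.get("PPROF_SPACE_FILLER")
    if filler is not None and not 1 <= len(filler) <= 6:
        problems.append("PPROF_SPACE_FILLER")       # six characters or less
    stack = environ.get("PPROF_STACKSIZE")
    if stack is not None and not stack.isdigit():
        problems.append("PPROF_STACKSIZE")          # must be an integer
    return problems


print(check_pprof_environment({"PPROF_TIMER": "virtual_time",
                               "PPROF_TIME_UNIT": "hour"}))
```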
Analysis reports can be produced in either per peer or average format. One report is generated for each specific analysis option. pprof output is directed to standard out.
Each pprof report contains three sections:
The display format of the analyzed data is predefined for each analysis option. Unless otherwise noted, the percentage of time shown in the report indicates cumulative time. Data reported as an integer value (for example, an average sample count) is truncated to the least significant digit.
The names of the data types that can appear in the header section are as follows:
| Data Type | Description |
|---|---|
| bytes | Number of bytes for an array operation |
| cfile | File in which a called routine or event is located |
| cline# | Source line in calling file at which the routine or event is called |
| %comm | Percentage of communications active time |
| comm | Communications active time |
| comm_type | Send, receive or broadcast |
| count | Number of event occurrences |
| event | Event type |
| file | File name |
| %idle | Percentage of communications idle time |
| idle | Communications idle time |
| instruction | Alpha instruction statement |
| iterations | Number of iterations for a DO loop; a value of zero indicates that interval profiling was unable to gather the information, most likely because the program branched out of the DO loop before completion |
| line# | Source line number |
| line_range | Range of lines covered for this routine (pc sampling) |
| local_peer | Peer sourcing send and broadcast operations, peer sinking receive operations |
| PC | Program counter value |
| peer | Peer id |
| recv_var | Variable performing receive communications; for example, in the assignment statement A[i] = B[j], A is a recv_var |
| remote_peer | Peer sinking send and broadcast operations, peer sourcing receive operations |
| %sample | Percentage of total cumulative (or average) sample counts |
| send_var | Variable performing send communications; for example, in the assignment statement A[i] = B[j], B is a send_var |
| source_line | Source line statement |
| time | Time |
| %skew | Percentage of skew |
| %time | Percentage of time |
Event types displayed in pprof reports can be any of the following:
| Event Type | Description |
|---|---|
| all | All of the recorded event types (default) |
| array_assignment | Array assignment statement |
| broadcast | Broadcast communications |
| comm_type | Communications type |
| do_loop | HPF DO loop |
| FORALL | HPF FORALL statement |
| message_comm | All message-level communications types |
| receive | Receive communications |
| routine | Routine call |
| send | Send communications |
| socket_comm | All socket-level communications types (TCP and UDP only) |
| socket_receive | Socket-level receive operation (TCP and UDP only) |
| socket_send | Socket-level send operation (TCP and UDP only) |
The following examples are based on the LU decomposition example found in /usr/examples/hpf. They illustrate default reports for pc sampling and interval profiling generated by the pprof profile analysis tool. Each example includes a short description of the intended purpose of the analysis.
All examples in this section use an array size of 100.
The following is an example of an interval profiling analysis report.
PPROF Info: pprof -method interval lu_i
PPROF Info: No analysis option was specified - perform default analysis.
PPROF Info: interval profiling analysis is assumed for all peers.
***************************** pprof (version 1.10) *****************************
** **
** interval profiling, -routines, -statistics average **
** -item 20, -truncate, -time_base elapsed **
** -event_list hpf **
** Average of inclusive timing analysis in routine calls for all 2 peers **
** **
********************************************************************************
** Recursive call symbol: * **
** Program `lu_i' was built on: Wed Apr 30 15:17:50 1996 **
** profiled with the default protocol on: Wed Apr 30 15:23:02 1996 **
** Profiling timer type: real_time in seconds **
** Average profiling time spent: 0.90 seconds **
** Total number of profiled routine entries: 3 **
** Average elapsed real time: 1.01 seconds **
** Report generated on: Wed Apr 30 15:28:24 1996 **
********************************************************************************
average time minimum time maximum time routine
(seconds %) (seconds %skew peer) (seconds %skew peer)
--------------------------------------------------------------------------------
0.82 81.5 0.81 -1.7 1 0.84 1.7 0 TYPICAL_USER_PRO
0.66 65.2 0.63 -4.1 1 0.68 4.1 0 LU_DECOMPOSITION
0.16 16.3 0.15 -8.0 0 0.18 8.0 1 GET_A
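Report rows such as those above can be post-processed with a small parser. The following Python sketch is illustrative only: `RoutineRow` and `parse_routine_row` are hypothetical names, and the nine-field layout is inferred from the column headings of this sample report, not from a documented file format.

```python
from typing import NamedTuple


class RoutineRow(NamedTuple):
    average_time: float
    percent_time: float
    min_time: float
    min_skew: float
    min_peer: int
    max_time: float
    max_skew: float
    max_peer: int
    routine: str


def parse_routine_row(line):
    """Split one data row of a -routines report with -statistics average."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 whitespace-separated fields")
    return RoutineRow(
        float(fields[0]), float(fields[1]),
        float(fields[2]), float(fields[3]), int(fields[4]),
        float(fields[5]), float(fields[6]), int(fields[7]),
        fields[8],
    )


row = parse_routine_row("0.82 81.5 0.81 -1.7 1 0.84 1.7 0 TYPICAL_USER_PRO")
print(row.routine, row.average_time, row.min_peer)
```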
Average statistics show overall program and run-time performance among peers. The following is an example of an average statistics report.
PPROF Info: pprof -method interval -process_summary -statistics average lu_i
PPROF Info: interval profiling analysis is assumed for all peers.
***************************** pprof (version 1.10) *****************************
** **
** interval profiling, -process_summary, -statistics average **
** -item 20, -truncate, -time_base elapsed **
** -event_list hpf **
** Average of system's run-time statistics for all 2 peers **
** **
********************************************************************************
** Program `lu_i' was built on: Wed Apr 30 15:17:50 1996 **
** Profiled with the default protocol on: Wed Apr 30 15:23:02 1996 **
** Profiling timer type: real_time in seconds **
** Average elapsed real time: 1.01 seconds **
** Report generated on: Wed Apr 30 15:30:43 1996 **
********************************************************************************
Members used:
peer 0 = tactic (Digital UNIX V3.0-347)
number of CPU: 1 (133.5 MHz)
physical memory: 98304 Kilobytes
system page size: 8192 bytes
maximum process memory limit: 92807168 bytes
peer 1 = array (Digital UNIX V3.0-347)
number of CPU: 1 (133.4 MHz)
physical memory: 98304 Kilobytes
system page size: 8192 bytes
maximum process memory limit: 92807168 bytes
run-time statistics average minimum maximum
( value ) (value %skew peer) (value %skew peer)
--------------------------------------------------------------------------------
Timing Info. (sec)
elapsed time 1.01 0.99 -2.0 1 1.03 2.0 0
profiling time 0.90 0.84 -6.5 1 0.96 6.5 0
compute time 0.76 0.76 -0.8 1 0.77 0.8 0
comm. time 0.14 0.09 -38.0 1 0.19 38.0 0
active time 0.10 0.05 -50.2 1 0.15 50.2 0
idle time 0.04 0.04 -7.2 1 0.04 7.2 0
overhead time 0.03 0.02 -33.4 0 0.04 33.4 1
user time 0.72 0.71 -0.2 1 0.72 0.2 0
system time 0.10 0.09 -12.6 0 0.11 12.6 1
Messaging Info.
send time sec 0.00 0.00 0.0 0 0.00 0.0 0
receive time sec 0.12 0.07 -42.6 1 0.18 42.6 0
broadcast time sec 0.01 0.01 -1.7 0 0.01 1.7 1
# total msgs 110 109 -0.9 1 112 1.8 0
# send msgs 0 0 0.0 0 0 0.0 0
# receive msgs 56 55 -1.8 1 57 1.8 0
# broadcast msgs 54 54 0.0 1 55 1.9 0
# message bytes 119600 119600 0.0 0 119600 0.0 0
# send bytes 0 0 0.0 0 0 0.0 0
# receive bytes 59800 59600 -0.3 0 60000 0.3 1
# bcast bytes 59800 59600 -0.3 1 60000 0.3 0
bandwidth bytes/sec 864883 626725 -27.5 0 1394979 61.3 1
Virtual Memory
# swaps 0 0 0.0 0 0 0.0 0
# page faults 210 205 -2.4 0 215 2.4 1
# page reclaims 8 2 -75.0 1 14 75.0 0
shared data bytes 45 45 0.0 0 46 2.2 1
unshared data bytes 652 633 -2.9 0 671 2.9 1
unshared stack bytes 68 68 0.0 0 69 1.5 1
Input/Output
# input operations 12 2 -83.3 0 23 91.7 1
# output operations 1 0 0.0 0 3 200.0 1
Context Switching
# voluntary 78 73 -6.4 0 84 7.7 1
# involuntary 115 111 -3.5 0 119 3.5 1
Misc. Profiled Info.
# profiled events 543 542 -0.2 1 544 0.2 0
# routine entries 3 3 50.0 0 3 50.0 0
# calls 3 3 0.0 0 3 0.0 0
# DO loops 1 1 0.0 0 1 0.0 0
# iterations 99 99 0.0 0 99 0.0 0
# FORALLs 1 1 0.0 0 1 0.0 0
# executions 99 99 0.0 0 99 0.0 0
# array assign. 2 2 0.0 0 2 0.0 0
# executions 100 100 0.0 0 100 0.0 0
# msg. comms 110 109 -0.9 1 112 1.8 0
The following report provides a summary of all the profiled events on peer 1.
PPROF Info: pprof -method interval -interval_summary -event_list do_loop,
routine,FORALL,array_assignment -time_unit msecond -peer_ids 1 lu_i
***************************** pprof (version 1.10) *****************************
** **
** interval profiling, -interval_summary, -statistics per_peer **
** -item 20, -truncate, -time_base elapsed **
** -event_list do_loop,routine,FORALL,array_assignment **
** Interval event summary for peer 1 (array) **
** **
********************************************************************************
** Time and count values are measured as: inclusive values **
** Recursive call symbol: * **
** Program `lu_i' was built on: Wed Apr 30 15:17:50 1996 **
** Profiled with the default protocol on: Wed Apr 30 15:23:02 1996 **
** Profiling timer type: real_time in mseconds **
** Profiling time spent: 841 mseconds **
** Total elapsed real time: 988 mseconds **
** Report generated on: Wed Apr 30 15:33:08 1996 **
********************************************************************************
time count event routine cline# cfile
(mseconds %)
---------------------------------------------------------------------
809 81.8 1 ROUTINE TYPICAL_USER_PRO 28 lu_n.f90
631 63.8 1 ROUTINE LU_DECOMPOSITION 42 lu_n.f90
631 63.8 1 DO_LOOP LU_DECOMPOSITION 10 lu_n.f90
618 62.5 99 FORALL LU_DECOMPOSITION 12 lu_n.f90
177 18.0 1 ROUTINE GET_A 38 lu_n.f90
176 17.8 1 ARRAY_AS GET_A 60 lu_n.f90
11 1.2 99 ARRAY_AS LU_DECOMPOSITION 11 lu_n.f90
The following is an example of a long report for routine analysis using PC sampling on Peer 0 and showing all the sampled routines.
PPROF Info: pprof -method sampling -routines -long -peer_ids 0 -item all lu_s
***************************** pprof (version 1.10) *****************************
** **
** pc sampling, -routines, -statistics per_peer **
** -item all, -truncate, -long **
** Exclusive routine analysis for peer 0 (tactic) **
** **
********************************************************************************
** Data is sorted by routines in decreasing of: sample count **
** Program `lu_s' was built on: Wed Apr 30 16:16:16 1996 **
** Profiled with the default protocol on: Wed Apr 30 16:17:23 1996 **
** Profiling system clock: 1024 ticks-per-second **
** Total number of samples per peer: 660 **
** Total number of routines in the program: 1955 **
** Total number of sampled routines: 20 **
** Total elapsed real time: 1.90 seconds **
** Report generated on: Wed Apr 30 17:09:36 1996 **
********************************************************************************
peer 0 minimum maximum routine
line_range file
(count %) (count %skew peer) (count %skew peer)
------------------------------------------------------------------------------------------------
488 73.9 483 -1.0 1 488 0.0 0 lu_decomposition 1 - 13 lu_n.f90
88 13.3 88 0.0 0 89 1.1 1 GET_A 53 - 56 lu_n.f90
63 9.5 63 0.0 0 79 25.4 1 _OtsDivide 216 - 1203 ots_div_alpha.s
7 1.1 7 0.0 0 9 28.6 1 _reshape_recv_ 1292 - 1489 com_reshape.c
2 0.3 0 0.0 1 2 0.0 0 memcpy 53 - 273 memcpy.s
2 0.3 1 -50.0 1 2 0.0 0 __valloc 802 - 1697 malloc.c
2 0.3 2 0.0 0 3 50.0 1 _move_serial_to_ 3496 - 3568 com_reshape.c
2 0.3 0 0.0 1 2 0.0 0 __mallinfo 1269 - 1907 malloc.c
1 0.2 0 0.0 1 1 0.0 0 __read 41 - 112 read.s
1 0.2 0 0.0 1 1 0.0 0 cartesian_noshri 1697 - 1697 malloc.c
1 0.2 1 0.0 0 1 0.0 0 TCP_MsgReadParse 447 - 579 msgcomrecv.c
1 0.2 0 0.0 1 1 0.0 0 TCP_MsgLookup 175 - 203 msgcomrecv.c
1 0.2 0 0.0 1 1 0.0 0 _TCP_SendBroadca 167 - 195 msgbcast.c
1 0.2 0 0.0 1 1 0.0 0 _TCP_GetBuf 204 - 280 msgbufftns.c
1 0.2 0 0.0 1 1 0.0 0 cartesian_alloc 1444 - 1628 malloc.c
1 0.2 0 0.0 1 1 0.0 0 __bcopy 48 - 389 bcopy.s
1 0.2 0 0.0 1 1 0.0 0 recv_a_msg 1137 - 1245 com_reshape.c
1 0.2 0 0.0 1 1 0.0 0 __mallopt 1172 - 1639 malloc.c
1 0.2 0 0.0 1 1 0.0 0 typical_user_pro 28 - 51 lu_n.f90
The following is an example of a report for statement source line analysis on Peer 0 and showing the first 60% of the most sampled lines.
PPROF Info: pprof -method sampling -statements -peer_ids 0 -item 60% lu_s
***************************** pprof (version 1.10) *****************************
** **
** pc sampling, -statements, -statistics per_peer **
** -item 60%, -truncate **
** Statement analysis for peer 0 (tactic) **
** **
********************************************************************************
** Data is sorted by lines in decreasing of: sample count **
** Program `lu_s' was built on: Wed Apr 30 16:16:16 1996 **
** Profiled with the default protocol on: Wed Apr 30 16:17:23 1996 **
** Profiling system clock: 1024 ticks-per-second **
** Total number of samples per peer: 660 **
** Total number of source lines in the program: 154415 **
** Total number of executable lines: 33193 **
** Total number of sampled lines: 30 **
** Total elapsed real time: 1.90 seconds **
** Report generated on: Wed Apr 30 17:24:04 1996 **
********************************************************************************
sample routine line# file
(count %)
---------------------------------------------------------
480 72.7 lu_decomposition 13 lu_n.f90
88 13.3 GET_A 56 lu_n.f90
25 3.8 _OtsDivide 463 ots_div_alpha.s
11 1.7 _OtsDivide 471 ots_div_alpha.s
8 1.2 _OtsDivide 466 ots_div_alpha.s
8 1.2 lu_decomposition 11 lu_n.f90
8 1.2 _OtsDivide 476 ots_div_alpha.s
4 0.6 _OtsDivide 465 ots_div_alpha.s
3 0.5 _OtsDivide 473 ots_div_alpha.s
2 0.3 _OtsDivide 474 ots_div_alpha.s
2 0.3 _reshape_recv_ 1438 com_reshape.c
2 0.3 _reshape_recv_ 1432 com_reshape.c
2 0.3 _OtsDivide 475 ots_div_alpha.s
1 0.2 cartesian_alloc 1444 malloc.c
1 0.2 TCP_MsgReadParse 576 msgcomrecv.c
1 0.2 cartesian_noshri 1697 malloc.c
1 0.2 TCP_MsgLookup 203 msgcomrecv.c
1 0.2 _TCP_SendBroadca 189 msgbcast.c