The following sections describe the processing that occurs when a parallel HPF application runs under PSE. The following topics are discussed:
Parallel applications execute in Single Program, Multiple Data
(SPMD) mode using the PSE environment and software. Each instance
of a parallel program executes roughly the same program statement
on a different data set at the same time. To execute an application,
issue a command that specifies the name of the program to run (for
example, my_program) and any PSE command-line options that are
necessary.
The syntax for the command line is as follows:
% my_program [-pse_option value . . . ]
Command-line options override selections set by PSE environment
variables. In all cases, selections must not conflict with the
PSE cluster definitions set by the PSE system administrator. For
example, execution priority cannot be set higher than the value of
PSE_PRIORITY_LIMIT.
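As a rough illustration of this precedence, the following Python sketch resolves a single priority setting. The function name resolve_priority, its parameters, the numeric comparison, and the decision to raise an error on a conflict are all assumptions made for illustration; the text above does not say how PSE reports a conflicting selection.

```python
def resolve_priority(cli_value, env_value, default, limit):
    """Pick a priority: command line first, then environment, then default.

    Hypothetical helper, not part of PSE. `limit` stands in for the
    administrator-defined PSE_PRIORITY_LIMIT.
    """
    if cli_value is not None:
        chosen = cli_value      # a command-line option overrides the environment
    elif env_value is not None:
        chosen = env_value      # the environment variable is used otherwise
    else:
        chosen = default
    # Assumption: a larger number means a higher priority.
    if chosen > limit:
        raise ValueError(f"priority {chosen} exceeds limit {limit}")
    return chosen
```

With a limit of 10, a command-line value of 5 wins over an environment value of 3, and a request for 11 is rejected.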
Several other requirements must be met for an application to execute successfully:
Using the -on command-line
option to place Peer 0 onto a processor having more available
memory or onto a system that has local file access can have a
beneficial performance impact.
The following sections describe the stages of parallel program startup, execution, and termination.
This section describes parallel program startup.
The execution of an application creates a single instance of the program on the local machine, whether that machine is a PSE cluster member or not. This program instance is known as the controlling process. It does not participate in the actual execution of the application other than to provide control and I/O services. The controlling process:
io_manager on remote
hosts is accomplished using the UNIX rsh(1) facility. You
should, therefore, verify that you have remote shell access to
all PSE cluster members of interest. You can do this with the
lspart -verify command. To ensure uniform rsh access, add all
PSE cluster hosts to your .rhosts file (see rhosts(4)). For
example, the following PSE cluster host information could be
added to the .rhosts file:
myalpha.xyz.edu username
io_manager process on the
selected PSE cluster members.
For more information about lspart, see the lspart(1) man page.
For more information about rhosts, see the rhosts(4) man page.
Once communication and a working environment are established
between the application controlling process and the io_manager
processes, each io_manager starts the required number of peer
processes. Once the peer processes are started, each
io_manager works on their behalf to manage standard
I/O. It is also responsible for propagating signals between the
controlling process and the local peers. As application peers exit,
the io_manager reports their exit status back to the
controlling process for final processing.
The io_manager processes have an important role in job slot
management. They are responsible for reporting application startups
and exits to the local farmd daemon to decrement or increment
(respectively) the number of available job slots. An
io_manager can also ask the farmd daemon for additional
services, such as using the root privileges of farmd to raise
(or lower) the application execution priority.
The successful execution of any application is dependent upon being run within a well known context. This context is referred to as the user environment and consists primarily of a:
You generally set resource limits or environment variables at login
time. These can be modified at any time after the initial login. To
maintain this environment, the default PSE behavior is to propagate
the above context to each application peer at startup time. This
behavior can be overridden by using the -login
command-line option, which lets the user environment be set by the
user's shell initialization file, for example, .cshrc.
By default, peer order is based on machine load averages: peers are placed starting with the least loaded machine (Peer 0) and continuing to machines with a greater or equal load. The first peer placed is termed "Peer 0". All other peers are numbered sequentially from that point up to n-1, where n is the number of selected peers.
While each of the peers performs the parallel execution of the application, Peer 0 also has the following additional responsibilities:
User-specified peer ordering, using the -on command-line
option, is an alternative to the default load-balanced peer
selection. The -on command-line option assigns
sequential peer numbers to the identified list of hosts. All
identified hosts must be members of the specified partition.
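The two ordering modes can be sketched in Python as follows. This is only an illustration: the function order_peers, its arguments, and the error raised for a host outside the partition are hypothetical, and the load values are assumed to be supplied by the caller rather than read from PSE.

```python
def order_peers(loads, partition, on_hosts=None):
    """Return hosts in peer order (index 0 is Peer 0).

    loads:     dict mapping host name -> load average (assumed input)
    partition: list of host names that belong to the partition
    on_hosts:  optional explicit host list, as given with -on
    """
    if on_hosts is not None:
        # User-specified order: every host must be a partition member.
        outside = [h for h in on_hosts if h not in partition]
        if outside:
            raise ValueError(f"hosts not in partition: {outside}")
        return list(on_hosts)
    # Default order: least loaded host first; ties keep partition order.
    return sorted(partition, key=lambda h: loads[h])
```

For example, with loads of 0.5, 0.1, and 0.3 on hosts a, b, and c, the default order is b, c, a, so host b receives Peer 0.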
PSE, by default, attempts to assign at most a single application peer to each CPU (Central Processing Unit) in the partition. Each CPU in a system is counted separately. This mode of assigning peer processes is referred to as physical mode peer selection.
When you specify virtual-mode peer selection (using the -virtual
command-line option or the PSE_MACHINE environment variable),
PSE assigns more than one peer to a CPU when necessary.
The actual number of peer processes per machine is balanced according to the number of processes requested and the number of available CPUs. Peers are assigned sequentially, as follows:
For example, an application running in a partition consisting of a two-processor SMP computer and a single-processor workstation may request 7 peers. If the SMP computer has a lighter load than the workstation, the SMP computer will be assigned peers 0, 1, 3, 4, and 6. The workstation will be assigned peers 2 and 5.
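The example above is consistent with a round-robin assignment over CPU slots, with machines taken in order of increasing load. The Python sketch below assumes that rule; the function assign_peers and its input format are hypothetical, and the actual PSE assignment rules may differ in cases the example does not cover.

```python
def assign_peers(machines, num_peers):
    """Assign peer numbers to machines round-robin over CPU slots.

    machines: list of (name, cpu_count, load) tuples (assumed input format)
    Returns a dict mapping machine name -> list of peer numbers.
    """
    # Least loaded machine first, so that it receives Peer 0.
    ordered = sorted(machines, key=lambda m: m[2])
    # One slot per CPU; each CPU in a machine is counted separately.
    slots = [name for name, cpus, _load in ordered for _ in range(cpus)]
    placement = {name: [] for name, _cpus, _load in ordered}
    for peer in range(num_peers):
        # Wrap around when there are more peers than CPUs (virtual mode).
        placement[slots[peer % len(slots)]].append(peer)
    return placement


example = assign_peers([("smp", 2, 0.1), ("workstation", 1, 0.5)], 7)
```

Here example maps "smp" to peers 0, 1, 3, 4, and 6 and "workstation" to peers 2 and 5, which matches the assignment described in the text.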
The controlling process is connected to each peer by an
io_manager. All terminal handling operations are assigned to
Peer 0. Under normal conditions, PSE propagates terminal input from
the controlling process only to HPF Peer 0. When running a -debug
session, however, input (usually debugger commands) is
broadcast to all peers for processing.
Standard I/O handling under PSE differs in two areas from the UNIX standard:
The stderr output stream currently shares a
file descriptor with stdout. This can result in
unexpected differences in the output of scripts that redirect
stderr and stdout to separate files.
Under PSE, signals that are received by the controlling process are propagated to all application peers for processing. The following signals are "caught" and propagated:
Signals not identified above are handled "locally". They generally cause a core dump and an application exit.
The generation of core files occurs when an application encounters
situations from which PSE is unable to recover. These core files
can be large, if not limited by the corefilesize
resource limit. The resulting data can be useful when debugging
the application.
Because PSE generally runs multiple instances of an application
at the same time from the same directory, core files from
different peers could overwrite one another. To
prevent this, core files are generated in a set of
subdirectories. One subdirectory is generated for each peer, and
the directories are named core.N, for example,
core.0, core.1, and so on. By default, the subdirectories are
created in the current working directory. You may specify alternate
file paths to use as a corefile root by setting the
PSE_CORE_DIR environment variable. There is no command-line
equivalent.
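The naming scheme can be sketched in Python as follows. The helper core_dir_for_peer is hypothetical, and it assumes that PSE_CORE_DIR holds a single directory path; the accepted format of the variable is not described here.

```python
import os


def core_dir_for_peer(peer, environ=None, cwd=None):
    """Return the directory in which peer number `peer` would dump core.

    environ and cwd can be passed explicitly for testing; by default the
    real process environment and working directory are used.
    """
    env = os.environ if environ is None else environ
    # Assumption: PSE_CORE_DIR, when set, names one corefile root directory.
    root = env.get("PSE_CORE_DIR") or (cwd if cwd is not None else os.getcwd())
    return os.path.join(root, f"core.{peer}")
```

With PSE_CORE_DIR unset, peer 0 maps to core.0 under the current working directory; with the variable set, the same subdirectory name is placed under that root instead.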
For more information about the PSE_CORE_DIR environment variable, see Section 8.5.1.4.
Parallel HPF applications use message passing to coordinate and move required data between processes. The messaging fabric available to PSE users depends on the number and types of network adapters installed on PSE cluster machines, as well as on the PSE configuration.
The communications medium is chosen separately for each inter-peer route. For example, within a single execution, the messages between peer 1 and peer 2 may use a medium different from the medium used between peer 1 and peer 3.
By default, PSE chooses the highest-bandwidth medium available between any two peers. Although usually not needed, you can change the order of preference for choosing a communications medium in one of three ways:
PSE_PREF_COMM entry in the PSE database
PSE_PREF_COMM environment variable
-pref_comm command-line option
The syntax of the PSE_PREF_COMM definition is as
follows:
Environment variable (C shell):
% setenv PSE_PREF_COMM pref-comm-spec
Command-line option:
% my_program -pref_comm pref-comm-spec
pref-comm-spec is a list containing one or more communication-medium specifiers. The order in which the specifiers are listed determines PSE's preference in choosing a communications medium between any two peers. It is generally desirable to put higher-bandwidth media at the beginning of the list, and lower-bandwidth media at the end of the list. The specifiers currently supported are:
The default setting for PSE_PREF_COMM is:
PSE_PREF_COMM shm,mc,atm,fddi,ethernet
Additional specifiers may be defined in the PSE cluster database.
Users can view the available specifiers with lspart and
psemon.
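As an illustration of how a preference list drives the per-route choice, the following Python sketch picks the first listed medium that is available on a given route. The function choose_medium and the idea of passing in a set of available media are assumptions for illustration; how PSE discovers which media connect two peers is not covered here.

```python
DEFAULT_PREF_COMM = "shm,mc,atm,fddi,ethernet"  # default setting quoted above


def choose_medium(available, pref_spec=DEFAULT_PREF_COMM):
    """Return the first medium in pref_spec that the route supports.

    available: set of specifiers usable between two peers (assumed input)
    pref_spec: comma-separated preference list, most preferred first
    Returns None if no listed medium is available.
    """
    for medium in pref_spec.split(","):
        if medium in available:
            return medium
    return None
```

Because the choice is made separately for each route, calling the function once per peer pair with that pair's available media can yield different media within one execution.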
The communications medium and the protocol used over the network
are controlled by two separate environment variables. The
network medium is controlled by the PSE_PREF_COMM
environment variable. The network protocol is controlled by the
HPF_COMM environment variable and its associated
command-line option, -c.
For more information about HPF_COMM and -c, see Section 8.5.1.2.
A program executes until it reaches a successful completion or until one or more of the peers exits with an error. The exit status of each peer process is reported to the application controlling process by its io_manager. If the application exit is caused by an error, the controlling process waits a predetermined amount of time to allow standard I/O to finish and then exits with an appropriate exit value. HPF applications implement a synchronized exit. If one or more peers are unable to report exit values within the allotted amount of time after the first exit, the application exits with an error.
The exit value of an application is zero unless errors are encountered. In the case of errors, the application exit value is defined to be the first nonzero value received by the controlling process.
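The exit-value rule can be expressed as a short Python sketch. The function application_exit_value is hypothetical; it assumes the peer exit statuses are supplied in the order in which the controlling process received them.

```python
def application_exit_value(statuses_in_arrival_order):
    """Return the application's exit value from per-peer exit statuses.

    The result is the first nonzero status received, or zero if every
    peer exited successfully.
    """
    for status in statuses_in_arrival_order:
        if status != 0:
            return status
    return 0
```

For example, if the statuses arrive as 0, 0, 3, 1, the application exit value is 3, not 1, because 3 was received first.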
A remote host is defined as any host which is not a PSE cluster member. Successful application execution from a remote host depends upon:
-loadserver command-line argument or PSE_LOADSERVER
environment variable.
In all other respects, execution of a user program is exactly as previously described.
Programs compiled with the -wsf option without an
argument, or with an argument of 1, may be executed as normal scalar
programs without using PSE, by running with the -local
option, which forces the application to run with one peer process on
the local machine.
-local does not work together with
-debug. When you need both, the best workaround
is to run the program under the debugger like this:
% ladebug a.out
(ladebug) run -local
PSE provides modified versions of the Ladebug and dbx debuggers. The default debugging environment provided is Ladebug in n windows: one X-window terminal session attached to a Ladebug session for each peer. dbx in n windows is also available.
Debugging sessions are invoked by running the PSE application
with the -debug option. These tools let you view the
activity of each peer while it is being debugged.
These modified versions of Ladebug and dbx let you view elements of
a distributed HPF array using the hpfget command. The
-debug option lets you:
You can select a different debugger by setting the
DEBUGGER environment variable to the pathname of the
desired debugging tool.
The pprof profiler measures a parallel application's
performance by capturing the amount of time the program spends in
different code regions or execution activities. The information in
pprof profiling reports can be used to locate hotspots
and to identify aspects of program execution that might benefit from
optimization. The profiler also provides system run-time performance
statistics for processes running under PSE.
For more information about the pprof profiler, see Chapter 10.