The Digital Fortran 90 compiler can be
used to produce either standard applications that execute on a
single processor (serial execution), or parallel applications that
execute on multiple processors using PSE. Parallel applications are
produced by using the Digital Fortran 90
compiler with the -wsf option.
This section describes the Digital Fortran 90 command-line options that are specifically relevant to parallel HPF programs.
Specifying the -wsf option indicates that the program
should be compiled to execute in parallel on multiple processors.
HPF directives in programs affect program execution only if the
-wsf option is specified at compile time. If the
-wsf option is omitted, HPF directives are checked
for syntax, but otherwise ignored.
Specifying -wsf with a number as an argument
optimizes the executable for that number of processors. For
example, specifying -wsf 4 generates a program for
4 processors. Specifying -wsf without an argument
produces a more general program that can run on any
number of processors. Supplying a numerical argument lets the compiler
optimize for that processor count, which results in better
application performance.
For best performance, do not specify an argument to
-wsf that is greater than the number of CPUs
that will be available at run time. Relying upon the PSE
-virtual run-time option to simulate a PSE cluster
larger than the number of available processors usually causes
degradation of application performance.
Any number of processors is allowed. However, performance may be degraded in some cases if the number of processors is not a power of two.
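The guidance above on choosing an argument for -wsf can be expressed as a small pre-build check. The following is only a sketch and is not part of the Digital Fortran 90 tools: the function name check_wsf_arg and its two parameters (the intended -wsf argument and the number of CPUs expected at run time) are hypothetical, and both are assumed to be positive integers.

```shell
# check_wsf_arg NN NCPUS
# Hypothetical helper. NN is the intended argument to -wsf; NCPUS is the
# number of CPUs expected to be available at run time. Both are assumed
# to be positive integers.
check_wsf_arg() {
    nn=$1
    ncpus=$2

    # Asking for more processors than CPUs means relying on the
    # -virtual run-time option, which usually degrades performance.
    if [ "$nn" -gt "$ncpus" ]; then
        echo "warning: -wsf $nn exceeds $ncpus available CPUs"
    fi

    # A power of two has exactly one bit set, so nn & (nn - 1) is zero.
    if [ $(( nn & (nn - 1) )) -ne 0 ]; then
        echo "note: $nn is not a power of two; performance may be degraded"
    fi
}
```

The function prints nothing when the argument satisfies both guidelines, so it can be called before a build without cluttering the output.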
The -nearest_neighbor, -pprof, and
-show hpf options can be used only when
-wsf is specified.
When parallel programs are compiled and linked as separate steps
(see the documentation of the -c option in the Fortran User Manual), the -wsf
option must be used with the f90 command both at
compile time and link time. If -wsf is used with a
numerical argument, the same argument must be used at compile time
and link time.
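The separate compile and link steps described above might look like the following sketch. The source file names, the executable name, and the run and build_parallel helpers are hypothetical; only the -wsf, -c, and -o options are taken from the f90 command itself. The processor count is held in one variable so that the compile and link steps cannot disagree.

```shell
# Hypothetical file names. NPROCS is the numerical argument passed to -wsf.
NPROCS=${NPROCS:-4}

# Echo each command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

build_parallel() {
    # Compile steps: -c produces object files without linking.
    run f90 -wsf "$NPROCS" -c main.f90 || return 1
    run f90 -wsf "$NPROCS" -c solver.f90 || return 1
    # Link step: -wsf is repeated with the same numerical argument.
    run f90 -wsf "$NPROCS" -o app main.o solver.o
}
```

Setting DRY_RUN=1 before calling build_parallel prints the three commands without invoking the compiler, which is a convenient way to confirm that every step carries the same -wsf argument.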
For more information on the -virtual run-time option, see
Sections 8.2.5 and 8.5.1.11.
For more information on the -c option, see the Fortran User Manual.
An array (or array section) is zero sized when the extent of any
of its dimensions is zero or less. When the
-wsf option is specified, the compiler is required to
insert a series of checks to guard against irregularities (such as
division by zero) in the generated code that zero-sized data objects
can cause. Depending upon the particular application, these checks
can cause noticeable (or even major) degradation of performance.
The -assume nozsize option causes the compiler to omit
these checks for zero-sized arrays and array sections. This option
is automatically selected when the -fast option is
selected.
The -assume nozsize option must not be used when a
program references any zero-sized arrays or array sections. An
executable produced with the -assume nozsize option
may fail or produce incorrect results when it references any
zero-sized arrays or array sections.
You can insert a run-time check into your program to ensure
that a given line is not executed if an array or array section
referenced there is zero sized. This will allow you to specify
-assume nozsize even when there is a possibility of
a zero-sized array reference in that line.
The -fast option activates options that improve run-
time performance. Among the options set by the -fast
option is the -assume nozsize option (a full list
of the options set by -fast can be found in the
Fortran User Manual and in the Fortran
90(1) manpage). This means that the restrictions that apply
to the -assume nozsize option also apply to the
-fast option.
The compiler's nearest-
neighbor optimization is enabled by default. The
-nearest_neighbor option is used to modify the limit
on the extra storage allocated for nearest neighbor optimization.
The -nonearest_neighbor option is used to disable
nearest neighbor optimization.
The compiler automatically determines the correct shadow-edge widths on an array-by-array, dimension-by-dimension basis. You can also set shadow-edge widths manually by using the SHADOW keyword inside the DISTRIBUTE directive. This is necessary to preserve the shadow edges when nearest-neighbor arrays are passed as arguments.
The optional nn field specifies the maximum allowable shadow-edge width, which limits how much extra storage the compiler may allocate for nearest-neighbor arrays. The nearest-neighbor optimization is not performed for array dimensions needing a shadow-edge width greater than nn.
When programs are compiled with the -wsf option, the
default is -nearest_neighbor 10.
The -nonearest_neighbor option disables the
nearest-neighbor optimization. It is equivalent to specifying
-nearest_neighbor 0.
Use the -nowsf_main option to incorporate parallel
routines into non-parallel programs.
When you incorporate parallel routines into non-parallel programs,
some routines must be compiled with -nowsf_main, and
some should be compiled without -nowsf_main. Refer
to Table 6-2.
For more information on the -nowsf_main option, see Section 6.7.
The -pprof option prepares a parallel program for
subsequent profiling with the pprof profiler.
To use the -pprof option, the following
requirements must be met:
The -wsf option must be specified.
The -pprof option must be given an argument: either -pprof i or,
for counter sampling, -pprof s.
The -p1 and -p options must be
omitted.
For more information on the -pprof option and on
the pprof profiler, see Chapter 10, Parallel Profiler.
The -show hpf option sends information related
to parallelization to standard error and to the listing (if
one is generated with -V). These flags are valid only if the
-wsf flag is specified. You can use this information
to help you tune your program for better performance.
This option has several forms:
-show hpf_comm includes detailed information
about statements which cause interprocessor communication to be
generated. This option typically generates a very large number of
messages.
-show hpf_indep includes information about the
optimization of DO loops marked with the INDEPENDENT directive.
Every marked loop will be acknowledged, and an explanation
given for any INDEPENDENT DO loop that was not successfully
parallelized.
-show hpf_nearest includes information about
arrays and statements involved in optimized nearest-neighbor
computations. Messages are generated only for statements that are
optimized. This option allows you to check whether statements
that you intended to be nearest-neighbor are successfully
optimized by the compiler. It is also useful for finding out
shadow-edge widths that were automatically generated by the
compiler.
-show hpf_punt gives information about
distribution directives that were ignored and statements that
were not handled in parallel. For more information on serialized
routines, see Section 7.4.
-show hpf_temps gives information about
temporaries that were created at procedure interfaces.
-show hpf_all is the same as specifying all the
other -show hpf_ options.
-show hpf generates a selected subset of those
messages generated by the other -show hpf_ options.
It is designed to provide the most important information, while
minimizing the number of messages. It provides the output of
-show hpf_indep, -show hpf_nearest,
and -show hpf_punt, as well as selected messages
from -show hpf_comm.
It is usually best to try using -show hpf first. Use
the others only when you need a more detailed listing.
-show can take only one argument. However, the
-show flags can be combined by specifying
-show multiple times. For example:
% f90 -wsf -show hpf_nearest -show hpf_comm -show hpf_punt foo.f90
For more information on -show hpf,
see Section 7.13.
When linking is done as a separate step from compiling, the Digital Fortran 90 compiler requires
all objects to be compiled with the same value in the optional
[nn] field of the -wsf option. If
objects were compiled for different numbers of processors,
the linker reports the following error:
Unresolved: _hpf_compiled_for_nn_nodes_
If you do not know which object was compiled for the wrong number of
processors, the incorrectly compiled object can be identified using
the UNIX nm utility.
For more information on the -c option, see the Fortran User Manual.
For more information on using the nm utility to identify an object
that was compiled for the wrong number of processors, see Section B.1.
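One way to apply the nm utility to this problem is sketched below. It assumes that each object file carries a symbol named like the one in the error message, with nn replaced by the processor count the object was compiled for; that naming is an assumption made here, and Section B.1 remains the authoritative description. The function name list_wsf_symbols is hypothetical.

```shell
# list_wsf_symbols OBJECT...
# Hypothetical helper: for each object file, print the file name followed
# by every nm output line that mentions "_hpf_compiled_for_".
list_wsf_symbols() {
    for obj in "$@"; do
        # nm lists the symbols of an object file, one per line.
        nm "$obj" | grep '_hpf_compiled_for_' | while read -r line; do
            echo "$obj: $line"
        done
    done
}
```

Calling list_wsf_symbols on all of the objects passed to the link step lists them side by side, so an object whose symbol differs from the rest stands out.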