1.1 What is HPF?

High Performance Fortran (HPF) is a set of extensions to the Fortran 90 standard that lets programmers specify how data is to be distributed across multiple processors in a parallel programming environment. HPF's constructs allow programmers to indicate potential parallelism at a relatively high level, without dealing with the low-level details of message-passing and synchronization. When an HPF program is compiled, the compiler takes responsibility for scheduling the parallel operations on the physical machines, greatly reducing the time and effort required to develop parallel programs. For suitable applications, parallel programs can execute dramatically faster than ordinary serial Fortran programs.
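To make "distributing data" concrete, suppose an array A(N) is mapped with the HPF directives `!HPF$ PROCESSORS P(4)` and `!HPF$ DISTRIBUTE A(BLOCK) ONTO P`. Under HPF's BLOCK rule each processor receives one contiguous chunk of ceiling(N/P) elements (the last chunk may be shorter, and trailing processors may receive nothing), while CYCLIC deals elements out round-robin. The Python sketch below is not HPF code; it only models the ownership rules, using 1-based indices as Fortran does, and its function names are hypothetical.

```python
import math


def block_size(n: int, p: int) -> int:
    """Chunk length for a BLOCK distribution of n elements over p processors."""
    return math.ceil(n / p)


def block_owner(i: int, n: int, p: int) -> int:
    """Processor (1..p) that owns element i (1..n) under BLOCK."""
    return (i - 1) // block_size(n, p) + 1


def block_range(proc: int, n: int, p: int) -> tuple[int, int]:
    """First and last element owned by processor proc; empty if first > last."""
    b = block_size(n, p)
    return (proc - 1) * b + 1, min(proc * b, n)


def cyclic_owner(i: int, p: int) -> int:
    """Processor (1..p) that owns element i under CYCLIC (round-robin)."""
    return (i - 1) % p + 1


if __name__ == "__main__":
    n, p = 10, 4
    print("BLOCK: ", [block_owner(i, n, p) for i in range(1, n + 1)])
    print("CYCLIC:", [cyclic_owner(i, p) for i in range(1, n + 1)])
```

For N = 10 and P = 4 this prints `[1, 1, 1, 2, 2, 2, 3, 3, 3, 4]` for BLOCK and `[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]` for CYCLIC.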

HPF is implemented as an integral component of the Digital Fortran 90 compiler. HPF programs compiled with the Digital Fortran 90 compiler can be executed serially on a single-processor Alpha system, or in parallel on a PSE cluster running Digital's PSE software.

1.1.1 Why Parallel Processing?

The fundamental premise of parallel processing is that running a program in parallel on multiple processors is faster than running the same program on a single processor. To achieve this speed-up, the program must be decomposed so that different data or instructions can be parceled out among the processors for simultaneous execution.
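A minimal sketch of such a decomposition, written in Python rather than HPF: a sum over an array is split into contiguous chunks, one per simulated processor, each chunk is summed independently, and the partial results are then combined. The loop over chunks runs serially here; on a real parallel system the partial sums would be computed simultaneously, and the final combining step is where processors must exchange data. The function name `decomposed_sum` is hypothetical.

```python
import math


def decomposed_sum(data: list[float], p: int) -> tuple[float, list[float]]:
    """Sum `data` by splitting it into p contiguous chunks.

    Returns the total and the per-chunk partial sums. Each chunk
    stands in for the work one processor would do on its own data.
    """
    chunk = math.ceil(len(data) / p) if data else 0
    # Independent work: each "processor" sums only its own chunk.
    partials = [sum(data[k * chunk:(k + 1) * chunk]) for k in range(p)]
    # Combining step: the only point where results must be gathered.
    return sum(partials), partials


if __name__ == "__main__":
    total, partials = decomposed_sum(list(range(1, 11)), 4)
    print(partials, total)  # [6, 15, 24, 10] 55
```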

A further advantage of parallel processing is that a system can be scaled or built up gradually. If, over time, a parallel system becomes too small for the tasks needed, additional processors can be added to meet the new requirements.

Ideally, the performance gain of parallel operations should be proportional to the number of processors participating in the computation. In some cases the gain is even greater, because two processors have twice as much cache memory as one, so a larger share of the data fits in cache. In most cases, however, the gain is somewhat less, because parallel processing inevitably incurs some communication and synchronization overhead. In practice, communicating even a small amount of data takes enormously longer than any single computation. Minimizing communication costs and processor idle time is therefore the key to good parallel performance.
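The trade-off can be illustrated with a deliberately simple cost model, which is an assumption for illustration and not a measurement of any system: a program takes T1 seconds on one processor, the work divides evenly over p processors, and running in parallel adds a fixed communication overhead C. The parallel time is then Tp = T1/p + C, and the speedup is T1/Tp. The model ignores cache effects, so it cannot reproduce the superlinear case; the function names are hypothetical.

```python
def parallel_time(t1: float, p: int, overhead: float) -> float:
    """Toy model: evenly divided work plus a fixed communication cost.
    A single processor has nothing to communicate."""
    return t1 / p + (overhead if p > 1 else 0.0)


def speedup(t1: float, p: int, overhead: float) -> float:
    """Serial time divided by modeled parallel time."""
    return t1 / parallel_time(t1, p, overhead)


def efficiency(t1: float, p: int, overhead: float) -> float:
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup(t1, p, overhead) / p


if __name__ == "__main__":
    # 100 s of serial work; compare no overhead with 5 s of communication.
    for c in (0.0, 5.0):
        print(c, [round(speedup(100.0, p, c), 2) for p in (1, 2, 4, 8)])
```

With no overhead the speedups are exactly 1, 2, 4 and 8; with 5 s of overhead they fall to about 1.82, 3.33 and 5.71 for 2, 4 and 8 processors.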

HPF gives programmers the ability to specify data distribution and data parallel operations at a high level, and the compiler takes care of the details of parallel execution. However, it is the programmer's responsibility to give the compiler enough information to distribute the data among the participating processors efficiently.
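One way to see why this information matters: consider the assignment A(I) = B(I) for every I, and assume the common "owner computes" strategy, in which each element of A is computed by the processor that owns it (the exact strategy is up to the compiler). Every B(I) stored on a different processor must then be communicated. The Python sketch below, with hypothetical names and 1-based indices, counts those remote reads; mapping both arrays the same way, for example with `!HPF$ ALIGN B(I) WITH A(I)`, brings the count to zero.

```python
import math
from typing import Callable

Owner = Callable[[int], int]  # maps a 1-based element index to a processor


def block(n: int, p: int) -> Owner:
    """Ownership function for a BLOCK distribution of n elements over p processors."""
    b = math.ceil(n / p)
    return lambda i: (i - 1) // b + 1


def cyclic(p: int) -> Owner:
    """Ownership function for a CYCLIC distribution over p processors."""
    return lambda i: (i - 1) % p + 1


def remote_reads(n: int, owner_a: Owner, owner_b: Owner) -> int:
    """Count the B(I) values that must move for A(I) = B(I), I = 1..n,
    assuming each A(I) is computed on the processor that owns A(I)."""
    return sum(1 for i in range(1, n + 1) if owner_a(i) != owner_b(i))


if __name__ == "__main__":
    n, p = 8, 2
    print(remote_reads(n, block(n, p), block(n, p)))  # aligned: 0
    print(remote_reads(n, block(n, p), cyclic(p)))    # mismatched: 4
```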

1.1.2 Parallel Programming Models

The design of parallel programs begins with the choice of a programming model that governs the overall structure of the program. Several models of parallelism can be used in parallel applications, for example:

All of these types of parallelism, and others as well, are useful in certain applications. It is difficult, however, to support all of these models in the same language. HPF concentrates primarily on data parallel computations, a widely useful class. To provide some access to other models of parallelism, an HPF program can contain extrinsic procedures, which can be written for other programming paradigms or even in another programming language, such as C or assembly language. This feature also allows existing libraries to be used, such as the libraries associated with the X Window System.

1.1.3 Data Parallel Programming

The data parallel programming model is based on the premise that many large-scale programs have a "natural" parallelism at a fine-grained level, such as performing the same operation on all the elements of an array.
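In Fortran 90 such an operation is written as a whole-array assignment, for example `A = B + C`. Because each result element depends only on the matching input elements, the work can be split among processors in any way without changing the answer. The Python sketch below is not HPF and its function name is hypothetical; it performs the element-wise addition one contiguous block at a time, as separate processors would, so the result can be checked against a serial computation.

```python
import math


def elementwise_add_blocked(b: list[float], c: list[float], p: int) -> list[float]:
    """Compute a = b + c element by element, one contiguous block per processor."""
    n = len(b)
    assert len(c) == n, "arrays must conform"
    a = [0.0] * n
    chunk = math.ceil(n / p) if n else 0
    for proc in range(p):                 # each iteration models one processor
        lo, hi = proc * chunk, min((proc + 1) * chunk, n)
        for i in range(lo, hi):           # touches only this processor's block
            a[i] = b[i] + c[i]
    return a


if __name__ == "__main__":
    b = [float(i) for i in range(10)]
    c = [10.0 * i for i in range(10)]
    serial = [x + y for x, y in zip(b, c)]
    print(elementwise_add_blocked(b, c, 4) == serial)  # True
```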

To perform such fine-grained parallel operations, data parallel programs rely on three basic structural features:

1.1.4 HPF and Data Parallelism

HPF contains features for specifying data parallel operations, and for mapping data across processors.
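As an example of the mapping features, a two-dimensional array can be distributed in blocks along both dimensions over a grid of processors, written in HPF as `!HPF$ PROCESSORS P(2,2)` together with `!HPF$ DISTRIBUTE A(BLOCK,BLOCK) ONTO P`. Each dimension then follows the one-dimensional BLOCK rule independently. The Python sketch below, with hypothetical names and 1-based indices, computes which processor in the grid owns each element of a 4 x 4 array.

```python
import math


def block_owner(i: int, n: int, p: int) -> int:
    """Processor coordinate (1..p) owning index i (1..n) along one BLOCK dimension."""
    return (i - 1) // math.ceil(n / p) + 1


def owner_2d(i: int, j: int, shape: tuple[int, int],
             grid: tuple[int, int]) -> tuple[int, int]:
    """Grid coordinates of the processor owning A(i, j) under (BLOCK, BLOCK)."""
    (n1, n2), (p1, p2) = shape, grid
    return block_owner(i, n1, p1), block_owner(j, n2, p2)


if __name__ == "__main__":
    for i in range(1, 5):
        print([owner_2d(i, j, (4, 4), (2, 2)) for j in range(1, 5)])
```

The printed table shows the array split into four 2 x 2 quadrants, one per processor.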

The programmer indicates which sections of code the compiler should consider for parallelization by supplying supplementary high-level data-partitioning information.
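Besides data-mapping directives, HPF lets the programmer mark a loop with `!HPF$ INDEPENDENT`, asserting that no iteration interferes with another, so the compiler may run the iterations in parallel. The directive is an assertion by the programmer rather than something the compiler is required to verify. The Python sketch below (hypothetical names) checks by brute force what that assertion claims: that no array element written by one iteration is read or written by a different iteration.

```python
from typing import Callable, Iterable

Access = Callable[[int], Iterable[tuple[str, int]]]  # iteration -> (array, index) pairs


def iterations_independent(lo: int, hi: int, writes: Access, reads: Access) -> bool:
    """True if no element written by one iteration of DO I = lo, hi
    is read or written by a different iteration."""
    writer: dict[tuple[str, int], int] = {}
    for i in range(lo, hi + 1):
        for ref in writes(i):
            if writer.get(ref, i) != i:
                return False  # two iterations write the same element
            writer[ref] = i
    for i in range(lo, hi + 1):
        for ref in reads(i):
            if writer.get(ref, i) != i:
                return False  # reads a value another iteration writes
    return True


if __name__ == "__main__":
    # A(I) = A(I) + B(I): each iteration touches only its own elements.
    print(iterations_independent(1, 8, lambda i: [("A", i)],
                                 lambda i: [("A", i), ("B", i)]))  # True
    # A(I) = A(I-1) + B(I): iteration I reads what iteration I-1 wrote.
    print(iterations_independent(2, 8, lambda i: [("A", i)],
                                 lambda i: [("A", i - 1), ("B", i)]))  # False
```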

When the program is compiled, the compiler automatically generates the complex communication and synchronization code needed to coordinate the parallel operations, so the programmer does not have to insert explicit message-passing calls by hand.
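As an illustration of the communication a compiler must generate, consider the stencil A(I) = B(I-1) + B(I+1) for I = 2, ..., N-1, with A and B both BLOCK-distributed, and assume the owner-computes strategy. Each processor then needs a few B values from its neighbours, often called ghost or halo cells. The Python sketch below, with a hypothetical function name and 1-based indices, lists for each processor the off-processor elements of B it must receive; these are the transfers a programmer would otherwise code by hand with message-passing calls.

```python
import math


def halo_needs(n: int, p: int) -> dict[int, list[int]]:
    """For A(I) = B(I-1) + B(I+1), I = 2..n-1, with A and B both BLOCK-distributed,
    return the off-processor B indices each processor must receive (owner computes)."""
    b = math.ceil(n / p)

    def owner(i: int) -> int:
        return (i - 1) // b + 1

    needs: dict[int, set[int]] = {proc: set() for proc in range(1, p + 1)}
    for i in range(2, n):            # I = 2 .. n-1
        me = owner(i)                # processor that computes A(I)
        for j in (i - 1, i + 1):     # B elements read by this update
            if owner(j) != me:
                needs[me].add(j)
    return {proc: sorted(s) for proc, s in needs.items()}


if __name__ == "__main__":
    print(halo_needs(12, 3))  # {1: [5], 2: [4, 9], 3: [8]}
```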

An application can be developed and run on a single workstation, and then run on a PSE cluster of any size.