Digital High Performance Fortran 90
HPF and PSE Manual

Order Number: AA-Q62LC-TE

January 1997

This manual explains the High Performance Fortran (HPF) programming language and the Digital Parallel Software Environment (PSE). It contains a tutorial and reference materials describing Digital Fortran 90's HPF extensions. It describes the installation, setup, administration, and general use of PSE (including the profiler and debugger), as well as hardware configuration for a parallel PSE cluster.

This manual describes how to develop and run HPF programs under PSE.

Revision/Update Information: This manual supersedes the previous version of this manual, the Digital High Performance Fortran 90 HPF and PSE Manual, order number AA-Q62LB-TE.

Operating System and Version: Digital UNIX (OSF/1), Version 3.0 and higher

Software Versions: Digital Fortran Version 4.1; Digital Parallel Software Environment Version 1.2

Digital Equipment Corporation
Maynard, Massachusetts

©Digital Equipment Corporation 1997. All Rights Reserved.


About This Manual

1 Introduction
1.1 What is HPF?
1.1.1 Why Parallel Processing?
1.1.2 Parallel Programming Models
1.1.3 Data Parallel Programming
1.1.4 HPF and Data Parallelism
1.2 What Is PSE?
1.2.1 PSE Cluster Definition
1.2.2 Baseline PSE Cluster Configuration
1.2.3 Hardware Overview
1.2.4 Systems Software Overview
1.2.5 PSE Cluster Entities and Conventions
1.2.6 Parallel Application Execution
1.2.7 Developing Parallel Applications
1.2.8 PSE Utility Commands

HPF Tutorial

2 LU Decomposition
2.1 Using LU Decomposition to Solve a System of Simultaneous Equations
2.2 Coding the Algorithm
2.2.1 Fortran 77 Style Code
2.2.2 Parallelizing the DO Loops
2.2.3 Comparison of Array Syntax, FORALL, and INDEPENDENT DO
2.3 Directives Needed for Parallel Execution
2.3.1 DISTRIBUTE Directive
2.3.2 Deciding on a Distribution
2.3.3 Distribution for LU Decomposition Parallel Speed-Up
2.4 Packaging the Code
3 Solving Nearest Neighbor Problems
3.1 Two-Dimensional Heat Flow Problem
3.2 Jacobi's Method
3.3 Coding the Algorithm
3.4 Illustration of the Results
3.5 Distributing the Data for Parallel Performance
3.5.1 Deciding on a Distribution
3.5.2 Optimization of Nearest-Neighbor Problems
3.6 Packaging the Code
4 Visualizing the Mandelbrot Set
4.1 Introduction to the Mandelbrot Set
4.1.1 What is the Mandelbrot Set?
4.1.2 How the Mandelbrot Set is Visualized
4.1.3 Electrostatic Potential of the Set
4.2 Mandelbrot Program
4.2.1 Functionality of the Program
4.2.2 Developing the Algorithm
    Fortran 90 Features in Example 4-1
    Explanation of Example 4-1
4.2.3 Computing the Entire Grid
4.2.4 Converting to HPF
4.2.5 The PURE Attribute
4.2.6 Packaging the Code
5 Simulating Network Striped Files
5.1 Why Simulate Network Striped Files?
5.1.1 Constructing a Module for Parallel Temporary Files
5.2 Subroutine parallel_open
5.3 Subroutine parallel_write
5.3.1 Passing Data through the Interface
5.4 Subroutines parallel_read, parallel_close, and parallel_rewind
5.5 A Test Program For Parallel Temporary Files
5.6 parallel_temporary_files Module

HPF Reference

6 HPF Essentials
6.1 HPF Basics
6.1.1 When to Use HPF
    Existing Code
    New Code
6.1.2 Syntax and Function of HPF Directives
6.1.3 List of HPF Directives
6.1.4 Minimum Requirements for Parallel Execution
6.2 Data Parallel Array Operations
6.2.1 Array Terminology
6.2.2 Fortran 90 Array Assignment
    Whole Array Assignment
    Array Subsections
6.2.3 FORALL
6.2.4 The INDEPENDENT Directive
6.2.5 Vector-Valued Subscripts
6.2.6 Entity-Oriented Declaration Syntax
6.2.7 SEQUENCE and NOSEQUENCE Directives
6.2.8 Out of Range Subscripts
6.3 Data Mapping
6.3.1 Data Mapping Basics
6.3.2 Illustrated Summary of HPF Data Mapping
6.3.3 ALIGN Directive
6.3.4 TEMPLATE Directive
6.3.5 PROCESSORS Directive
6.3.6 DISTRIBUTE Directive
    Explanation of the Distribution Figures
    BLOCK Distribution
    CYCLIC Distribution
    BLOCK, BLOCK Distribution
    CYCLIC, CYCLIC Distribution
    CYCLIC, BLOCK Distribution
    BLOCK, CYCLIC Distribution
    Asterisk Distributions
    Visual Technique for Computing Two-Dimensional Distributions
    Using DISTRIBUTE Without an Explicit Template
    Using DISTRIBUTE Without an Explicit PROCESSORS Directive
    Deciding on a Distribution
6.3.7 Shadow Edges for Nearest-Neighbor Algorithms
6.4 Subprograms in HPF
6.4.1 Explicit-Shape, Assumed-Shape, and Assumed-Size Array Specifications
6.4.2 Descriptive Mapping
6.4.3 Explicit Interfaces
6.4.4 MODULE
6.4.5 PURE Attribute
6.4.6 Transcriptive Distributions and the INHERIT Directive
6.5 Intrinsic and Library Procedures
6.5.1 Intrinsic Procedures
6.5.2 Library Procedures
6.6 Extrinsic Procedures
6.6.1 Programming Models and How They Are Specified
6.6.2 Who Can Call Whom
    Calling Non-HPF Subprograms from EXTRINSIC(HPF_LOCAL) Routines
6.6.3 Requirements on the Called EXTRINSIC Procedure
6.6.4 Calling C Subprograms from HPF Programs
6.7 Calling HPF Subprograms from non-Parallel Fortran or C Programs
6.7.1 Building Programs that Have Both Parallel and non-Parallel Routines
6.7.2 Linking Programs Without Using the f90 Command
7 Optimizing HPF Programs
7.1 -fast Compile-Time Option
7.2 Converting Fortran 77 Programs to HPF
7.3 Interfaces
7.4 Nonparallel Execution of Code and Data Mapping Removal
7.5 Compile Speed
7.6 Nearest-Neighbor Optimization
7.7 Compiling for a Specific Number of Processors
7.8 Avoiding Unnecessary Communications Setup for Allocatable Arrays
7.10 Synchronization
7.11 Input/Output in HPF
7.11.1 Guidelines for Generating Efficient Use of I/O
7.11.2 Specifying a Specific Processor as Peer 0
7.11.3 Printing Large Arrays
7.11.4 Reading and Writing to Variables Stored Only on Peer 0
7.12 Stack and Data Space Usage
7.13 -show hpf Option
7.14 Timing
7.15 Spelling of the HPF Directives

User Guide

8 Compiling and Running Parallel Programs
8.1 Compiling HPF Programs
8.1.1 Compile-Time Options for High Performance Fortran Programs
    -wsf [nn] Option - Compile for Parallel Execution
    -assume nozsize Option - Omit Zero-Sized Array Checking
    -fast Option - Set Options to Improve Run-Time Performance
    -nearest_neighbor [nn] and -nonearest_neighbor Options
    -nowsf_main Option - for Non-Parallel Main Programs
    -pprof Option - Preparing for Parallel Profiling
    -show hpf - Show Parallelization Information
8.1.2 Consistency of Number of Peers
8.2 PSE User Environment Overview
8.2.1 UNIX Semantics
8.2.2 PSE Clusters
8.2.3 Partitions
8.2.4 Load Balancing
8.2.5 Physical and Virtual Modes
    Definition and Performance Considerations
    Uses of -virtual
8.3 PSE Application Execution Model
8.3.1 Application Startup and Peer Selection
    Controlling Process
    I/O Manager
    Environment Propagation
    Peer Ordering
    Terminal I/O
    Signal Propagation
    Core Files
    Peer Communications
    Program Termination
    Exit Value
8.3.2 Executing an Application from a Remote Host
8.3.3 Executing a Parallel Application without PSE
8.3.4 Debugging
8.3.5 Profiling
8.4 Setting Up the Appropriate PSE Environment
8.4.1 Specifying a PSE Cluster
8.4.2 Selecting the Number of Peers
8.4.3 Specifying PSE Cluster Members in a Partition
    Selecting Peer 0
8.4.4 Specifying the Execution Priority
8.5 PSE Environment Variables and Command-Line Options
8.5.1 Command-Line Option and Environment Variable Summary
    -bind_to_cpu (HPF_BIND_TO_CPU) - Prevent Process Migration
    -c protocol-list (HPF_COMM) - Specify Communications Protocol
    -connections - Display Communications Matrix
    PSE_CORE_DIR - Specify Alternate Directory for Core Files
    -debug - Debug Program
    -farm farm_name (PSE_FARM) - Specify a PSE Farm/Cluster
    -loadserver member-list (PSE_LOADSERVER) - Specify a PSE Loadserver
    -local string (PSE_LOCAL) - Run Program in Scalar Mode
    -login and -nologin (PSE_LOGIN) - Control Remote Login
    -partition partition_name (PSE_PARTITION) - Specify a PSE Partition
    -physical, -virtual (PSE_MACHINE) - Specify Mode of Execution
    -on member_list (PSE_ON) - Specify Which Machines to Run on (ordered)
    -peers n (PSE_PEERS) - Specify Number of Peer Processes
    -priority n (PSE_PRIORITY) - Specify Execution Priority
    -pvmhostfile file_name (PSE_PVMHOSTFILE) - Specify a PVM Hostfile
    -stdin peer_number (PSE_STDIN) - Direct Standard Input
    -timeout seconds (HPF_RECV_TIMEOUT) - Set Timeout
    -N filespec and -n
    -verbose
    -exclude member- (PSE_EXCLUDE) - Prevent Execution on Certain Members
    -pref_comm (PSE_PREF_COMM) - Set Communications Media Preference
    -use member-list (PSE_USE) - Specify Which Machines to Run on (unordered)
8.5.2 PSE Profiler Environment Variables
9 Debugging HPF Programs
9.1 General Debugging
9.2 Debugging a Program Running Within PSE
9.2.1 Selecting Ladebug or dbx
9.2.2 Invoking the PSE Debugger
9.2.3 Using the hpfget Debugger Command
9.2.4 Problems with Ladebug in n Windows
10 Parallel Profiler
10.1 Assessing PSE Application Performance
10.1.1 PSE Program Execution
10.1.2 Parallel Profiling Methods
10.1.3 Parallel Profiling Data
10.2 Using the Parallel Profiler
10.2.1 Compilation Phase
    Digital Fortran Compiler Syntax
10.2.2 Data Output Phase
    Output Directories
    Data File Naming Convention
10.2.3 Analysis Phase
    pprof Command Syntax
    Default Options
    How to Use pprof Control Options
    Summary of Profiling Options
10.2.5 Interpreting pprof Analysis Output
10.2.6 Data Output Examples
    Default Interval Profiling Analysis Report
    Average Statistics
    Profiled Events Summary Report
    Long Routine Analysis Report for PC Sampling
    Statement Analysis Using PC Sampling
10.3 HPF Profiling Issues
10.3.1 Array Profiles
10.3.2 Reporting Array Operations
    Examples of Transformed Distributed Array Names
10.3.3 Profiling Extrinsic and Library Routines Using Interval Profiling
10.3.4 DO Loop Profiles
10.3.5 Branching and Premature Exit in Interval Profiling
10.3.6 Alternate Routine Entries
10.3.7 Local Routines in CONTAINS Statements
10.3.8 Using pprof to Interpret Program Performance
    A Recommended Step by Step Road Map
10.4 pprof Error Messages

Administration Guide

11 PSE Cluster Configuration Overview
11.1 Overview of PSE Cluster Components
11.2 Interconnect Technology
11.2.1 TCP/IP-Based Interconnects
11.2.2 Memory Channel
11.3 Example PSE Cluster Configurations
11.3.1 Single SMP as a PSE Cluster
11.3.2 PSE Cluster with a Switched Network Interconnect
11.3.3 PSE Cluster with a Memory Channel Interconnect
11.3.4 PSE Cluster with Multiple Interconnects
12 Setting Up the PSE Single System Image
12.1 Introduction
12.2 Recommended PSE System Characteristics
12.3 Providing Consistent File Systems Using NFS
12.4 Providing Consistent Time with NTP
12.4.1 NTP Overview
12.4.2 Using NTP in the PSE Environment
13 PSE Cluster Installation
13.1 Preinstallation Tasks
13.1.1 Checking Installation Requirements
    Checking the Software Distribution Kit
    Reading the Online Cover Letter
    Reading the Online Release Notes
    Checking Installation Procedure Requirements
    Login Privileges
    Checking Software Requirements
    Determining Which Subsets to Load
    Determining Disk Space Requirements
    Listing the Files Installed by PSE
    Checking Current Disk Space
13.1.2 PSE Cluster Member Requirements
13.1.3 Additional Preinstallation Requirements for Customized PSE Clusters
    Preinstallation for File-based PSE Clusters
    Preinstallation for DNS-based PSE Clusters
13.2 Loading the PSE Software Kit
13.2.1 Registering Your Software License
13.2.2 Deleting Existing PSE Software Subsets
13.2.3 Installing and Starting a PSE Cluster
    PSE Installation Summary
    Formation of a Basic PSE Cluster
    Installing PSE Software in a Dataless Environment
13.3 Propagating the PSE Kit Using pse-remote-install
13.4 Configuring PSE Networking Kernel Binaries
13.4.1 Installing UDP_prime
13.4.2 Deinstalling UDP_prime
13.4.3 Sample UDP_prime Installation Log
13.4.4 Sample UDP_prime Deinstallation Log
13.5 Verifying Your Configuration
13.5.1 Running the Installation Verification Procedure
13.5.2 Advice to Users on PSE Environment Set-up
13.6 Modifying the PSE Cluster IP Port Number
13.7 Customizing Your PSE Cluster [Optional]
13.8 Reusing a PSE V1.0 Database
14 Installation Using setld
14.1 Using CD-ROM Consolidated Distribution Media
14.2 Using a RIS Distribution Area
14.3 Responding to Installation Procedure Prompts
14.3.1 Selecting Subsets
14.3.2 Monitoring Displays During the Subset Loading Process
14.4 Failures During Product Installation
15 Installation Using pse-remote-install
15.1 Installing Using pse-remote-install
15.2 Preparing for Remote Installation
15.2.1 Setting the PSE Kit Location
15.2.2 Creating a System File
15.2.3 Creating a License PAK File
15.2.4 Establishing rsh Access to the Remote Host
15.3 Running pse-remote-install
15.3.1 pse-remote-install Command Line Options
15.4 pse-remote-install Operations
15.4.1 Loading Software Subsets
15.4.2 Deleting Software Subsets
15.4.3 Configuring Software Subsets
15.4.4 Verifying Software Subsets
15.4.5 Listing Software Subset Inventory
15.5 pse-remote-install Limitations
16 Customizing a PSE Cluster
16.1 Creating a PSE Cluster Database
16.2 PSE Cluster Database Planning
16.3 PSE Cluster Database Overview
16.3.1 configuration_data Information
16.3.2 Partition Definition
16.4 Using the PSE Database Editor
16.4.1 Sample PSE Database Editor Session
16.5 Distributing the PSE Cluster Database
16.5.1 Distributing the PSE Cluster Database Using NFS
16.5.2 Distributing the PSE Cluster Database Using DNS
    Specifying DNS Servers
    Using named Configuration Files
    DNS Clients
    Implementing DNS
16.6 PSE Cluster Configuration Scripts
16.6.1 Running the modify_services Script
16.6.2 Running the configure_nodes Script
16.7 Understanding a DNS-Based PSE Cluster Database
16.7.1 Point of Origin Information
16.7.2 Domain Definition Information
16.7.3 Name Server Information
16.7.4 configuration_data Information
16.7.5 Partition Definition Information
17 PSE Cluster Configuration and Maintenance
17.1 Modifying the PSE Cluster Database
17.2 PSE Cluster Daemon
17.2.1 Adding a Machine to a Cluster
17.2.2 Removing a Machine from a Cluster
17.2.3 Maintaining the farmd Daemon
    Signaling the farmd Daemon to Update Its Information
    Modifying Job Slots
    Temporarily Disabling and Enabling farmd
    Shutting Down a PSE Cluster Member
    System Boot and Shutdown
    Monitoring farmd Activity for the Entire PSE Cluster
17.3 Partitioning the PSE Cluster
17.4 Adding Secondary Servers for DNS-based PSE Clusters
17.5 Accessing the PSE Cluster from Non-PSE-Cluster Members
17.5.1 DNS Access from a Non-PSE-Cluster Machine
17.6 Failures During Product Use


A Digital Parallel Software Environment Reference Pages
A.1 PSE Reference Pages
A.2 HPF Reference Pages
B Error Messages
B.1 Digital Fortran 90 Compiler Error Messages
B.2 HPF Run-Time Error Messages
B.2.1 PSE Runtime Errors
B.2.2 PSE Command Line Errors
B.2.2.1 Missing Command Arguments
B.2.2.2 Misplaced Command Arguments
B.3 pprof Profiler Error Messages (Compiler-Generated)
B.4 pprof Profiler Error Messages (PSE-Generated)
C Listing the Files Installed by the PSE Kit
D Digital Fortran 90 HPF Language Specification
D.1 HPF Directives in Procedure Interfaces
D.1.1 Language Changes
D.1.2 Ambiguous Combinations of Directives
D.1.2.1 Incomplete or Ambiguous Specifications
D.1.2.2 Restating the Problem in Terms of Motion
D.1.2.3 Specification of Conditions for Data Motion at a Procedure Call
D.2 Data Parallel Statements and Directives
D.2.1 The FORALL Construct
D.2.2 Data Alignment and Distribution Directives
D.2.3 PURE Procedures
D.3 Arguments to Procedures
D.3.1 Arguments to Global Procedures
D.3.2 Arguments to HPF_LOCAL Procedures


Examples

4-1 Iteration of the Function z*z+c
4-2 Using a DO Loop to Compute the Grid
4-3 Using a FORALL Structure to Compute the Grid
4-4 The PURE Function escape_time
6-1 Code Fragment for Mapping Illustrations
7-1 Avoiding Communication Set-up with Allocatable Arrays
13-1 Sample UDP_prime Installation Log
13-2 Sample UDP_prime Deinstallation Log
13-3 Sample PSE IVP Log
14-1 PSE Subsets Loaded from CD-ROM Media
16-1 Creating a File-Based PSE Cluster Database
16-2 Creating a DNS-Based PSE Cluster Database
16-3 /etc/resolv.conf File Example
16-4 modify_services Script
16-5 configure_nodes Script
16-6 configuration_data Database Entry
16-7 Database Partition Definitions
17-1 Adding PSE Cluster Partitions to a Database

Figures

1-1 Parallel Program Creation and Execution
2-1 Distributing an Array (*, BLOCK)
2-2 Distributing an Array (*, CYCLIC)
2-3 Distributing an Array (BLOCK, CYCLIC)
2-4 Distributing an Array (BLOCK, BLOCK)
2-5 LU Decomposition with (*, BLOCK) Distribution
2-6 LU Decomposition with (*, CYCLIC) Distribution
3-1 Three-dimensional Problem and Its Two-dimensional Model
3-2 Shadow Edges for Nearest Neighbor Optimization
4-1 The Mandelbrot Set
4-2 Zoomed-in Detail of Part of the Mandelbrot Set
6-1 BLOCK Distribution - Array View
6-2 BLOCK Distribution - Processor View
6-3 CYCLIC Distribution - Array View
6-4 CYCLIC Distribution - Processor View
6-5 BLOCK, BLOCK Distribution - Array View
6-6 BLOCK, BLOCK Distribution - Processor View
6-7 CYCLIC, CYCLIC Distribution - Array View
6-8 CYCLIC, CYCLIC Distribution - Processor View
6-9 CYCLIC, BLOCK Distribution - Array View
6-10 CYCLIC, BLOCK Distribution - Processor View
6-11 BLOCK, CYCLIC Distribution - Array View
6-12 BLOCK, CYCLIC Distribution - Processor View
6-13 BLOCK, * Distribution - Array View
6-14 BLOCK, * Distribution - Processor View
6-15 CYCLIC, * Distribution - Array View
6-16 CYCLIC, * Distribution - Processor View
6-17 *, BLOCK Distribution - Array View
6-18 *, BLOCK Distribution - Processor View
6-19 *, CYCLIC Distribution - Array View
6-20 *, CYCLIC Distribution - Processor View
6-21 Visual Technique for Computing Two-Dimensional Distributions
8-1 Eight-Member PSE Cluster Connected by Ethernet and FDDI
8-2 8 Node Virtual
10-1 PSE Performance Factors
10-2 Parallel Profiling Process Flow Chart
11-1 Alpha Cluster
11-2 Single-Host PSE Cluster
11-3 PSE Cluster with a Switched Network Interconnect
11-4 PSE Cluster with a Memory Channel Interconnect
11-5 PSE Cluster with Memory Channel and Switched Network Interconnects
12-1 Shared Files and File Systems
16-1 PSE Cluster Database File

Tables

6-1 HPF Directives and HPF-Specific Attribute
6-2 Compiler Options for Mixing Parallel and non-Parallel Routines
7-1 Explanation of Example 7-1
8-1 PSE Command-Line Options and Environment Variables
10-1 Parallel Profiling Data Types
10-2 Location of pprof Output
10-3 pprof Analysis Options
10-4 pprof Control Options
10-5 Data Types
10-6 Event Types
10-7 Profiling Strategies
13-1 PSE Subset Requirements
13-2 PSE Subset Sizes (Kilobytes Required)
15-1 pse-remote-install Options
16-1 configuration_data Example Entries
16-2 configuration_data Tokens
16-3 Partition Tokens
16-4 Example Partition Definitions
16-5 PSE Cluster Database SOA Fields
16-6 PSE Cluster Database Name Server Fields