Exemplar C and Fortran 77 Programmer's Guide: Introduction




Last modified on: Tuesday, April 15 1997 at 04:29pm

This chapter introduces the Hewlett-Packard Exemplar C and Fortran 77 compilers. These compilers are based on the 10.2x releases of the standard Hewlett-Packard C and Fortran 77 compilers and are designed for creating applications for SPP-UX V5.1 or higher.


The programming model

The Exemplar compilers implement a subset of the Exemplar programming model, which provides advanced parallelism. This model supports the following programming paradigms:

In the shared-memory paradigm, the compilers perform optimizations and--if requested--parallelization. Directives and pragmas allow you to further increase optimization opportunities.

In the message-passing paradigm, the programmer uses functions to explicitly spawn parallel processes, share data among processes, and coordinate their activities.

The shared-memory/message-passing paradigm allows you to combine the two paradigms, taking advantage of their respective strengths.

This book focuses on the compiler support for the shared-memory paradigm. See the Exemplar Programming Guide for information on programming efficiently using the shared-memory paradigm.

See the HP MPI User's Guide and the HP PVM User's Guide for information on using message passing.


Standard HP compiler information

This section discusses some of the standard HP compiler options that are referenced later in this book. However, this book is a supplement to the standard HP compiler documentation. See the cc(1) and f77(1) man pages for:

See the section "Associated documents" on page xvi for a list of additional documentation.

NOTE: The exact optimizations performed by the SPP1000-Series compilers (/usr/convex/bin/fc, /usr/convex/bin/cc) are not available in the Exemplar compilers. The Exemplar compilers perform the optimizations supported by the standard HP compilers. In many cases, these optimizations are similar. Also, the Exemplar compilers perform optimizations beyond those found in the standard HP compilers. See the Exemplar Programming Guide for more information.

+O0 (default)

Optimization level +O0 is the default optimization level. Your code compiles fastest at this level, but with little optimization. Code development and debugging should be done at this level.

At optimization level +O0, the optimizations in Table 1 are performed.

Table 1 Optimizations performed at +O0

Optimization

Description

Constant folding

Replaces an operation on constant operands with the result of the operation

Partial evaluation of test conditions

Determines, where possible, the truth value of a logical expression without evaluating all the operands (also known as short-circuiting)
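The two +O0 optimizations in Table 1 can be illustrated with a minimal C sketch. The function names (seconds_per_hour, is_positive) are hypothetical and chosen only for this example. Note that in C the && operator short-circuits as a matter of language semantics, so the second function shows the behavior rather than something the optimizer adds.

```c
#include <stddef.h>

/* Constant folding: both operands are compile-time constants, so the
   compiler can replace the multiplication with its result, 3600. */
int seconds_per_hour(void)
{
    return 60 * 60;
}

/* Partial evaluation (short-circuiting): when p is NULL the left operand
   is false, so the right operand is never evaluated and p is never
   dereferenced. */
int is_positive(const int *p)
{
    return p != NULL && *p > 0;
}
```

Calling is_positive with a null pointer is therefore safe, because the truth value is known after the first operand.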

+O1

The transformations performed at +O1 are local to small subsections of code and, therefore, are performed quickly and require little memory during compilation. Use +O1 when you want some optimization but compile-time performance matters more than runtime performance.

At optimization level +O1, the optimizations listed in Table 2 are performed.

Table 2 Optimizations performed at +O1

Optimization

Description

+O0 optimizations

See Table 1

Branch optimizations

Changes branch instructions into more efficient sequences

Dead code elimination

Removes code that is unreachable or is otherwise never executed

Instruction scheduling

Schedules instructions in a basic block to take advantage of memory pipelining

More efficient use of registers

Peephole optimizations

Replaces assembly language instruction sequences with faster sequences and removes redundant register loads and stores
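As a small illustration of the dead code elimination entry in Table 2, the following C sketch contains a statement that can never execute. The function name scale and the variable trace are hypothetical; whether a given compiler release actually removes this particular code is an assumption, not something this book states.

```c
/* The value of trace never changes, so the condition below is always
   false and the guarded statement is dead code. An optimizer performing
   dead code elimination may remove the test and the assignment, leaving
   only the multiplication. */
int scale(int x)
{
    const int trace = 0;

    if (trace) {
        x = x + 1000;   /* dead: never executed */
    }
    return x * 2;
}
```

The result of the function is the same with or without the dead statement, which is why removing it is safe.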

+O2, -O

You can use either -O or +O2 to enable the +O2 optimizations.

Transformations at +O2 are performed over the scope of each procedure. If you use this optimization level, the compiler uses more memory than at +O1 and takes longer to process your program. Optimizing procedures of more than 1,000 lines at this level takes considerably longer than at +O1.

At optimization level +O2, the optimizations in Table 3 are performed.

Table 3 Optimizations performed at +O2

Optimization

Description

+O0 and +O1 optimizations

See Table 1 and Table 2

Global register allocation

Determines when and how long commonly used variables and expressions occupy a register

Strength reduction of induction variables

Removes linear functions of a loop counter and replaces each function with a variable that contains the value of that function

Strength reduction of constants

Replaces some multiplication instructions with addition instructions

Common subexpression elimination

Replaces subsequent instances of an expression with its result

Advanced constant folding and propagation (Simple constant folding is done at +O0)

Replaces an operation on constant operands with the result of the operation (constant folding) and replaces variable references with a constant value previously assigned to that variable (constant propagation)

Loop-invariant code motion

Recognizes instructions inside a loop where the results never change and moves those instructions outside the loop

Store/copy optimization

Substitutes registers for memory locations

Unused definition elimination

Removes unused references to memory locations and register definitions

Software pipelining

Re-arranges the order in which instructions execute in a loop to prevent processor stalls

Register reassociation

Reduces the cost of computing address expressions for array references by dedicating a register to track the value of the address expression

Loop unrolling (innermost loops)

Increases a loop's step value and replicates the loop body, with each replication appropriately offset from the induction variable so that all iterations are performed given the new step
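Two of the Table 3 entries, loop-invariant code motion and strength reduction of induction variables, can be sketched by hand in C. The first function below is the code as a programmer might write it; the second shows roughly what the transformed loop computes. Both function names are hypothetical, and the second function is a source-level approximation of work the optimizer does on its internal representation, not its actual output.

```c
/* As written: i * stride is a linear function of the loop counter, and
   base + offset has the same value on every iteration. */
void fill_original(int *out, int n, int stride, int base, int offset)
{
    int i;

    for (i = 0; i < n; i++) {
        out[i] = i * stride + (base + offset);
    }
}

/* Hand-transformed equivalent. */
void fill_transformed(int *out, int n, int stride, int base, int offset)
{
    int i;
    int invariant = base + offset;  /* loop-invariant code motion */
    int term = 0;                   /* strength reduction: holds i * stride */

    for (i = 0; i < n; i++) {
        out[i] = term + invariant;
        term += stride;             /* an addition replaces the multiplication */
    }
}
```

For the same arguments, both functions store identical values in out; the second performs one addition per iteration where the first performs a multiplication and two additions.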

+O3

At optimization level +O3, the following optimizations are made:

+O4

At this level, optimization occurs at link time, allowing the optimizer to analyze all files compiled with the +O4 option at once. Because analysis is done when linking, the compile time is generally shorter than at lower optimization levels, but linking takes more time.

At optimization level +O4, the following optimizations are made:

+O[no]aggressive

The +O[no]aggressive option enables optimizations that can result in significant performance improvement, but that can change a program's behavior. These optimizations include those invoked by the following advanced options (which are described in the cc(1) and f77(1) man pages):

The default is +Onoaggressive. The +O[no]aggressive option can be used at +O2 and above.

+O[no]all

The +Oall option applies maximum optimization to achieve the best runtime performance. This option is equivalent to specifying +Oaggressive and +Onolimit on the same command line. The +Oall option implies +O4. The default is +Onoall.

+O[no]fail_safe

The +Ofail_safe option allows a compilation with internal optimization errors to continue, rather than abort. If internal optimization errors are found, the compiler issues a warning message, then restarts the compilation at +O0. When using +Onofail_safe, compilation aborts if internal optimization errors occur.

This option can be used at +O1 or higher. The default is +Ofail_safe.

+O[no]info

The +O[no]info option displays [does not display] feedback information about the optimization process (for example, cloning and inlining). Currently, this option is useful only at +O3 and above. The default is +Onoinfo. For information on a related option, see the section "+O[no]report[=report_type]" on page 21.

+Oinline_budget=n

In +Oinline_budget=n, n is an integer in the range 1 to 1000000 that specifies the level of aggressiveness, as follows:

n = 100

Default level of inlining.

n > 100

More aggressive inlining.

The optimizer is less restricted by compilation time and code size when searching for eligible routines to inline.

n = 1

Only inline if it reduces code size.

The default is +Oinline_budget=100.

The +Onolimit and +Osize options also affect inlining. Specifying the +Onolimit option implies specifying +Oinline_budget=200. The +Osize option implies +Oinline_budget=1.

Note, however, that the +Oinline_budget option takes precedence over both of these options. This means that you can override the effects on inlining of the +Onolimit and +Osize options by specifying the +Oinline_budget option on the same command line.

The +Oinline_budget=n option is valid at +O3 and above.

+O[no]limit

The +O[no]limit option suppresses [does not suppress] optimizations that significantly increase compile-time or consume large amounts of memory. Specifying +Onolimit implies specifying +Oinline_budget=200. (See the section "+Oinline_budget=n" on page 7 for additional information.) This option can be used at +O2 and above. The default is +Olimit.

+O[no]loop_transform

The +O[no]loop_transform option transforms [does not transform] eligible loops for improved cache performance. The transforms include loop distribution, loop interchange, and loop fusion. This option can be used at +O3 and above. The default is +Oloop_transform.
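Loop interchange, one of the transforms named above, can be sketched by hand in C. C stores arrays in row-major order, so a loop nest whose inner loop walks down a column touches memory with a large stride, while the interchanged nest touches consecutive elements. The names ROWS, COLS, sum_column_order, and sum_row_order are hypothetical, and the second function is only a source-level picture of what the optimizer might do to the first.

```c
#define ROWS 4
#define COLS 5

/* Inner loop varies the row index: successive accesses are COLS
   elements apart in memory. */
long sum_column_order(int m[ROWS][COLS])
{
    long s = 0;
    int i, j;

    for (j = 0; j < COLS; j++) {
        for (i = 0; i < ROWS; i++) {
            s += m[i][j];
        }
    }
    return s;
}

/* After interchange: the inner loop varies the column index, so
   successive accesses are adjacent in memory. */
long sum_row_order(int m[ROWS][COLS])
{
    long s = 0;
    int i, j;

    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++) {
            s += m[i][j];
        }
    }
    return s;
}
```

Both functions return the same sum; only the order of memory accesses differs, which is what makes the interchange a legal transformation here.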

+O[no]loop_unroll[=n]

This option unrolls [does not unroll] program loops by a factor of n. For example, specifying +Oloop_unroll=4 requests the optimizer to replicate the loop body four times. This option can be used at +O2 and above. The default is +Oloop_unroll=4.
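To make the unroll factor concrete, here is a hand-written C sketch of a loop unrolled by four, matching the default factor. The function names are hypothetical, and the cleanup loop for leftover iterations is one common way to handle trip counts that are not a multiple of four; the code the optimizer generates may differ.

```c
/* Original loop: one element per iteration. */
long sum_rolled(const int *a, int n)
{
    long s = 0;
    int i;

    for (i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Unrolled by a factor of 4: the step becomes 4 and the body is
   replicated with offsets 0..3 from the induction variable. */
long sum_unrolled4(const int *a, int n)
{
    long s = 0;
    int i;

    for (i = 0; i + 3 < n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++) {    /* cleanup: remaining 0 to 3 iterations */
        s += a[i];
    }
    return s;
}
```

The unrolled version executes roughly a quarter as many loop-control branches, at the cost of a larger loop body.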

+O[no]parallel_env

NOTE: Do not use the +Oparallel_env option unless you are creating a process-based parallel application. Applications created using the Exemplar programming model are thread-based parallel.

This option compiles for a parallel [serial] execution environment. The +Oparallel_env option does not request parallelization for the target source; rather, it ensures a consistent execution environment for all files in a parallel program. This option is only supported for applications using process-based parallelism. If you want to compile an application for process-based parallel execution, you must compile all of its files with +Oprocess_threads and either +Oparallel or +Oparallel_env. Do not use +Oparallel_env with +Oparallel.

+O[no]size

The +Osize option suppresses optimizations that significantly increase code size. Specifying +Osize implies specifying +Oinline_budget=1. See the section "+Oinline_budget=n" on page 7 for additional information.

The +Onosize option does not prevent optimizations that can increase code size.

The +O[no]size option can be used at +O2, +O3, or +O4. The default is +Onosize.


Compiler usage

The examples in this section demonstrate the use of the C and Fortran 77 compilers. The functionality and options illustrated in any example in the book apply to both the C and Fortran 77 compilers.

Using the C compiler

There are two C compilers, as described below:

The remainder of this book refers to the cc compiler. Any cc example also applies to the c89 compiler.

The cc compiler command has the form:

% cc [options] files

where

options

is zero or more of the C compiler options

files

is a space-delimited list of one or more files

For example, the following command

% cc prog1.c prog2.c prog3.c

compiles the three files (prog1.c, prog2.c, prog3.c) and produces an executable, which is named a.out by default.

In this command,

% cc -o prog prog.c proc1.o

cc compiles prog.c to produce the object file prog.o, then calls the linker ld to link prog.o and proc1.o with the default start-up routines and library routines. The file prog.o is deleted after linking. The -o prog causes the resulting executable file to be named prog instead of a.out.

This command

% cc -g +O0 prog.c

compiles prog.c with debugging information (-g) at optimization level 0 (+O0).

For additional information, see the cc(1) man page.

Using the Fortran 77 compiler

There are two Fortran 77 compilers, as described below:

The remainder of this book refers to the f77 compiler. Any f77 example also applies to the fort77 compiler.

The f77 compiler command has the form:

% f77 [options] files

where

options

is zero or more of the Fortran 77 compiler options

files

is a space-delimited list of one or more files

For example, the command

% f77 -c prog.f

compiles the file prog.f to produce the object file prog.o, then (because of the -c option) suppresses linking. The prog.o file can be linked later by including it on an f77 command line or by using the linker (ld) directly.

In the following example,

% f77 -v prog1.f prog2.f

the verbose mode is enabled by using -v. When compiling in verbose mode, the compiler displays (to standard error) a step-by-step description of the compilation process.

This command

% f77 +O3 +Oparallel prog.f

requests level 3 optimizations (+O3) and directs the compiler to honor the parallelism directives of the Exemplar programming model and to generate parallel code where appropriate (+Oparallel). The +Oparallel option is valid only at +O3 and above.

For additional information, see the f77(1) man page.


Options to get you started

This section highlights options that you may want to use regularly with the Exemplar compilers. The options are performance-related and are described only briefly in this section; however, sources for more information are included, where available.

Table 4 Options to get you started

Option

Description

+O3

Invoke level 3 optimizations. See the section "+O3" on page 5 for more information.

+O3 +Oparallel

Invoke level 3 optimizations and cause the compiler to honor parallelism directives and pragmas from the Exemplar programming model and to generate parallel code where appropriate*.

If you compile with +O3 +Oparallel, be sure to also link with +O3 +Oparallel (if you link separately).

See the section "+O[no]parallel" on page 20 for more information.

+Odataprefetch

Prefetch data referenced in loops. See the section "+O[no]dataprefetch" on page 17 for more information.

+Ofltacc

Disable floating-point optimizations that can result in numerical differences. See the cc(1) or f77(1) man page for more information. (Available only on S2000 and X2000 servers.)

+Oinfo

Display information on the optimization process. See the section "+O[no]info" on page 7 for more information.

+Olibcalls

Use low-call-overhead versions of select library routines. See the cc(1) or the f77(1) man page for more information.

+Onoautopar

Request that the compiler parallelize only those loops with prefer_parallel or loop_parallel directives or pragmas. See the section "+O[no]autopar" on page 17 for more information.

+Onolimit

Do not suppress optimizations that significantly increase compile-time or consume large amounts of memory. See the section "+O[no]limit" on page 8 for more information.

+Onodepar

Enable directive-specified, node-level parallelism. See the section "+O[no]nodepar" on page 19 for more information.

+Onoparmsoverlap

Optimize with the assumption that subprogram arguments do not refer to the same memory. When this option can be used, it allows the compiler to generate significantly faster code. See the cc(1) man page for more information. (Available only in C.)

+Oreport

Display optimization reports. See the section "+O[no]report[=report_type]" on page 21 for more information.

-Wl,-aarchive_shared

(For use when linking with the compiler driver) Search the archive version of a library; if the archive version is not available, search the shared version of the library.

-Wl,+FPD

(For use when linking with the compiler driver) By default, floating-point underflows raise exceptions; this option suppresses those exceptions so that underflows produce zeros instead.

*Assuming +Onoexemplar_model is not also specified. (See the section "+O[no]exemplar_model" on page 18 for more information.)
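The +Onoparmsoverlap entry in Table 4 depends on subprogram arguments not referring to the same memory. The following C sketch, with the hypothetical function add_twice, shows why that assumption matters: the function gives a different answer when its two arguments overlap, so code compiled under a no-overlap assumption is only correct for callers that never pass overlapping arguments.

```c
/* Adds *src to *dst twice. If dst and src point to different objects,
   *src is unchanged between the two statements and could be loaded
   once. If they point to the same object, the first statement changes
   the value that the second statement reads. */
void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
}
```

With distinct arguments holding 1 and 1, the destination ends up as 3. With both pointers referring to one variable holding 1, standard C semantics give 4 (1 becomes 2, then 2 becomes 4). A compiler told that arguments never overlap may load *src only once, which would yield 3 in the overlapping case as well; that is the kind of behavior change to check for before using the option.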

