
This chapter introduces the Hewlett-Packard Exemplar C and Fortran 77 compilers. These compilers are based on the 10.2x releases of the standard Hewlett-Packard C and Fortran 77 compilers and are designed for creating applications for SPP-UX V5.1 or higher.
The Exemplar compilers implement a subset of the Exemplar programming model, which provides advanced parallelism. This model supports the following programming paradigms:
In the shared-memory paradigm, the compilers perform optimizations and, if requested, parallelization. Directives and pragmas allow you to further increase optimization opportunities.
In the message-passing paradigm, the programmer uses functions to explicitly spawn parallel processes, share data among processes, and coordinate their activities.
The shared-memory/message-passing paradigm allows you to combine the two paradigms, taking advantage of their respective strengths.
This book focuses on the compiler support for the shared-memory paradigm. See the Exemplar Programming Guide for information on programming efficiently using the shared-memory paradigm.
See the HP MPI User's Guide and the HP PVM User's Guide for information on using message passing.
This section discusses some of the standard HP compiler options that are referenced later in this book. However, this book is a supplement to the standard HP compiler documentation. See the cc(1) and f77(1) man pages for:
See the section "Associated documents" on page xvi for a list of additional documentation.
NOTE: The exact optimizations performed by the SPP1000-Series compilers (/usr/convex/bin/fc, /usr/convex/bin/cc) are not available in the Exemplar compilers. The Exemplar compilers perform the optimizations supported by the standard HP compilers. In many cases, these optimizations are similar. Also, the Exemplar compilers perform optimizations beyond those found in the standard HP compilers. See the Exemplar Programming Guide for more information.
+O0 (default)
Optimization level +O0 is the default optimization level. Your code compiles fastest at this level, but with little optimization. Code development and debugging should be done at this level.
At optimization level +O0, the optimizations in Table 1 are performed.
| Optimization | Description |
|---|---|
| Constant folding | Replaces an operation on constant operands with the result of the operation |
| Partial evaluation of test conditions | Determines, where possible, the truth value of a logical expression without evaluating all the operands (also known as short-circuiting) |
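The following C sketch illustrates, at the source level, what the two transformations in Table 1 mean. It is not compiler output: the function names are hypothetical, and `product_after_folding` is written by hand to show the result of folding. In C, the `&&` operator is itself defined to short-circuit, so `divides_evenly` only shows why skipping the right operand matters.

```c
#include <assert.h>

/* Constant folding: every operand of 2 * 3 * 7 is a constant, so the
   compiler can replace the whole expression with its result. */
int product_as_written(void)    { return 2 * 3 * 7; }
int product_after_folding(void) { return 42; }

/* Partial evaluation (short-circuiting): when d == 0 the left operand
   of && is false, so the right operand, n % d == 0, is never evaluated
   and no division by zero occurs. */
int divides_evenly(int n, int d)
{
    return d != 0 && n % d == 0;
}

/* Hypothetical self-check that the hand-folded form matches. */
void check_table1_examples(void)
{
    assert(product_as_written() == product_after_folding());
    assert(divides_evenly(10, 0) == 0);
    assert(divides_evenly(10, 5) == 1);
}
```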
+O1
The transformations performed at +O1 are local to small subsections of code and, therefore, are performed quickly and with little runtime storage required by the compiler. Use +O1 when some optimization is desired, but when compile-time performance is more important than runtime performance.
At optimization level +O1, the optimizations listed in Table 2 are performed.
| Optimization | Description |
|---|---|
| +O0 optimizations | See Table 1 |
| Branch optimizations | Changes branch instructions into more efficient sequences |
| Dead code elimination | Removes code that is unreachable or is otherwise never executed |
| Instruction scheduling | Schedules instructions in a basic block to take advantage of memory pipelining |
| More efficient use of registers | |
| Peephole optimizations | Replaces assembly language instruction sequences with faster sequences and removes redundant register loads and stores |
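As a hand-written illustration of dead code elimination (not actual compiler output), the hypothetical functions below assume a compile-time switch, `TRACE`, set to 0. Because the guarding condition is a constant false, the block can never execute and can be removed without changing the result.

```c
#include <assert.h>

/* Hypothetical compile-time switch; with the value 0 the guarded
   block below can never execute. */
#define TRACE 0

/* As written: the TRACE block is unreachable. */
int twice_as_written(int x)
{
    if (TRACE) {
        x = -x;          /* never executed when TRACE is 0 */
    }
    return 2 * x;
}

/* What remains after the unreachable block is removed. */
int twice_after_dce(int x)
{
    return 2 * x;
}

/* Hypothetical check that removing the block changes nothing. */
void check_dce(void)
{
    assert(twice_as_written(5) == twice_after_dce(5));
    assert(twice_as_written(-3) == twice_after_dce(-3));
}
```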
+O2, -O
You can use either -O or +O2 to enable the +O2 optimizations. Transformations at +O2 are performed over the scope of each procedure. If you use this optimization level, the compiler uses more memory than at +O1 and takes longer to process your program. Optimizing procedures of more than 1,000 lines at this level takes considerably longer than at +O1.
At optimization level +O2, the optimizations in Table 3 are performed.
| Optimization | Description |
|---|---|
| +O0 and +O1 optimizations | See Table 1 and Table 2 |
| Global register allocation | Determines when and how long commonly used variables and expressions occupy a register |
| Strength reduction of induction variables | Removes linear functions of a loop counter and replaces each function with a variable that contains the value of that function |
| Strength reduction of constants | Replaces some multiplication instructions with addition instructions |
| Common subexpression elimination | Replaces subsequent instances of an expression with its previously computed result |
| Advanced constant folding and propagation (simple constant folding is done at +O0) | Replaces an operation on constant operands with the result of the operation (constant folding) and replaces variable references with a constant value previously assigned to that variable (constant propagation) |
| Loop-invariant code motion | Recognizes instructions inside a loop whose results never change and moves those instructions outside the loop |
| Store/copy optimization | Substitutes registers for memory locations |
| Unused definition elimination | Removes unused references to memory locations and register definitions |
| Software pipelining | Rearranges the order in which instructions execute in a loop to prevent processor stalls |
| Register reassociation | Reduces the cost of computing address expressions for array references by dedicating a register to track the value of the address expression |
| Loop unrolling (innermost loops) | Increases a loop's step value and replicates the loop body, with each replication appropriately offset from the induction variable so that all iterations are performed given the new step |
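The sketch below applies two of the Table 3 transformations by hand to a simple C loop: loop-invariant code motion and strength reduction of a linear function of the loop counter. The function names are hypothetical, and the compiler works on machine-level code that may differ; the point is only that both versions fill the array with the same values.

```c
#include <assert.h>

/* As written: scale * offset does not change inside the loop, and
   i * stride is a linear function of the loop counter i. */
void fill_as_written(int *out, int n, int stride, int scale, int offset)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = i * stride + scale * offset;
}

/* Hand-transformed version:
   - loop-invariant code motion computes scale * offset once;
   - strength reduction replaces i * stride with a running value
     that is increased by stride on every iteration. */
void fill_transformed(int *out, int n, int stride, int scale, int offset)
{
    int i;
    const int invariant = scale * offset;   /* hoisted out of the loop */
    int ramp = 0;                           /* always equals i * stride */
    for (i = 0; i < n; i++) {
        out[i] = ramp + invariant;
        ramp += stride;                     /* addition replaces multiply */
    }
}

/* Hypothetical check that both versions agree. */
void check_table3_examples(void)
{
    int a[8], b[8], i;
    fill_as_written(a, 8, 3, 4, 5);
    fill_transformed(b, 8, 3, 4, 5);
    for (i = 0; i < 8; i++)
        assert(a[i] == b[i]);
}
```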
+O3
At optimization level +O3, the following optimizations are made:
+O0, +O1, and +O2 optimizations (see Table 1, Table 2, and Table 3)
IF-DO interchange
+O4
At this level, optimization occurs at link time, allowing the optimizer to analyze all files compiled with the +O4 option at once. Because analysis is done when linking, the compile time is generally shorter than at lower optimization levels, but linking takes more time.
At optimization level +O4, the following optimizations are made:
+O0, +O1, +O2, and +O3 optimizations (see Table 1, Table 2, Table 3, and the section "+O3" above)
+O[no]aggressive
The +O[no]aggressive option enables optimizations that can result in significant performance improvement, but that can change a program's behavior. These optimizations include those invoked by the following advanced options (which are described in the cc(1) and f77(1) man pages):
+Osignedpointers (available only in C)
+Oregionsched
+Oentrysched
+Onofltacc
+Olibcalls
+Onoinitcheck
+Ovectorize
The default is +Onoaggressive. The +O[no]aggressive option can be used at +O2 and above.
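One way an optimization can change a program's behavior is by reordering floating-point arithmetic, which options that relax floating-point accuracy may permit; the man pages document what each listed option actually enables. The following minimal C sketch, assuming IEEE 754 double precision with no extended-precision intermediates, shows that two groupings of the same sum give different results. The function names are hypothetical.

```c
#include <assert.h>

/* The same three-term sum, grouped two ways. Floating-point addition
   is not associative, so these can round differently. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

/* Hypothetical check. In IEEE double, representable values near 1e16
   are 2 apart, so -1e16 + 1.0 rounds back to -1e16 and the 1.0 is
   lost in the right-grouped sum. */
void check_reassociation(void)
{
    assert(sum_left(1e16, -1e16, 1.0) == 1.0);
    assert(sum_right(1e16, -1e16, 1.0) == 0.0);
}
```

A compiler that is required to preserve the source order must compute `sum_left` exactly as written; regrouping it as in `sum_right` is faster in some contexts but produces a different answer.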
+O[no]all
The +Oall option applies maximum optimization to achieve the best runtime performance. This option is equivalent to specifying +Oaggressive and +Onolimit on the same command line. The +Oall option implies +O4. The default is +Onoall.
+O[no]fail_safe
The +Ofail_safe option allows a compilation with internal optimization errors to continue, rather than abort. If internal optimization errors are found, the compiler issues a warning message, then restarts the compilation at +O0. When using +Onofail_safe, compilation aborts if internal optimization errors occur.
This option can be used at +O1 or higher. The default is +Ofail_safe.
+O[no]info
The +O[no]info option displays [does not display] feedback information about the optimization process (for example, cloning and inlining). Currently, this option is useful only at +O3 and above. The default is +Onoinfo. For information on a related option, see the section "+O[no]report[=report_type]" on page 21.
+Oinline_budget=n
In +Oinline_budget=n, n is an integer in the range 1 to 1000000 that specifies the level of aggressiveness, as follows:
n = 100: Default level of inlining.
n > 100: More aggressive inlining. The optimizer is less restricted by compilation time and code size when searching for eligible routines to inline.
n = 1: Only inline if it reduces code size.
The default is +Oinline_budget=100.
The +Onolimit and +Osize options also affect inlining.
Specifying the +Onolimit option implies specifying
+Oinline_budget=200. The +Osize option implies
+Oinline_budget=1.
Note, however, that the +Oinline_budget option takes
precedence over both of these options. This means that you can
override the effects on inlining of the +Onolimit and +Osize
options by specifying the +Oinline_budget option on the same
command line.
The +Oinline_budget=n option is valid at +O3 and above.
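To illustrate what inlining itself does (independently of how +Oinline_budget weighs candidates), the hypothetical C functions below show a loop that calls a small routine and the same loop with the routine's body substituted by hand. Removing the call eliminates call overhead and exposes the loop body to further optimization, at the cost of larger code when the inlined routine is big or called from many places.

```c
#include <assert.h>

/* A small routine that is a typical inlining candidate. */
static int square(int x) { return x * x; }

/* As written: one call to square() per iteration. */
int sum_of_squares_calls(const int *v, int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}

/* After inlining (done by hand here): the call is replaced by the
   body of square(). */
int sum_of_squares_inlined(const int *v, int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++)
        total += v[i] * v[i];
    return total;
}

/* Hypothetical check that both forms agree. */
void check_inlining(void)
{
    int v[4] = {1, 2, 3, 4};
    assert(sum_of_squares_calls(v, 4) == sum_of_squares_inlined(v, 4));
}
```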
+O[no]limit
The +O[no]limit option suppresses [does not suppress] optimizations that significantly increase compile time or consume large amounts of memory. Specifying +Onolimit implies specifying +Oinline_budget=200. (See the section "+Oinline_budget=n" on page 7 for additional information.)
This option can be used at +O2 and above. The default is +Olimit.
+O[no]loop_transform
The +O[no]loop_transform option transforms [does not transform] eligible loops for improved cache performance. The transforms include loop distribution, loop interchange, and loop fusion. This option can be used at +O3 and above. The default is +Oloop_transform.
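Loop interchange reorders nested loops so that the inner loop touches memory sequentially. The hand-written C sketch below (hypothetical function names and array sizes) shows the idea for C, which stores arrays in row-major order. Fortran 77 stores arrays in column-major order, so there the preferred nesting is the reverse.

```c
#include <assert.h>

#define ROWS 64
#define COLS 64

/* As written: the inner loop walks down a column, so consecutive
   accesses to a[i][j] are COLS elements apart in memory. */
long sum_column_order(int a[ROWS][COLS])
{
    long total = 0;
    int i, j;
    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            total += a[i][j];
    return total;
}

/* After loop interchange: the inner loop walks along a row, so
   consecutive accesses are adjacent and share cache lines. */
long sum_row_order(int a[ROWS][COLS])
{
    long total = 0;
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}

/* Hypothetical check that the interchange preserves the result. */
void check_interchange(void)
{
    static int grid[ROWS][COLS];
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            grid[i][j] = i * COLS + j;
    assert(sum_column_order(grid) == sum_row_order(grid));
}
```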
+O[no]loop_unroll[=n]
This option unrolls [does not unroll] program loops by a factor of n. For example, specifying +Oloop_unroll=4 requests the optimizer to replicate the loop body four times. This option can be used at +O2 and above. The default is +Oloop_unroll=4.
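The following C sketch unrolls a summation loop by a factor of 4 by hand, mirroring the body replication described for +Oloop_unroll=4. It assumes a cleanup loop handles trip counts that are not a multiple of 4; the function names are hypothetical, and the compiler's generated code may handle leftover iterations differently.

```c
#include <assert.h>

/* As written: one addition per iteration. */
long sum_rolled(const int *v, int n)
{
    long total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by a factor of 4: the step becomes 4, the body is
   replicated with offsets 0..3 from i, and a cleanup loop runs the
   n % 4 leftover iterations. */
long sum_unrolled4(const int *v, int n)
{
    long total = 0;
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++)      /* remainder iterations */
        total += v[i];
    return total;
}

/* Hypothetical check over every trip count from 0 to 10. */
void check_unrolling(void)
{
    int v[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int n;
    for (n = 0; n <= 10; n++)
        assert(sum_rolled(v, n) == sum_unrolled4(v, n));
}
```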
+O[no]parallel_env
NOTE: Do not use the +Oparallel_env option unless you are creating a process-based parallel application. Applications created using the Exemplar programming model are thread-based parallel.
This option compiles for a parallel [serial] execution environment. The +Oparallel_env option does not request parallelization for the target source; rather, it ensures a consistent execution environment for all files in a parallel program. This option is only supported for applications using process-based parallelism. If you want to compile an application for process-based parallel execution, you must compile all of its files with +Oprocess_threads and either +Oparallel or +Oparallel_env. Do not use +Oparallel_env with +Oparallel.
+O[no]size
The +Osize option suppresses optimizations that significantly
increase code size. Specifying +Osize implies specifying
+Oinline_budget=1. See the section "+Oinline_budget=n"
on page 7 for additional information.
The +Onosize option does not prevent optimizations that can
increase code size.
The +O[no]size option can be used at +O2, +O3, or +O4. The
default is +Onosize.
The examples in this section demonstrate the use of the C and Fortran 77 compilers. The functionality and options illustrated in any example in the book apply to both the C and Fortran 77 compilers.
There are two C compilers, as described below:
cc is the Exemplar C compiler and is located at /opt/ansic/bin/cc.
c89 is the Exemplar POSIX-conforming C compiler and is located at /opt/ansic/bin/c89.
The remainder of this book refers to the cc compiler. Any cc example also applies to the c89 compiler.
The cc compiler command has the form:
% cc [options] files
where
options is zero or more of the C compiler options
files is a space-delimited list of one or more files
For example, the following command
% cc prog1.c prog2.c prog3.c
compiles the three files (prog1.c, prog2.c, prog3.c) and produces an executable, which is named a.out by default.
In this command,
% cc -o prog prog.c proc1.o
cc compiles prog.c to produce the object file prog.o, then calls the
linker ld to link prog.o and proc1.o with the default start-up
routines and library routines. The file prog.o is deleted after
linking. The -o prog causes the resulting executable file to be
named prog instead of a.out.
This command
% cc -g +O0 prog.c
shows the debugging option (-g) and the request of level 0
optimizations (+O0).
For additional information, see the cc(1) man page.
There are two Fortran 77 compilers, as described below:
f77 is the Exemplar Fortran 77 compiler and is located at /opt/fortran/bin/f77.
fort77 is the Exemplar POSIX-conforming Fortran 77 compiler and is located at /opt/fortran/bin/fort77.
The remainder of this book refers to the f77 compiler. Any f77 example also applies to the fort77 compiler.
The f77 compiler command has the form:
% f77 [options] files
where
options is zero or more of the Fortran 77 compiler options
files is a space-delimited list of one or more files
For example, the command
% f77 -c prog.f
compiles the file prog.f to produce the object file prog.o, then
(because of the -c option) suppresses linking. The prog.o file can
be linked later by including it on an f77 command line or by using
the linker (ld) directly.
In the following example,
% f77 -v prog1.f prog2.f
the verbose mode is enabled by using -v. When compiling in
verbose mode, the compiler displays (to standard error) a
step-by-step description of the compilation process.
This command
% f77 +O3 +Oparallel prog.f
shows the request of level 3 optimizations (+O3) and the request
that the compiler honor the parallelism directives of the Exemplar
programming model and generate parallel code where
appropriate (+Oparallel). The +Oparallel option is only
valid at +O3 and above.
For additional information, see the f77(1) man page.
This section highlights options that you may want to use regularly with the Exemplar compilers. The options are performance-related and are described only briefly in this section; however, sources for more information are included, where available.
| Option | Description |
|---|---|
| +O3 | Invoke level 3 optimizations. See the section "+O3". |
| | Invoke level 3 optimizations and cause the compiler to honor parallelism directives and pragmas from the Exemplar programming model and to generate parallel code where appropriate*. |
| | Prefetch data referenced in loops. |
| | Disable floating-point optimizations that can result in numerical differences. See the cc(1) or f77(1) man page for more information. (Available only on S2000 and X2000 servers.) |
| +Oinfo | Display information on the optimization process. See the section "+O[no]info". |
| +Olibcalls | Use low-call-overhead versions of select library routines. See the cc(1) or the f77(1) man page for more information. |
| | Request that the compiler parallelize only those loops with |
| +Onolimit | Do not suppress optimizations that significantly increase compile time or consume large amounts of memory. See the section "+O[no]limit". |
| | Enable directive-specified, node-level parallelism. |
| | Optimize with the assumption that subprogram arguments do not refer to the same memory. When this option can be used, it allows the compiler to generate significantly faster code. See the cc(1) man page for more information. (Available only in C.) |
| +Oreport | Display optimization reports. See the section "+O[no]report[=report_type]". |
| | (For use when linking with the compiler driver) Search the archive version of a library; if the archive version is not available, search the shared version of the library. |
| | (For use when linking with the compiler driver) By default, underflows raise exceptions; this option avoids the exceptions so that underflows simply produce zero. |
* +Onoexemplar_model is not also specified. See the section "+O[no]exemplar_model" on page 18 for more information.
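The C-only option in the table above that assumes subprogram arguments do not refer to the same memory is safe only when callers never pass overlapping pointers. The hypothetical functions below show why: `add_twice_no_overlap` is the kind of code that this assumption permits (a single load of `*b`), and it gives a different answer when both arguments point to the same int.

```c
#include <assert.h>

/* As written: *b is read twice. If a and b point to the same int,
   the second read sees the value stored by the first update. */
void add_twice(int *a, const int *b)
{
    *a += *b;
    *a += *b;
}

/* What a compiler could generate if it assumes a and b never refer
   to the same memory: *b is loaded once and reused. */
void add_twice_no_overlap(int *a, const int *b)
{
    int t = *b;
    *a += 2 * t;
}

/* Hypothetical check: the versions agree only without overlap. */
void check_overlap(void)
{
    int x = 1, y = 5, z = 1;
    add_twice(&x, &y);
    add_twice_no_overlap(&z, &y);
    assert(x == 11 && z == 11);     /* distinct arguments: same result */

    x = 1;
    z = 1;
    add_twice(&x, &x);              /* 1 -> 2 -> 4 */
    add_twice_no_overlap(&z, &z);   /* 1 + 2 * 1 = 3 */
    assert(x == 4 && z == 3);       /* overlapping arguments differ */
}
```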