
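This chapter describes the Exemplar extensions to the standard HP compilers, including compiler options, directives and pragmas, and Fortran 77 extensions.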
Exemplar compilers recognize the options, directives, and pragmas that the standard HP compilers recognize. Extensions accepted by the Exemplar compilers, however, are not recognized by the standard HP compilers. The following sections describe these extensions. See Chapter 1, "Introduction," for an overview of the standard HP compiler options discussed below.
The options below are recognized in addition to those supported by the standard HP compilers or are available in the standard HP compilers but have been modified to behave differently in the Exemplar compilers.
-g
This option requests that the compiler generate debugging
information in the executable file that can be used by the CXdb
debugger (an optional product). See Chapter 5, "Debugging and
profiling," for more information on CXdb.
NOTE: Debugging with the dde and xdb debuggers is not supported with code compiled using the Exemplar compilers.
The -g
option is ignored at optimization levels greater than +O0.
Also, -g is ignored with -O
because -O implies +O2.
-I8
This option specifies that INTEGER and
LOGICAL variable
declarations with unspecified lengths are to occupy 8 bytes of
storage.
Also, this option transforms intrinsic function references that return default integer or logical values to return 8-byte values of the specified type.
+O[no]autopar
When used with the +Oparallel option, +Oautopar (the
default) causes the compiler to automatically parallelize loops
that are safe to parallelize. (A loop is safe to parallelize if it has an
iteration count that can be determined at runtime before loop
invocation, and contains no loop-carried dependences, procedure
calls, or I/O operations. A loop-carried dependence exists when
one iteration of a loop assigns a value to an address that is
referenced or assigned on another iteration.) You can use Fortran
directives and C pragmas to improve on the automatic
optimizations and to assist the compiler in locating additional
opportunities for parallelization.
When used with +Oparallel, the
+Onoautopar option causes
the compiler to parallelize only those loops marked by the
loop_parallel or prefer_parallel directives or pragmas.
Because the compiler does not automatically find parallel tasks or
regions, user-specified task and region parallelization is not
affected by this option.
Because parallelization takes place only at +O3 and above,
+O[no]autopar is useful only at +O3 and above.
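The difference between a loop that is safe to parallelize and one that contains a loop-carried dependence can be seen in a short C sketch. The function names are hypothetical and do not come from the Exemplar documentation, and the `restrict` qualifiers state the assumption that the two arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: element i depends only on b[i], so no
 * iteration reads or writes an address touched by another iteration. */
void scale(double *restrict a, const double *restrict b, size_t n, double k)
{
    size_t i;

    assert((a != NULL && b != NULL) || n == 0);
    for (i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which was
 * assigned on iteration i - 1, so the iterations must run in order. */
void running_sum(double *a, size_t n)
{
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

By the definition given above, the first loop is a candidate for automatic parallelization, while the second is not.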
+O[no]dataprefetch
The +O[no]dataprefetch option enables [disables]
optimizations to generate data prefetch instructions for data
referenced within innermost loops. The effect is that the memory
system will retrieve the data for future iterations while the
processor is executing current iterations. For cache lines
containing data that will be written, +Odataprefetch
prefetches the cache lines so that they are valid for both read and
write access.
This option provides no benefit to loops whose data fits in the cache; in fact, it can slow them down because of the prefetch instructions. For loops whose data does not fit in the cache, the speedup can be substantial.
The +O[no]dataprefetch option is valid
at +O2 and above.
The default is +Onodataprefetch. This option is effective only
on S2000 and X2000 servers.
+O[no]dynsel
When specified with +Oparallel, +Odynsel (the default)
enables workload-based dynamic selection. For parallelizable
loops whose iteration counts are known at compile time,
+Odynsel causes the compiler to generate either a parallel or a
serial version of the loop--depending on which is more profitable.
This optimization also causes the compiler to generate both parallel and serial versions of parallelizable loops whose iteration counts are unknown at compile time. At runtime, the loop's workload is compared to parallelization overhead, and the parallel version is run only if it is profitable to do so.
The +Onodynsel option disables dynamic selection and tells the
compiler that it is profitable to parallelize all parallelizable loops.
The dynsel directive and pragma can be used to enable
dynamic selection for specific loops when +Onodynsel is in
effect.
+O[no]exemplar_model
+Oexemplar_model (the default) causes the compiler to
recognize the Exemplar programming model. This option allows
you to use the directives, pragmas, and associated command-line
options that make up the programming model. At lower
optimization levels (+O0, +O1,
+O2), this option enables only the
following components of the programming model:
At +O3 and +O4, using +Oexemplar_model enables all
directives, pragmas, storage class specifiers, and typedefs. See the
section "Exemplar compiler directives and pragmas" on page 23
for additional information.
The +Oexemplar_model option implies the
+Okernel_threads option.
The +Onoexemplar_model option turns off support for the
Exemplar programming model. If you use this option, directives
and pragmas from the Exemplar programming model are
ignored. +Onoexemplar_model can be used with either
+Okernel_threads (the default) or +Oprocess_threads.
+Okernel_threads
The +Okernel_threads option causes the compiler to use a
thread-based model of parallelism. The Exemplar programming
model requires thread-based parallelism. This option is available
at all optimization levels and is enabled by default.
Alternatively, you can specify process-based parallelism by using
the +Oprocess_threads option. See the section
"+Oprocess_threads" on page 21
for more information.
+Okernel_threads can be used with either
+Oexemplar_model or +Onoexemplar_model.
+O[no]nodepar
The +Ononodepar option disables node-parallelism by causing
the compiler to generate code for a single-node machine. When
this option is used, serial code is generated for node-parallel
constructs. Specifying the +Ononodepar option prevents the
compiler from implementing node-parallelism, but allows the
implementation of both automatic and directive-specified
thread-parallelism.
The +Onodepar option causes the compiler to perform
node-parallelism where it has been specified using the nodes
attribute with the loop_parallel, prefer_parallel,
parallel, or begin_tasks directives or pragmas. Also, the
+Onodepar option causes the compiler to honor the
node_trip_count attribute to the dynsel
directive or pragma.
The +O[no]nodepar option is effective only when specified with
the +Oparallel option at +O3 and above. The default is
+Ononodepar.
+O[no]parallel
The +Oparallel option causes the compiler to parallelize loops
automatically and to recognize the parallelization directives and
pragmas begin_tasks, loop_parallel, prefer_parallel, and
parallel. These directives and pragmas are not recognized
if +Onoexemplar_model is specified.
There are three ways to specify the number of processors used in executing your parallel programs:
The MP_NUMBER_OF_THREADS environment variable, which is used at runtime. If this
variable is set to some positive integer n, your program
executes on n processors; n must be less than or equal to the
number of processors in the subcomplex where the program
is executing. If MP_NUMBER_OF_THREADS is not set, your
program runs on the number of processors in the subcomplex
where it is executing. (See the section "Subcomplexes" on
page 71 for information on subcomplexes.)
The mpa utility, which provides more control than
MP_NUMBER_OF_THREADS over the attributes in a parallel
program. See the section "Using the mpa utility" on page 75
or the mpa(1) man page for more information.
The +min and +max linker options. See the ld(1) man page for more information.
The +Oparallel option is valid only at optimization level +O3
and above. Using the +Oparallel option disables
+Ofail_safe, which is on by default. See the section
"+O[no]fail_safe" on page 6 for more information.
The +Onoparallel option is the
default for all optimization
levels. This option disables automatic and directive-specified
parallelization.
NOTE: If you compile one file in an application using +Oparallel, then you must link the application (using the compiler driver) with the +Oparallel option to link in the proper start-up files and runtime support.
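The MP_NUMBER_OF_THREADS rule described for +Oparallel can be sketched in C as follows. This is only an illustration of the stated rule, not the runtime's actual code: the function name is hypothetical, and returning -1 for a setting that is not a positive integer or that exceeds the processor count is an assumption, since the text does not say how such settings are handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns the number of processors a program would
 * run on, or -1 if the setting is unusable (assumed behavior). */
int resolve_thread_count(const char *setting, int subcomplex_processors)
{
    char *end;
    long n;

    assert(subcomplex_processors > 0);
    if (setting == NULL)                 /* variable not set */
        return subcomplex_processors;
    n = strtol(setting, &end, 10);
    if (end == setting || *end != '\0')  /* not a plain integer */
        return -1;
    if (n <= 0 || n > subcomplex_processors)
        return -1;                       /* n must be 1..processors */
    return (int)n;
}
```

A caller might pass `getenv("MP_NUMBER_OF_THREADS")` as the first argument and the processor count of the subcomplex as the second.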
+Oprocess_threads
The +Oprocess_threads option causes the compiler to use
process-based parallelism. Process-based parallelism is used by
the standard HP compilers.
+Oprocess_threads implies +Onoexemplar_model, which
causes directives and pragmas from the Exemplar programming
model to be ignored.
If you specify both +Oexemplar_model and
+Oprocess_threads, +Oprocess_threads is ignored with a
warning, and +Okernel_threads is selected.
+Okernel_threads is the default. See the section
"+Okernel_threads" on page 19
for more information.
+O[no]report[=report_type]
This option causes the compiler to display various optimization
reports. +Onoreport is the default. The value of report_type
determines which report is displayed, as described below.
+Oreport=loop produces the Loop Report. This report gives
information on optimizations performed on loops and calls. Using
+Oreport (without =report_type)
also produces the Loop Report.
+Oreport=private produces the Loop Report and the
Privatization Table, which provides information on loop variables
that are privatized by the compiler.
+Oreport=all produces all reports.
The +Oreport[=report_type] option
is active only at +O3 and
above. See the Exemplar Programming Guide for more information
on the optimization reports.
The option +Oinfo displays additional information on the
various optimizations being performed by the compilers. +Oinfo
can be used at any optimization level but is most useful at +O3 and
above. The default, at all optimization levels, is +Onoinfo.
+O[no]sharedgra
The +Onosharedgra option disables
global register allocation
for shared-memory variables that are visible to multiple threads.
This option can help if a variable shared among parallel threads is
causing wrong answers. See the Exemplar Programming Guide for
more information.
Global register allocation (+Osharedgra) is enabled by default at
optimization level +O2 and higher.
+pa
The +pa option requests that the compiler add instrumentation
(additional information) to an executable file for the CXpa
performance analyzer to read. The +pa option is not valid with
the +O4 or +Oall optimization levels. Also,
+pa is not compatible
with the -p or -G options. See Chapter 5,
"Debugging and
profiling," for more information on CXpa.
+tm target
This option specifies the target machine architecture for which compilation is to be performed. Using this option causes the compiler to perform architecture-specific optimizations. target takes one of the following values:
spp1200 to specify SPP1200 Series machines
spp1600 to specify SPP1600 Series machines
S2000 to specify S2000 servers
X2000 to specify X2000 servers
This option is valid at all optimization levels. The default target
value corresponds to the machine on which you invoke the
compiler. The +tm target option is automatically specified when
you use one of the Exemplar compiler drivers. If you are manually
linking your application, you have to specify the +tm target
option.
Using the +tm target option implies
+DA and +DS settings as
described in Table 5. +DA
architecture causes the compiler to
generate code for the architecture specified by architecture.
+DSmodel causes the compiler to use the
instruction scheduler
tuned to model. See the cc(1) man page or the f77(1) man page for
more information on the +DA and +DS options.
Table 5. +DA and +DS settings implied by each target value specified with +tm target.
If you specify +DA or +DS on the
compiler command line, your
setting takes precedence over the setting implied by
+tm target.
This section presents an alphabetical list of the Fortran directives and C pragmas that make up the Exemplar programming model. The Exemplar compilers accept the directives and pragmas listed below in addition to those supported by the standard HP compilers.
This section is intended to provide a brief overview of the
available directives and pragmas. More specific information and
examples can be found in the Exemplar Programming Guide. The
Fortran directives not supported as C pragmas are expressed in C
as either storage class extensions (thread_private, etc.) or as
typedefs (gate_t, barrier_t, etc.)
in the spp_prog_model.h file.
The form of an Exemplar Fortran compiler directive is:
C$DIR directive-specification
The form of an Exemplar C pragma is:
#pragma _CNX directive-specification
where directive-specification is one of the directives or pragmas described in this section.
For information on how to properly use these directives or pragmas, see the Exemplar Programming Guide.
Directive names are presented here in lowercase; they may be
specified in either case in both languages, but #pragma must
always appear in lowercase in C.
In the sections that follow, namelist represents a comma-delimited
list of names. These names can be variables, arrays, or COMMON
blocks. In the case of a COMMON block, its name must be enclosed
within slashes. The occurrence of a lowercase n or m is used to
indicate an integer constant. Occurrences of gate_var are for
variables that have been, or are being, defined as gates. Any
parameters that appear within square brackets ([ ]) are optional.
align_cti(namelist)
This directive or pragma aligns the variables and arrays listed in
namelist on CTIcache boundaries. This allows for more efficient
data reuse.
A CTIcache is a partition of physical memory that exists on each hypernode and is used to store copies of global data fetched from other hypernodes. (A hypernode is a set of processors and physical memory organized as a symmetric multiprocessor, or SMP, running a single image of the operating system microkernel.)
The CTIcache is 64 bytes on SPP1200 and SPP1600 systems. On X2000 servers, the CTIcache is 32 bytes. (S2000 servers do not use a CTIcache.) See the Exemplar Programming Guide for more information.
barrier(namelist)
This Fortran directive denotes a list of variables, as given in
namelist, that are to be used as the synchronization variables for
the barrier routines. This does not imply any synchronization in
itself; it is simply defining the barrier variables. In C, barrier is
a typedef (barrier_t), rather than a pragma. For more
information, refer to the Exemplar Programming Guide.
begin_tasks[(attribute_list)]
This directive or pragma defines the beginning of a section (or
sections; see next_task) of code that is to be executed as an
independent, parallel task. Each task is executed by a separate
thread. begin_tasks must have an accompanying end_tasks
in the same program unit.
The optional attribute_list can be any of the following legal combinations (m is an integer constant):
threads (default)
nodes
dist
ordered
max_threads=m
threads, ordered
nodes, ordered
dist, ordered
threads, max_threads=m
nodes, max_threads=m
dist, max_threads=m
ordered, max_threads=m
threads, ordered, max_threads=m
nodes, ordered, max_threads=m
dist, ordered, max_threads=m
Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.
Refer to the Exemplar Programming Guide for a complete discussion of parallel tasking.
block_loop[(block_factor=n)]
This directive or pragma indicates a specific loop to block, and
optionally, the block factor n
(n must be an integer constant greater
than or equal to 2) that is to be used in the compiler's internal
computation of loop nest based data reuse. If no block_factor
is specified, the compiler uses a heuristic to determine the
block_factor. Refer to the Exemplar
Programming Guide for
more information on blocking.
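Loop blocking restructures a loop nest so that it works on small sub-blocks of the data at a time, which is the kind of data reuse that block_loop is concerned with. The following hand-written C sketch shows the general shape of a blocked loop nest for a matrix transpose. It illustrates the idea only and is not the code the Exemplar compilers generate; the function name and the `block` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose an n x n row-major matrix, visiting it in block x block
 * tiles so that each tile of in and out is reused while it is cached. */
void transpose_blocked(const double *in, double *out, size_t n, size_t block)
{
    size_t ii, jj, i, j;

    assert(block >= 2);              /* mirrors the block_factor >= 2 rule */
    for (ii = 0; ii < n; ii += block)
        for (jj = 0; jj < n; jj += block)
            /* inner loops cover one tile, clipped at the matrix edge */
            for (i = ii; i < ii + block && i < n; i++)
                for (j = jj; j < jj + block && j < n; j++)
                    out[j * n + i] = in[i * n + j];
}
```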
block_shared(allocatable_array_namelist)
This Fortran directive is used to declare arrays as being of type
block_shared. Block-shared arrays are sized to be an integral
multiple of the page size. The pages of the array are distributed in
same-size blocks across the hypernodes on which the process is
executing in the subcomplex. If the user-specified size is not an
integral multiple of page size × num_nodes(), then the compiler
automatically rounds it up to meet this criterion.
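The rounding applied to block_shared arrays is ordinary round-up-to-a-multiple arithmetic, where the unit is the page size multiplied by the number of hypernodes. A minimal C sketch of that arithmetic follows; the function name is hypothetical, and the page size and node count are passed in as parameters rather than queried from the system.

```c
#include <assert.h>
#include <stddef.h>

/* Round a requested size in bytes up to the next integral multiple of
 * page_size * num_nodes, so every node receives the same whole number
 * of pages. */
size_t block_shared_size(size_t requested, size_t page_size, size_t num_nodes)
{
    size_t unit;

    assert(page_size > 0 && num_nodes > 0);
    unit = page_size * num_nodes;
    return ((requested + unit - 1) / unit) * unit;
}
```

For example, assuming a 4096-byte page (an illustrative value, not one stated in this chapter) and 2 hypernodes, a request for 10000 bytes would be rounded up to 16384 bytes.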
critical_section[(gate_var)]
This directive or pragma defines the beginning of a code block in
which only one thread may be executing at a time. The end of the
code block must be indicated by an end_critical_section
directive or pragma, which must appear in the same flow of
control within the same program unit. The optional gate_var can
be used to differentiate between parallel tasks. Refer to the
Exemplar Programming Guide for more information.
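A critical section is typically used to protect an update to shared data inside a manually parallelized loop. The C sketch below assumes the `#pragma _CNX` form described in this section; the function name is hypothetical, and the `ivar = i` attribute on loop_parallel is included on the assumption that the induction variable should be named for a C loop. A compiler that does not recognize these pragmas ignores them and runs the loop serially, producing the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array; the update of the shared accumulator is the only
 * statement that must not be executed by two threads at once. */
double sum_array(const double *a, size_t n)
{
    double total = 0.0;
    size_t i;

    assert(a != NULL || n == 0);
#pragma _CNX loop_parallel(ivar = i)
    for (i = 0; i < n; i++) {
#pragma _CNX critical_section
        total += a[i];
#pragma _CNX end_critical_section
    }
    return total;
}
```

Both pragmas appear in the same flow of control within one function, as the description above requires.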
dynsel[(trip_count=n)]
This directive or pragma enables workload-based dynamic
selection for the immediately following loop.
trip_count represents
either the thread_trip_count or node_trip_count
attribute, and n is an integer constant.
When thread_trip_count=n is specified,
the serial version of
the loop is run if the iteration count is less than n;
otherwise, the
thread-parallel version is run. When node_trip_count=n is
specified, the serial version of the loop is run if the iteration count
is less than n; otherwise, the node-parallel version is run,
assuming +Onodepar is specified.
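The thread_trip_count rule amounts to a simple threshold test made before the loop runs. The C sketch below models that decision with a hypothetical helper, `select_version`, and then shows how the pragma might be attached to a loop; the threshold of 1000 is an arbitrary illustrative value. The helper is a model of the documented rule, not the code the compiler emits.

```c
#include <assert.h>
#include <stddef.h>

enum loop_version { SERIAL_VERSION, THREAD_PARALLEL_VERSION };

/* Model of the rule: run serially when the iteration count is less
 * than thread_trip_count, otherwise run the thread-parallel version. */
enum loop_version select_version(size_t iterations, size_t thread_trip_count)
{
    return iterations < thread_trip_count ? SERIAL_VERSION
                                          : THREAD_PARALLEL_VERSION;
}

/* Pragma form: request dynamic selection for the following loop. */
void add_one(double *a, size_t n)
{
    size_t i;

    assert(a != NULL || n == 0);
#pragma _CNX dynsel(thread_trip_count=1000)
    for (i = 0; i < n; i++)
        a[i] += 1.0;
}
```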
end_critical_section
This directive or pragma defines the end of
the critical section that
was begun with the critical_section directive or pragma.
critical_section and end_critical_section must
appear as a pair. Refer to the Exemplar Programming Guide for more
information.
end_ordered_section
This directive or pragma defines the end of the ordered section
that was begun with the ordered_section directive or pragma.
ordered_section and end_ordered_section must appear
as a pair. Refer to the Exemplar Programming Guide for more
information on ordered sections.
end_parallel
This directive or pragma signifies the end of a parallel region. The
parallel directive signifies the beginning of a parallel region.
Refer to Chapter 4, "Basic shared-memory programming," in the
Exemplar Programming Guide for more information.
end_tasks
This directive or pragma terminates the specification of parallel
tasks indicated by begin_tasks and next_task. It must
appear at the end of the last section of parallel code defined by
these directives or pragmas. All of these must appear in the same
program unit. Refer to the Exemplar Programming Guide for more
information.
far_shared(namelist)
This Fortran directive causes the compiler to place the data objects
in namelist (variables, arrays, or COMMON blocks) into
far_shared memory. far_shared memory is the most general
form that is distributed on a page basis across the memories of all
hypernodes in a subcomplex. The far_shared data objects of a
process are addressable by all threads of that process. In C,
far_shared is a storage class specifier. Refer to the
Exemplar Programming Guide for more information on memory classes.
far_shared_pointer(namelist)
This Fortran directive causes the compiler to place the
(compiler-generated, hidden) pointers to the allocated objects
(specified in namelist) in far_shared memory, regardless of the
memory classes to which the respective objects are allocated.
This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.
gate(namelist)
This Fortran directive defines a gate variable that is to be used
subsequently in a critical section, ordered section, or passed as an
argument to the synchronization intrinsics. In C, gate is a
typedef (gate_t), rather than a pragma. Refer to the Exemplar
Programming Guide for more information.
loop_parallel[(attribute_list)]
This directive or pragma is an explicit instruction to the compiler
to parallelize the immediately following loop. The loop iterations
are run in an indeterminate order unless the optional ordered
attribute appears. You are responsible for any required data
privatization and loop synchronization. The optional
attribute_list can be any of the
following combinations (n and m are integer constants):
threads (default)
nodes
dist
ordered
max_threads=m
chunk_size=n
threads, ordered
nodes, ordered
dist, ordered
threads, max_threads=m
nodes, max_threads=m
dist, max_threads=m
ordered, max_threads=m
threads, chunk_size=n
nodes, chunk_size=n
dist, chunk_size=n
threads, ordered, max_threads=m
nodes, ordered, max_threads=m
dist, ordered, max_threads=m
chunk_size=n, max_threads=m
threads, chunk_size=n, max_threads=m
nodes, chunk_size=n, max_threads=m
dist, chunk_size=n, max_threads=m
ivar = indvar
ivar = indvar is:
DO WHILE and hand-rolled loops in Fortran
DO loops
Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.
Refer to the Exemplar Programming Guide for more information.
loop_private(namelist)
This directive or pragma declares a list of variables
and/or arrays
private to the immediately following loop. No values may be
carried into the loop by loop_private variables. To be loop
private, the variables and/or arrays must be assigned before they
are used on each iteration of the immediately following loop.
These private data items are distinct from the shared items of the
same name that exist outside the loop. Values assigned to
loop_private variables on the final iteration (that is, the nth
iteration of a loop with n iterations) may be saved into the shared
variables of the same name if the save_last directive or pragma
also appears on this loop. If save_last is not used, then the
value of any shared variable declared to be loop_private is
undefined at loop termination. Refer to the Exemplar Programming
Guide for more information.
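The interaction of loop_private and save_last can be sketched in C as follows, assuming the `#pragma _CNX` form described in this section. The function name is hypothetical. The temporary `t` is assigned before it is used on every iteration, which is the condition for privatization stated above, and save_last is what makes its value meaningful after the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Writes in[i]^2 + 1 to out[i] and returns the last squared value. */
double squares_plus_one(double *out, const double *in, size_t n)
{
    double t = 0.0;
    size_t i;

    assert((out != NULL && in != NULL) || n == 0);
#pragma _CNX loop_private(t)
#pragma _CNX save_last(t)
    for (i = 0; i < n; i++) {
        t = in[i] * in[i];     /* assigned before use on each iteration */
        out[i] = t + 1.0;
    }
    return t;                  /* value from the final iteration */
}
```

Without save_last, the value returned here would be undefined whenever the loop is run with `t` privatized.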
near_shared(namelist)
When applied to static variables at compile-time, this Fortran
directive causes all pages of the data objects in namelist to be
mapped to physical pages on logical hypernode 0 (the hypernode
where the program starts). If applied to allocatable arrays, then
the pages of such arrays will be mapped to physical pages on the
hypernode of the allocating thread. near_shared data can be
addressed by any thread of a process on any hypernode in the
subcomplex but it is "closer" (in terms of access latency) to the
threads on the hypernode that allocates the data. In C,
near_shared is a storage class specifier. Refer to the
Exemplar Programming Guide for more information on memory classes.
near_shared_pointer(namelist)
This Fortran directive causes the compiler to place the
(compiler-generated, hidden) pointers to the allocated objects
(specified in namelist) in near_shared
memory, regardless of the
memory classes to which the respective objects are allocated.
This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.
next_task
This directive or pragma starts a block of code following a
begin_tasks block that will be executed as a parallel task. The
end of the code block is marked by another next_task or by an
end_tasks directive or pragma.
This directive must appear within a begin_tasks and
end_tasks pair. There is no limit on the
number of next_task
directives that can appear. Refer to the Exemplar
Programming Guide for more information.
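The begin_tasks, next_task, and end_tasks pragmas together delimit a set of independent tasks. The C sketch below, which assumes the `#pragma _CNX` form described in this section and uses hypothetical function names, marks three array initializations as separate tasks. The tasks touch different arrays, so they do not depend on one another.

```c
#include <assert.h>
#include <stddef.h>

/* Set every element of a to value. */
static void fill(double *a, size_t n, double value)
{
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        a[i] = value;
}

/* Three independent tasks, each eligible to run on its own thread. */
void init_arrays(double *a, double *b, double *c, size_t n)
{
#pragma _CNX begin_tasks
    fill(a, n, 1.0);
#pragma _CNX next_task
    fill(b, n, 2.0);
#pragma _CNX next_task
    fill(c, n, 3.0);
#pragma _CNX end_tasks
}
```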
no_block_loop
This directive or pragma disables loop blocking on the immediately following loop. Refer to the Exemplar Programming Guide for more information on loop blocking.
no_distribute
This directive or pragma disables loop distribution for the
immediately following loop. Refer to the Exemplar
Programming Guide for more information on loop distribution.
no_dynsel
This directive or pragma disables workload-based dynamic
selection for the immediately following loop. Refer to the
Exemplar Programming Guide for more information on dynamic
selection.
no_loop_dependence(namelist)
This directive or pragma informs the compiler that the arrays in
namelist do not have any dependences for iterations of the
immediately following loop. Use no_loop_dependence for
arrays only; use loop_private to indicate dependence-free
scalar variables.
This directive or pragma causes the compiler to ignore any dependences that it perceives to exist. This can enhance the compiler's ability to optimize the loop, including the possibility of parallelization.
Refer to the Exemplar Programming Guide for more information.
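A common case for no_loop_dependence is indirect indexing, where the compiler cannot tell whether two iterations write the same element. The C sketch below assumes the `#pragma _CNX` form described in this section and uses a hypothetical function name. The pragma is correct only if the caller guarantees that `idx` holds no repeated values; that guarantee is the programmer's responsibility, not something the compiler checks.

```c
#include <assert.h>
#include <stddef.h>

/* Adds b[i] into a[idx[i]]. With distinct idx values, no two
 * iterations touch the same element of a. */
void scatter_add(double *a, const size_t *idx, const double *b, size_t n)
{
    size_t i;

    assert((a != NULL && idx != NULL && b != NULL) || n == 0);
#pragma _CNX no_loop_dependence(a)
    for (i = 0; i < n; i++)
        a[idx[i]] += b[i];
}
```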
no_loop_transform
This directive or pragma prevents the compiler from performing
reordering transformations on the following loop. The compiler
does not distribute, fuse, interchange, or parallelize a loop on
which this directive or pragma appears. Refer to the Exemplar
Programming Guide for more information.
no_parallel
This directive or pragma prevents the compiler from generating
parallel code for the immediately following loop. Refer to the
Exemplar Programming Guide for more
information.
no_side_effects(funclist)
This directive or pragma informs the compiler that the functions
appearing in funclist have no side effects wherever they appear
lexically following the directive. Side effects include modifying a
function argument, modifying a Fortran COMMON variable,
performing I/O, or calling another routine that does any of the
above. The compiler can sometimes eliminate calls to procedures
that have no side effects; also, the compiler may be able to
parallelize loops with calls when informed that the called routines
do not have side effects.
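A function qualifies for no_side_effects when it only computes a result from its arguments. The C sketch below assumes the `#pragma _CNX` form described in this section; the function names are hypothetical, and placing the pragma immediately before the loop is one plausible way to satisfy the rule that it must lexically precede the calls.

```c
#include <assert.h>
#include <stddef.h>

/* No side effects: does not modify its argument, touch global data,
 * perform I/O, or call anything that does. */
static double square(double x)
{
    return x * x;
}

/* The call inside the loop would normally block parallelization. */
void square_all(double *out, const double *in, size_t n)
{
    size_t i;

    assert((out != NULL && in != NULL) || n == 0);
#pragma _CNX no_side_effects(square)
    for (i = 0; i < n; i++)
        out[i] = square(in[i]);
}
```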
node_private(namelist)
This Fortran directive causes the variables and arrays specified in
namelist to be replicated in the physical memory of each
hypernode on which the process is executing. Thus, while each
data object has a single image in virtual memory, it maps to a
different physical location on each hypernode. The threads of a
process within a hypernode all share access to the copy on their
hypernode and cannot access the copies on other hypernodes.
In C, node_private is a storage class specifier.
node_private_pointer(namelist)
This Fortran directive causes the compiler to place the
(compiler-generated, hidden) pointers to the allocated objects
(specified in namelist) in node_private memory, regardless of
the memory classes to which the respective objects are allocated.
This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.
ordered_section(gate_var)
This directive or pragma defines the beginning of an ordered
section. An ordered section is the same as a critical section (a code
block in which only one thread may be executing at a time) with
the additional restriction that the threads must pass through the
ordered section in iteration order. The end of the code block must
be indicated by an end_ordered_section directive or pragma.
Ordered sections must appear within the control flow of a
loop_parallel(ordered) directive. Refer to the Exemplar
Programming Guide for more information.
parallel[(attribute_list)]
This directive or pragma signifies the beginning of a parallel
region of code. All code up to the following end_parallel
directive or pragma will be run on all available threads. No loop
transformations, data privatization, or parallelization analysis
will be performed by the compiler on the code in the region.
The optional attribute_list can be any of the following legal combinations (m is an integer constant):
threads (default)
nodes
max_threads=m
threads, max_threads=m
nodes, max_threads=m
Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.
parallel_private(namelist)
This directive or pragma declares a list of variables or arrays
private to the immediately following parallel region. It serves the
same purpose for parallel regions that task_private serves for
tasks. The privatized variables and arrays will not carry their
values beyond the end_parallel directive or pragma.
prefer_parallel[(attribute_list)]
This directive or pragma instructs the compiler to parallelize the
following loop, but only if it is safe to do so. A loop is safe to
parallelize if it has an iteration count that can be determined at
runtime before loop invocation and contains no loop-carried
dependences, procedure calls, or I/O operations. (A loop-carried
dependence exists when one iteration of a loop assigns a value to
an address that is referenced or assigned on another iteration.)
Refer to the Exemplar Programming Guide for more information.
The optional attribute_list can be any of the following combinations (n and m are integer constants):
threads (default)
nodes
dist
ordered
max_threads=m
chunk_size=n
threads, ordered
nodes, ordered
dist, ordered
threads, max_threads=m
nodes, max_threads=m
dist, max_threads=m
ordered, max_threads=m
threads, chunk_size=n
nodes, chunk_size=n
dist, chunk_size=n
threads, ordered, max_threads=m
nodes, ordered, max_threads=m
dist, ordered, max_threads=m
chunk_size=n, max_threads=m
threads, chunk_size=n, max_threads=m
nodes, chunk_size=n, max_threads=m
dist, chunk_size=n, max_threads=m
Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.
save_last[(list)]
This directive or pragma specifies that the variables in the
comma-delimited list that are also named in an associated
loop_private(namelist)
directive or pragma must have their
last values saved into the "shared" variable of the same name at
loop termination. (A variable's last value in a loop of n iterations
is its value that is generated in the nth iteration.)
If the optional list is not used, save_last specifies that all
variables named in an
associated loop_private(namelist)
directive or pragma must have their last values saved into the
"shared" variable of the same name at loop termination.
If save_last is not specified then the values in any privatized
variables or arrays are indeterminate at loop termination. Refer to
the Exemplar Programming Guide for more information.
scalar
This directive or pragma prevents the compiler from performing
reordering transformations on the following loop. The compiler
does not distribute, fuse, interchange, or parallelize a loop on
which this directive or pragma appears. The
no_loop_transform directive or pragma provides the same
functionality as the scalar directive or pragma and is
recommended in place of the scalar directive or pragma.
sync_routine(routinelist)
This directive or pragma indicates to the compiler that the
routines listed in routinelist are user-defined synchronization
routines, so that the compiler does not attempt to move code
across these routine calls. Use sync_routine anytime you hide
a call to a compiler synchronization function inside another
routine call, or anytime you use CPSlib functions for
synchronization. (CPSlib is a library of low-level parallelization
and synchronization routines. See the Exemplar
Programming Guide for more information.)
sync_routine is effective only for the
listed routines in the file
in which it appears.
task_private(namelist)
This directive or pragma privatizes the variables and arrays
specified in namelist for each task
specified in the immediately
following begin_tasks/end_tasks block. If a
task_private data object is referenced within a task, it must
have been assigned a value previously in that task. The privatized
variables and arrays do not carry their values beyond the
end_tasks directive or pragma. Refer to the Exemplar
Programming Guide for more information.
thread_private(namelist)
This Fortran directive causes the variables and arrays specified in
namelist to be treated as being thread_private.
thread_private data objects map to unique node_private
addresses for each thread of a process. In C, thread_private is
a storage class specifier. Refer to the Exemplar Programming Guide
for more information.
thread_private_pointer(namelist)
This Fortran directive causes the compiler to place the
(compiler-generated, hidden) pointers to the allocated objects
(specified in namelist) in thread_private memory, regardless of
the memory classes to which the respective objects are allocated.
This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.
This section describes the extensions that are supported in the Exemplar Fortran 77 compiler. See the HP FORTRAN/9000 Programmer's Reference for information on the extensions available in the standard HP Fortran 77 compiler.
INTEGER*8
The INTEGER*8 data type allocates storage for 8-byte integer
data.
INTEGER*8 constants
You can specify an INTEGER*8 constant by adding the K suffix
after the constant value. Using the K suffix is the only way to
specify an INTEGER*8 constant; the command-line option -I8
does not imply INTEGER*8 constants.
LOGICAL*8
The LOGICAL*8 data type allocates storage for 8-byte logical data.
TASK COMMON
Exemplar Fortran supports Cray TASK COMMON blocks. A
program should already be running multiple threads before
calling a subroutine that contains a TASK COMMON block.
Variables in a TASK COMMON block are stored in a thread-private
COMMON block (each thread has its own thread-local copy of the
TASK COMMON block).
The TASK COMMON statement creates these blocks and has the
form:
TASK COMMON /cbn/nlist[,/cbn/nlist]...
where
cbn is the name of the TASK COMMON block. Unnamed
TASK COMMON blocks are not allowed.
nlist is a list of variable names, array names, and array declarators.
These variables cannot appear in a DATA statement, but
otherwise can be used like any variables in COMMON storage.
All occurrences of the TASK COMMON block must be declared
TASK COMMON; a COMMON block cannot be declared both COMMON
and TASK COMMON. TASK COMMON blocks can be declared only in
functions, subprograms and BLOCK DATA subprograms.
Using TASK COMMON is the same as using a COMMON block that is
specified in the namelist of a thread_private(namelist)
directive.
Table 6 describes the intrinsics in Exemplar Fortran 77 that
support INTEGER*8 data.
| Entry point | Description | Specific intrinsic |
|---|---|---|
| | Bit test of an integer value | |
| FTN_KQNINT | Nearest integer | INTEGER(8) function KIQNNT(A) REAL(16) :: A |
| | Absolute value of | INTEGER(8) function KISIGN(A,B) INTEGER(8) :: A,B |
| | Zero extend | INTEGER(8) function KZEXT(A) LOGICAL(8) :: A |
| | Clear a bit to zero | INTEGER(8) function KIBCLR(I,POS) INTEGER(8) :: I,POS |
| | Extract a sequence of bits | INTEGER(8) function KIBITS(I,POS,LEN) INTEGER(8) :: I,POS,LEN |
| | Set a bit to one | INTEGER(8) function KIBSET(I,POS) INTEGER(8) :: I,POS |
| | Logical shift | INTEGER(8) function KISHFT(I,SHIFT) INTEGER(8) :: I,SHIFT |
| | Circular shift of rightmost bits | INTEGER(8) function KISHFTC(I,SHIFT,SIZE) INTEGER(8) :: I,SHIFT INTEGER(8),OPTIONAL :: SIZE |
| | Integer absolute value | INTEGER(8) function KIABS(A) INTEGER(8) :: A |
| | Positive difference | INTEGER(8) function KIDIM(X,Y) INTEGER(8) :: X,Y |
| KIDNINT | Nearest integer | INTEGER(8) function KIDNNT(A) DOUBLE PRECISION :: A |
| KININT | Nearest integer | INTEGER(8) function KNINT(A) REAL :: A |
| | Remainder function | INTEGER(8) function KMOD(A,P) INTEGER(8) :: A,P |
| | Copy a sequence of bits from one data object to another | subroutine KMVBITS(FROM,FROMPOS,LEN,TO,TOPOS) INTEGER(8) :: FROM,TO INTEGER :: FROMPOS, TOPOS, LEN |
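The bit-manipulation entries in Table 6 can be illustrated with a small C model operating on 64-bit values. This is only a sketch of the operations the descriptions name ("extract a sequence of bits", "set a bit to one", "clear a bit to zero"); it is not the Exemplar implementation. The function names model_kibits, model_kibset, and model_kibclr are hypothetical, and the sketch assumes the usual Fortran bit-intrinsic convention that bit 0 is the least significant bit and that pos lies in the range 0 to 63.

```c
#include <stdint.h>

/* Hypothetical C model of "extract a sequence of bits":
 * returns len bits of i, starting at bit pos, right-justified.
 * Assumes 0 <= pos <= 63 and 0 <= len, with bit 0 least significant. */
int64_t model_kibits(int64_t i, int pos, int len)
{
    uint64_t shifted = (uint64_t)i >> pos;

    if (len >= 64) {
        return (int64_t)shifted;          /* mask would cover every bit */
    }
    return (int64_t)(shifted & ((UINT64_C(1) << len) - 1));
}

/* Hypothetical C model of "set a bit to one". */
int64_t model_kibset(int64_t i, int pos)
{
    return (int64_t)((uint64_t)i | (UINT64_C(1) << pos));
}

/* Hypothetical C model of "clear a bit to zero". */
int64_t model_kibclr(int64_t i, int pos)
{
    return (int64_t)((uint64_t)i & ~(UINT64_C(1) << pos));
}
```

The model works on unsigned values internally so that shifts and masks are well defined for all 64 bits, including bit positions above 31 that a 4-byte integer cannot hold.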
Fortran 77's EQUIVALENCE statement allows you to associate
variables so that they share the same storage space. In the
standard HP Fortran 77 compiler, equivalences are placed in static
storage. In the Exemplar Fortran 77 compiler, however,
equivalences are stored on the stack because the
-Wc,-local_equivs option is used by default.
The items listed in this section are predefined and have special meanings.
NOTE: "__" indicates two adjacent underscore characters. There is no
space between these characters. If a space is added, the compiler
does not recognize the variable as a predefined symbol.
__HP_CXD_SPP=1
This symbol (which has two leading underscores) is always
defined when using the Exemplar compilers. The
preprocessor (cpp) predefines this symbol so that code can be
conditionalized based on whether a file is being compiled
using the Exemplar compilers.
_REENTRANT=1
This symbol (which has one leading underscore) is predefined
for use by the include files. When it is predefined, reentrant
versions of libc routines are called. When _REENTRANT is not
predefined, some libc routines that are not reentrant are
called. Calling a non-reentrant routine from within a parallel
region is an error.
The compiler predefines this symbol if +Oparallel is
specified with either +O3 or +O4.
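As an illustration of conditionalizing code on these symbols, the following C sketch tests each one with the preprocessor. The function names built_with_exemplar and reentrant_libc_selected are hypothetical and exist only for this example; the symbols themselves are the ones described above.

```c
/* Hypothetical helper: returns 1 when this file is compiled by an
 * Exemplar compiler (which predefines __HP_CXD_SPP), 0 otherwise. */
int built_with_exemplar(void)
{
#ifdef __HP_CXD_SPP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 when _REENTRANT is defined, which is
 * the condition under which the include files select reentrant
 * versions of libc routines, 0 otherwise. */
int reentrant_libc_selected(void)
{
#ifdef _REENTRANT
    return 1;
#else
    return 0;
#endif
}
```

Testing with #ifdef rather than comparing against the value 1 keeps the check valid even if the symbol is defined by some other means, for example on the command line.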
The SPP-UX operating system and the Exemplar compilers support large files. A large file is a file that is greater than 2^31 - 1 bytes in size (approximately 2 gigabytes).
Several SPP-UX utilities have been modified to function properly on large files. See the largefiles(1m) man page for information on the modified utilities and on compiler support for large files.
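The large-file threshold can be made concrete with a short C sketch. The value 2^31 - 1 is 2147483647, the largest value a signed 32-bit integer can hold, so a file size beyond it needs a 64-bit type. The macro LARGE_FILE_THRESHOLD and the function is_large_file are hypothetical names used only for this illustration; they are not part of SPP-UX.

```c
#include <stdint.h>

/* 2^31 - 1 bytes: the largest size that is not a large file. */
#define LARGE_FILE_THRESHOLD INT64_C(2147483647)

/* Hypothetical helper: returns 1 if a file of the given size counts as
 * a large file under the definition above, 0 otherwise. The size is
 * carried in a 64-bit integer because a 32-bit one cannot represent it. */
int is_large_file(int64_t size_in_bytes)
{
    return size_in_bytes > LARGE_FILE_THRESHOLD;
}
```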
This section discusses the various methods you can use to create a parallel executable. A parallel executable does not necessarily execute in parallel; however, when a parallel executable is run, the proper parallel environment is always set up--regardless of whether the executable runs serially or in parallel.
There are three ways to create a parallel executable (an executable in which the parallel flag is set):
- The +Oparallel compiler option at +O3 and above
- Linker options (+min, +max, +tnode, +over, or +parallel)
- The mpa utility (-min, -max, -over, or -parallel)
Applications that rely strictly on the message-passing model to achieve parallelism do not need the parallel flag set. Message-passing applications that use multilevel parallelism do, however, need the parallel flag set. For information on developing parallel applications that use message-passing, see the HP MPI Users' Guide or the HP PVM User's Guide.
See the section "Using the file utility" on page 74 for
information on determining if a file is a parallel executable.
+Oparallel compiler option
Using the +Oparallel option at +O3 and above allows the
compiler to automatically parallelize loops that are profitable to
parallelize. Also, because +Oexemplar_model is on by default,
the compiler recognizes the parallelism-related directives and
pragmas of the Exemplar programming model.
The Exemplar compilers find parallelism at the loop level and generate parallel code that will automatically run on as many processors as are available at runtime. Normally, these are all the processors of the subcomplex on which your program is running--unless you specify a smaller number of processors.
Automatic parallelization is useful for programs containing loops. You can use compiler directives or pragmas to improve on the automatic optimizations and to assist the compiler in locating additional opportunities for parallelization.
For more information on using the +Oparallel option, refer to
the section "+O[no]parallel" on page 20 or to the
Exemplar Programming Guide.
Specifying the +parallel linker option sets the parallel flag in
the ESOM auxiliary header. (See the section "SOM vs. ESOM" on
page 62 for information on SOM and ESOM files.) When linking
using the compiler driver, if the +Oparallel compiler option is
used (at +O3 or above), the +parallel option is automatically
passed to the linker.
If any of the object files being linked are already parallel, you do
not need to specify +parallel. Also, if you already specified
+min n or +max n (n is the number of processors to use), +tnode m
(m is the maximum number of threads to allocate per hypernode),
or +over, you do not need to specify +parallel. See the ld(1)
man page for more information.
mpa utility
The mpa (modify program attributes) utility allows you to set the
parallel flag in the ESOM auxiliary header using the -parallel
option. It also allows you to set the number of processors used
when executing a parallel program (using -min n or -max n,
where n is the number of processors to use). For information on
other features, refer to the section "Using the mpa utility" on
page 75 or to the mpa(1) man page.