Exemplar C and Fortran 77 Programmer's Guide: Exemplar extensions

Exemplar C and Fortran 77 Programmer's Guide

Exemplar extensions


Last modified on: Wednesday, January 22 1997 at 10:08am

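This chapter describes Exemplar compiler options, Exemplar compiler directives and pragmas, and Exemplar Fortran 77 language extensions, intrinsics, and equivalences.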

Exemplar compilers recognize the options, directives, and pragmas that the standard HP compilers recognize. Extensions accepted by the Exemplar compilers, however, are not recognized by the standard HP compilers. The following sections describe these extensions. See Chapter 1, "Introduction," for an overview of the standard HP compiler options discussed below.


Exemplar compiler options

The options below are either recognized in addition to those supported by the standard HP compilers, or are available in the standard HP compilers but behave differently in the Exemplar compilers.

-g

This option requests that the compiler generate debugging information in the executable file that can be used by the CXdb debugger (an optional product). See Chapter 5, "Debugging and profiling," for more information on CXdb.

NOTE: Debugging with the dde and xdb debuggers is not supported with code compiled using the Exemplar compilers.

The -g option is ignored at optimization levels greater than +O0. It is also ignored with -O, because -O implies +O2.

-I8

This option specifies that INTEGER and LOGICAL variable declarations with unspecified lengths are to occupy 8 bytes of storage.

Also, this option transforms intrinsic function references that return default integer or logical values to return 8-byte values of the specified type.

+O[no]autopar

When used with the +Oparallel option, +Oautopar (the default) causes the compiler to automatically parallelize loops that are safe to parallelize. (A loop is safe to parallelize if it has an iteration count that can be determined at runtime before loop invocation, and contains no loop-carried dependences, procedure calls, or I/O operations. A loop-carried dependence exists when one iteration of a loop assigns a value to an address that is referenced or assigned on another iteration.) You can use Fortran directives and C pragmas to improve on the automatic optimizations and to assist the compiler in locating additional opportunities for parallelization.

When used with +Oparallel, the +Onoautopar option causes the compiler to parallelize only those loops marked by the loop_parallel or prefer_parallel directives or pragmas. Because the compiler does not automatically find parallel tasks or regions, user-specified task and region parallelization is not affected by this option.

Because parallelization takes place only at +O3 and above, +O[no]autopar is useful only at +O3 and above.
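As a minimal sketch of the safety criterion described for +Oautopar, the two C loops below contrast a loop whose iterations are independent with one that has a loop-carried dependence. The function names are illustrative only and do not come from the Exemplar documentation.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i], so no iteration
 * reads or writes an address touched by another iteration. */
void scale(double *a, const double *b, size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which was
 * assigned on iteration i - 1, so the iterations must run in order. */
void running_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

Under the definition above, only the first loop would qualify as safe for automatic parallelization.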

+O[no]dataprefetch

The +O[no]dataprefetch option enables [disables] optimizations to generate data prefetch instructions for data referenced within innermost loops. The effect is that the memory system will retrieve the data for future iterations while the processor is executing current iterations. For cache lines containing data that will be written, +Odataprefetch prefetches the cache lines so that they are valid for both read and write access.

This option provides no benefit to loops whose data fits in the cache; in fact, it can slow them down because of the prefetch instructions. For loops whose data does not fit in the cache, the speedup can be substantial.

The +O[no]dataprefetch option is valid at +O2 and above. The default is +Onodataprefetch. This option is effective only on S2000 and X2000 servers.

+O[no]dynsel

When specified with +Oparallel, +Odynsel (the default) enables workload-based dynamic selection. For parallelizable loops whose iteration counts are known at compile time, +Odynsel causes the compiler to generate either a parallel or a serial version of the loop--depending on which is more profitable.

This optimization also causes the compiler to generate both parallel and serial versions of parallelizable loops whose iteration counts are unknown at compile time. At runtime, the loop's workload is compared to parallelization overhead, and the parallel version is run only if it is profitable to do so.

The +Onodynsel option disables dynamic selection and tells the compiler that it is profitable to parallelize all parallelizable loops. The dynsel directive and pragma can be used to enable dynamic selection for specific loops when +Onodynsel is in effect.
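The idea of keeping both a serial and a parallel version of a loop and choosing between them at runtime can be modeled in plain C. This is only a sketch of the concept: the threshold test stands in for the compiler's workload-versus-overhead comparison, whose actual cost model is not described here, and every name below is hypothetical.

```c
#include <stddef.h>

/* Serial version of the loop. */
static double sum_serial(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Stand-in for the parallel version. In this sketch it computes the
 * same result serially; a real compiler would emit threaded code. */
static double sum_parallel_stand_in(const double *x, size_t n)
{
    return sum_serial(x, n);
}

/* Hypothetical runtime selection: run the parallel version only when
 * the iteration count reaches the threshold. *ran_parallel records
 * which version was chosen. */
double sum_dynsel(const double *x, size_t n, size_t threshold,
                  int *ran_parallel)
{
    if (n < threshold) {
        *ran_parallel = 0;
        return sum_serial(x, n);
    }
    *ran_parallel = 1;
    return sum_parallel_stand_in(x, n);
}
```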

+O[no]exemplar_model

+Oexemplar_model (the default) causes the compiler to recognize the Exemplar programming model. This option allows you to use the directives, pragmas, and associated command-line options that make up the programming model. At lower optimization levels (+O0, +O1, +O2), this option enables only the following components of the programming model:

At +O3 and +O4, using +Oexemplar_model enables all directives, pragmas, storage class specifiers, and typedefs. See the section "Exemplar compiler directives and pragmas" on page 23 for additional information.

The +Oexemplar_model option implies the +Okernel_threads option.

The +Onoexemplar_model option turns off support for the Exemplar programming model. If you use this option, directives and pragmas from the Exemplar programming model are ignored. +Onoexemplar_model can be used with either +Okernel_threads (the default) or +Oprocess_threads.

+Okernel_threads

The +Okernel_threads option causes the compiler to use a thread-based model of parallelism. The Exemplar programming model requires thread-based parallelism. This option is available at all optimization levels and is enabled by default.

Alternatively, you can specify process-based parallelism by using the +Oprocess_threads option. See the section "+Oprocess_threads" on page 21 for more information.

+Okernel_threads can be used with either +Oexemplar_model or +Onoexemplar_model.

+O[no]nodepar

The +Ononodepar option disables node-parallelism by causing the compiler to generate code for a single-node machine. When this option is used, serial code is generated for node-parallel constructs. Specifying the +Ononodepar option prevents the compiler from implementing node-parallelism, but allows the implementation of both automatic and directive-specified thread-parallelism.

The +Onodepar option causes the compiler to perform node-parallelism where it has been specified using the nodes attribute with the loop_parallel, prefer_parallel, parallel, or begin_tasks directives or pragmas. Also, the +Onodepar option causes the compiler to honor the node_trip_count attribute to the dynsel directive or pragma.

The +O[no]nodepar option is effective only when specified with the +Oparallel option at +O3 and above. The default is +Ononodepar.

+O[no]parallel

The +Oparallel option causes the compiler to:

There are three ways to specify the number of processors used in executing your parallel programs:

The +Oparallel option is valid only at optimization level +O3 and above. Using the +Oparallel option disables +Ofail_safe, which is on by default. See the section "+O[no]fail_safe" on page 6 for more information.

The +Onoparallel option is the default for all optimization levels. This option disables automatic and directive-specified parallelization.

NOTE: If you compile one file in an application using +Oparallel, then you must link the application (using the compiler driver) with the +Oparallel option to link in the proper start-up files and runtime support.

+Oprocess_threads

The +Oprocess_threads option causes the compiler to use process-based parallelism. Process-based parallelism is used by the standard HP compilers.

+Oprocess_threads implies +Onoexemplar_model, which causes directives and pragmas from the Exemplar programming model to be ignored.

If you specify both +Oexemplar_model and +Oprocess_threads, +Oprocess_threads is ignored with a warning, and +Okernel_threads is selected.

+Okernel_threads is the default. See the section "+Okernel_threads" on page 19 for more information.

+O[no]report[= report_type]

This option causes the compiler to display various optimization reports. +Onoreport is the default. The value of report_type determines which report is displayed, as described below.

+Oreport=loop produces the Loop Report. This report gives information on optimizations performed on loops and calls. Using +Oreport (without =report_type) also produces the Loop Report.

+Oreport=private produces the Loop Report and the Privatization Table, which provides information on loop variables that are privatized by the compiler.

+Oreport=all produces all reports.

The +Oreport[=report_type] option is active only at +O3 and above. See the Exemplar Programming Guide for more information on the optimization reports.

The option +Oinfo displays additional information on the various optimizations being performed by the compilers. +Oinfo can be used at any optimization level but is most useful at +O3 and above. The default, at all optimization levels, is +Onoinfo.

+O[no]sharedgra

The +Onosharedgra option disables global register allocation for shared-memory variables that are visible to multiple threads. This option can help if a variable shared among parallel threads is causing wrong answers. See the Exemplar Programming Guide for more information.

Global register allocation (+Osharedgra) is enabled by default at optimization level +O2 and higher.

+pa

The +pa option requests that the compiler add instrumentation (additional information) to an executable file for the CXpa performance analyzer to read. The +pa option is not valid with the +O4 or +Oall optimization levels. Also, +pa is not compatible with the -p or -G options. See Chapter 5, "Debugging and profiling," for more information on CXpa.

+tm target

This option specifies the target machine architecture for which compilation is to be performed. Using this option causes the compiler to perform architecture-specific optimizations. target takes one of the following values:

This option is valid at all optimization levels. The default target value corresponds to the machine on which you invoke the compiler. The +tm target option is automatically specified when you use one of the Exemplar compiler drivers. If you are manually linking your application, you have to specify the +tm target option.

Using the +tm target option implies +DA and +DS settings as described in Table 5. +DAarchitecture causes the compiler to generate code for the architecture specified by architecture. +DSmodel causes the compiler to use the instruction scheduler tuned to model. See the cc(1) man page or the f77(1) man page for more information on the +DA and +DS options.

Table 5 +tm target and +DA/+DS

target value specified    +DAarchitecture implied    +DSmodel implied

spp1200                   1.1                        1.1
spp1600                   1.1                        1.1
S2000                     2.0                        2.0
X2000                     2.0                        2.0

If you specify +DA or +DS on the compiler command line, your setting takes precedence over the setting implied by +tm target.


Exemplar compiler directives and pragmas

This section presents an alphabetical list of the Fortran directives and C pragmas that make up the Exemplar programming model. The Exemplar compilers accept the directives and pragmas listed below in addition to those supported by the standard HP compilers.

This section is intended to provide a brief overview of the available directives and pragmas. More specific information and examples can be found in the Exemplar Programming Guide. The Fortran directives not supported as C pragmas are expressed in C as either storage class extensions (thread_private, etc.) or as typedefs (gate_t, barrier_t, etc.) in the spp_prog_model.h file.

The form of an Exemplar Fortran compiler directive is:

C$DIR directive-specification

The form of an Exemplar C pragma is:

#pragma _CNX directive-specification

where

directive-specification

is one of the directives/pragmas described in this chapter

For information on how to properly use these directives or pragmas, see the Exemplar Programming Guide.

Directive names are presented here in lowercase; they may be specified in either case in both languages, but #pragma must always appear in lowercase in C.

In the sections that follow, namelist represents a comma-delimited list of names. These names can be variables, arrays, or COMMON blocks. In the case of a COMMON block, its name must be enclosed within slashes. The occurrence of a lowercase n or m is used to indicate an integer constant. Occurrences of gate_var are for variables that have been, or are being, defined as gates. Any parameters that appear within square brackets ([ ]) are optional.

align_cti(namelist)

This directive or pragma aligns the variables and arrays listed in namelist on CTIcache boundaries. This allows for more efficient data reuse.

A CTIcache is a partition of physical memory that exists on each hypernode and is used to store copies of global data fetched from other hypernodes. (A hypernode is a set of processors and physical memory organized as a symmetric multiprocessor, or SMP, running a single image of the operating system microkernel.)

The CTIcache is 64 bytes on SPP1200 and SPP1600 systems. On X2000 servers, the CTIcache is 32 bytes. (S2000 servers do not use a CTIcache.) See the Exemplar Programming Guide for more information.

barrier(namelist)

This Fortran directive denotes a list of variables, as given in namelist, that are to be used as the synchronization variables for the barrier routines. This does not imply any synchronization in itself; it is simply defining the barrier variables. In C, barrier is a typedef (barrier_t), rather than a pragma. For more information, refer to the Exemplar Programming Guide.

begin_tasks[(attribute_list)]

This directive or pragma defines the beginning of a section (or sections; see next_task) of code that is to be executed as an independent, parallel task. Each task is executed by a separate thread. begin_tasks must have an accompanying end_tasks in the same program unit.

The optional attribute_list can be any of the following legal combinations (m is an integer constant):

Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.

Refer to the Exemplar Programming Guide for a complete discussion of parallel tasking.

block_loop[(block_factor=n)]

This directive or pragma indicates a specific loop to block, and optionally, the block factor n (n must be an integer constant greater than or equal to 2) that is to be used in the compiler's internal computation of loop nest based data reuse. If no block_factor is specified, the compiler uses a heuristic to determine the block_factor. Refer to the Exemplar Programming Guide for more information on blocking.
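A minimal sketch of marking a loop for blocking in C, using the #pragma _CNX form described earlier in this section. The matrix size, the block factor of 8, and the choice of the inner loop as the one to block are all assumptions made for illustration. A compiler that does not recognize the pragma would normally ignore it and compile the loop nest unchanged.

```c
#define N 16

/* Transpose src into dst. The pragma asks the compiler to block the
 * inner loop with a block factor of 8 (an arbitrary example value). */
void transpose_blocked(double dst[N][N], double src[N][N])
{
    int i, j;

    for (i = 0; i < N; i++) {
#pragma _CNX block_loop(block_factor=8)
        for (j = 0; j < N; j++)
            dst[j][i] = src[i][j];
    }
}
```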

block_shared(allocatable_array_namelist)

This Fortran directive is used to declare arrays as being of type block_shared. Block-shared arrays are sized to be an integral multiple of the page size. The pages of the array are distributed in same-size blocks across the hypernodes on which the process is executing in the subcomplex. If the user-specified size is not an integral multiple of page size × num_nodes(), then the compiler automatically rounds it up to meet this criterion.
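The rounding rule for block_shared arrays is ordinary round-up-to-a-multiple arithmetic. The C helper below is a sketch of that calculation only; the function name is hypothetical, and the page size and node count are passed in as parameters because this chapter does not state their values.

```c
#include <stddef.h>

/* Round bytes up to the next integral multiple of
 * page_size * num_nodes. Both factors must be nonzero. */
size_t round_up_block_shared(size_t bytes, size_t page_size,
                             size_t num_nodes)
{
    size_t unit = page_size * num_nodes;
    return ((bytes + unit - 1) / unit) * unit;
}
```

For example, assuming a 4096-byte page and 4 hypernodes, the unit is 16384 bytes, so a 10000-byte request would be rounded up to 16384 bytes.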

critical_section[(gate_var)]

This directive or pragma defines the beginning of a code block in which only one thread may be executing at a time. The end of the code block must be indicated by an end_critical_section directive or pragma, which must appear in the same flow of control within the same program unit. The optional gate_var can be used to differentiate between parallel tasks. Refer to the Exemplar Programming Guide for more information.

dynsel[(trip_count=n)]

This directive or pragma enables workload-based dynamic selection for the immediately following loop. trip_count represents either the thread_trip_count or node_trip_count attribute, and n is an integer constant.

When thread_trip_count=n is specified, the serial version of the loop is run if the iteration count is less than n; otherwise, the thread-parallel version is run. When node_trip_count=n is specified, the serial version of the loop is run if the iteration count is less than n; otherwise, the node-parallel version is run, assuming +Onodepar is specified.
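A sketch of the dynsel pragma applied to a C loop, assuming the #pragma _CNX form described earlier. The threshold of 1000 and the function name are example choices. With this pragma, the description above implies the serial version would run for fewer than 1000 iterations and the thread-parallel version otherwise. A compiler that does not recognize the pragma would normally ignore it.

```c
/* y[i] += a * x[i] for i in [0, n). */
void axpy(double *y, const double *x, double a, int n)
{
    int i;

#pragma _CNX dynsel(thread_trip_count=1000)
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}
```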

end_critical_section

This directive or pragma defines the end of the critical section that was begun with the critical_section directive or pragma. critical_section and end_critical_section must appear as a pair. Refer to the Exemplar Programming Guide for more information.
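A sketch of a critical section protecting a shared accumulator inside a manually parallelized loop. The optional gate_var is omitted, the ivar=i attribute follows the "ivar = indvar" form mentioned under loop_parallel, and the function name is hypothetical. A compiler that does not recognize these pragmas would normally ignore them and run the loop serially.

```c
/* Sum the elements of len. Only one thread at a time may execute the
 * update of total between the critical_section pragmas. */
double total_length(const double *len, int n)
{
    double total = 0.0;
    int i;

#pragma _CNX loop_parallel(ivar=i)
    for (i = 0; i < n; i++) {
#pragma _CNX critical_section
        total += len[i];
#pragma _CNX end_critical_section
    }
    return total;
}
```

Serializing every iteration this way removes most of the benefit of parallelization; the example is meant only to show how the pair is placed.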

end_ordered_section

This directive or pragma defines the end of the ordered section that was begun with the ordered_section directive or pragma. ordered_section and end_ordered_section must appear as a pair. Refer to the Exemplar Programming Guide for more information on ordered sections.

end_parallel

This directive or pragma signifies the end of a parallel region. The parallel directive signifies the beginning of a parallel region. Refer to Chapter 4, "Basic shared-memory programming," in the Exemplar Programming Guide for more information.

end_tasks

This directive or pragma terminates the specification of parallel tasks indicated by begin_tasks and next_task. It must appear at the end of the last section of parallel code defined by these directives or pragmas. All of these must appear in the same program unit. Refer to the Exemplar Programming Guide for more information.
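A sketch of two independent tasks delimited by begin_tasks, next_task, and end_tasks, assuming the #pragma _CNX form described earlier. Each task uses its own loop index so that no variable is shared between the tasks; the function and variable names are hypothetical. A compiler that does not recognize these pragmas would normally ignore them and run the two loops one after the other.

```c
/* Initialize two arrays in separate tasks. */
void init_fields(double *u, double *v, int n)
{
    int i, j;

#pragma _CNX begin_tasks
    for (i = 0; i < n; i++)      /* first task */
        u[i] = 1.0;
#pragma _CNX next_task
    for (j = 0; j < n; j++)      /* second task */
        v[j] = 2.0;
#pragma _CNX end_tasks
}
```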

far_shared(namelist)

This Fortran directive causes the compiler to place the data objects in namelist (variables, arrays, or COMMON blocks) into far_shared memory. far_shared memory is the most general form that is distributed on a page basis across the memories of all hypernodes in a subcomplex. The far_shared data objects of a process are addressable by all threads of that process. In C, far_shared is a storage class specifier. Refer to the Exemplar Programming Guide for more information on memory classes.

far_shared_pointer(namelist)

This Fortran directive causes the compiler to place the (compiler-generated, hidden) pointers to the allocated objects (specified in namelist) in far_shared memory, regardless of the memory classes to which the respective objects are allocated.

This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.

gate(namelist)

This Fortran directive defines a gate variable that is to be used subsequently in a critical section, ordered section, or passed as an argument to the synchronization intrinsics. In C, gate is a typedef (gate_t), rather than a pragma. Refer to the Exemplar Programming Guide for more information.

loop_parallel[(attribute_list)]

This directive or pragma is an explicit instruction to the compiler to parallelize the immediately following loop. The loop iterations are run in an indeterminate order unless the optional ordered attribute appears. You are responsible for any required data privatization and loop synchronization. The optional attribute_list can be any of the following combinations (n and m are integer constants):

ivar = indvar is:

Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.

Refer to the Exemplar Programming Guide for more information.

loop_private(namelist)

This directive or pragma declares a list of variables and/or arrays private to the immediately following loop. No values may be carried into the loop by loop_private variables. To be loop private, the variables and/or arrays must be assigned before they are used on each iteration of the immediately following loop. These private data items are distinct from the shared items of the same name that exist outside the loop. Values assigned to loop_private variables on the final iteration (that is, the nth iteration of a loop with n iterations) may be saved into the shared variables of the same name if the save_last directive or pragma also appears on this loop. If save_last is not used, then the value of any shared variable declared to be loop_private is undefined at loop termination. Refer to the Exemplar Programming Guide for more information.
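A sketch of loop_private combined with save_last in C. The ordering of the three pragmas, the ivar=i attribute, and the function name are assumptions made for illustration. The temporary t is assigned before it is used on every iteration, as the rule above requires. A compiler that does not recognize these pragmas would normally ignore them.

```c
/* Store the square of each input element in out and return the value
 * of the temporary from the final iteration. */
double square_all(double *out, const double *in, int n)
{
    double t = 0.0;
    int i;

#pragma _CNX loop_parallel(ivar=i)
#pragma _CNX loop_private(t)
#pragma _CNX save_last(t)
    for (i = 0; i < n; i++) {
        t = in[i] * in[i];   /* assigned before use on each iteration */
        out[i] = t;
    }
    return t;                /* defined after the loop only because of save_last */
}
```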

near_shared(namelist)

When applied to static variables at compile-time, this Fortran directive causes all pages of the data objects in namelist to be mapped to physical pages on logical hypernode 0 (the hypernode where the program starts). If applied to allocatable arrays, then the pages of such arrays will be mapped to physical pages on the hypernode of the allocating thread. near_shared data can be addressed by any thread of a process on any hypernode in the subcomplex but it is "closer" (in terms of access latency) to the threads on the hypernode that allocates the data. In C, near_shared is a storage class specifier. Refer to the Exemplar Programming Guide for more information on memory classes.

near_shared_pointer(namelist)

This Fortran directive causes the compiler to place the (compiler-generated, hidden) pointers to the allocated objects (specified in namelist) in near_shared memory, regardless of the memory classes to which the respective objects are allocated.

This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.

next_task

This directive or pragma starts a block of code following a begin_tasks block that will be executed as a parallel task. The end of the code block is marked by another next_task or by an end_tasks directive or pragma.

This directive must appear within a begin_tasks and end_tasks pair. There is no limit on the number of next_task directives that can appear. Refer to the Exemplar Programming Guide for more information.

no_block_loop

This directive or pragma disables loop blocking on the immediately following loop. Refer to the Exemplar Programming Guide for more information on loop blocking.

no_distribute

This directive or pragma disables loop distribution for the immediately following loop. Refer to the Exemplar Programming Guide for more information on loop distribution.

no_dynsel

This directive or pragma disables workload-based dynamic selection for the immediately following loop. Refer to the Exemplar Programming Guide for more information on dynamic selection.

no_loop_dependence(namelist)

This directive or pragma informs the compiler that the arrays in namelist do not have any dependences for iterations of the immediately following loop. Use no_loop_dependence for arrays only; use loop_private to indicate dependence-free scalar variables.

This directive or pragma causes the compiler to ignore any dependences that it perceives to exist. This can enhance the compiler's ability to optimize the loop, including the possibility of parallelization.

Refer to the Exemplar Programming Guide for more information.
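A sketch of a loop where no_loop_dependence might apply: the array is indexed indirectly, so a compiler cannot prove that iterations touch distinct elements. The assertion made by the pragma is correct only if idx contains no repeated values, which is the caller's responsibility in this hypothetical example. A compiler that does not recognize the pragma would normally ignore it.

```c
/* Add b[i] to a[idx[i]]. The caller guarantees that idx holds no
 * duplicate indices, so no two iterations update the same element. */
void scatter_add(double *a, const int *idx, const double *b, int n)
{
    int i;

#pragma _CNX no_loop_dependence(a)
    for (i = 0; i < n; i++)
        a[idx[i]] += b[i];
}
```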

no_loop_transform

This directive or pragma prevents the compiler from performing reordering transformations on the following loop. The compiler does not distribute, fuse, interchange, or parallelize a loop on which this directive or pragma appears. Refer to the Exemplar Programming Guide for more information.

no_parallel

This directive or pragma prevents the compiler from generating parallel code for the immediately following loop. Refer to the Exemplar Programming Guide for more information.

no_side_effects(funclist)

This directive or pragma informs the compiler that the functions appearing in funclist have no side effects wherever they appear lexically following the directive. Side effects include modifying a function argument, modifying a Fortran COMMON variable, performing I/O, or calling another routine that does any of the above. The compiler can sometimes eliminate calls to procedures that have no side effects; also, the compiler may be able to parallelize loops with calls when informed that the called routines do not have side effects.
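A sketch of no_side_effects in C. The function halve modifies no arguments, touches no global data, and performs no I/O, so it meets the definition above. Placing the pragma at file scope before the calling routine is an assumption based on the "lexically following" wording; the names are hypothetical. A compiler that does not recognize the pragma would normally ignore it.

```c
/* A function with no side effects: the result depends only on x. */
static double halve(double x)
{
    return 0.5 * x;
}

#pragma _CNX no_side_effects(halve)

/* The call to halve would otherwise make this loop unsafe to
 * parallelize automatically, because it contains a procedure call. */
void halve_all(double *y, const double *x, int n)
{
    int i;

    for (i = 0; i < n; i++)
        y[i] = halve(x[i]);
}
```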

node_private(namelist)

This Fortran directive causes the variables and arrays specified in namelist to be replicated in the physical memory of each hypernode on which the process is executing. Thus, while each data object has a single image in virtual memory, it maps to a different physical location on each hypernode. The threads of a process within a hypernode all share access to the copy on their hypernode and cannot access the copies on other hypernodes. In C, node_private is a storage class specifier.

node_private_pointer(namelist)

This Fortran directive causes the compiler to place the (compiler-generated, hidden) pointers to the allocated objects (specified in namelist) in node_private memory, regardless of the memory classes to which the respective objects are allocated.

This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.

ordered_section(gate_var)

This directive or pragma defines the beginning of an ordered section. An ordered section is the same as a critical section (a code block in which only one thread may be executing at a time) with the additional restriction that the threads must pass through the ordered section in iteration order. The end of the code block must be indicated by an end_ordered_section directive or pragma. Ordered sections must appear within the control flow of a loop_parallel(ordered) directive or pragma. Refer to the Exemplar Programming Guide for more information.

parallel[(attribute_list)]

This directive or pragma signifies the beginning of a parallel region of code. All code up to the following end_parallel directive or pragma will be run on all available threads. No loop transformations, data privatization, or parallelization analysis will be performed by the compiler on the code in the region.

The optional attribute_list can be any of the following legal combinations (m is an integer constant):

Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.

parallel_private(namelist)

This directive or pragma declares a list of variables or arrays private to the immediately following parallel region. It serves the same purpose for parallel regions that task_private serves for tasks. The privatized variables and arrays will not carry their values beyond the end_parallel directive or pragma.

prefer_parallel[(attribute_list)]

This directive or pragma instructs the compiler to parallelize the following loop, but only if it is safe to do so. A loop is safe to parallelize if it has an iteration count that can be determined at runtime before loop invocation and contains no loop-carried dependences, procedure calls, or I/O operations. (A loop-carried dependence exists when one iteration of a loop assigns a value to an address that is referenced or assigned on another iteration.) Refer to the Exemplar Programming Guide for more information.

The optional attribute_list can be any of the following combinations (n and m are integer constants):

Attributes may be listed in any order. The compilers flag any attribute combinations other than those listed above with a warning and ignore the directive.

save_last[(list)]

This directive or pragma specifies that the variables in the comma-delimited list that are also named in an associated loop_private(namelist) directive or pragma must have their last values saved into the "shared" variable of the same name at loop termination. (A variable's last value in a loop of n iterations is its value that is generated in the nth iteration.)

If the optional list is not used, save_last specifies that all variables named in an associated loop_private(namelist) directive or pragma must have their last values saved into the "shared" variable of the same name at loop termination.

If save_last is not specified then the values in any privatized variables or arrays are indeterminate at loop termination. Refer to the Exemplar Programming Guide for more information.

scalar

This directive or pragma prevents the compiler from performing reordering transformations on the following loop. The compiler does not distribute, fuse, interchange, or parallelize a loop on which this directive or pragma appears. The no_loop_transform directive or pragma provides the same functionality as the scalar directive or pragma and is recommended in place of the scalar directive or pragma.

sync_routine(routinelist)

This directive or pragma indicates to the compiler that the routines listed in routinelist are user-defined synchronization routines, so that the compiler does not attempt to move code across these routine calls. Use sync_routine anytime you hide a call to a compiler synchronization function inside another routine call, or anytime you use CPSlib functions for synchronization. (CPSlib is a library of low-level parallelization and synchronization routines. See the Exemplar Programming Guide for more information.)

sync_routine is effective only for the listed routines in the file in which it appears.

task_private(namelist)

This directive or pragma privatizes the variables and arrays specified in namelist for each task specified in the immediately following begin_tasks/end_tasks block. If a task_private data object is referenced within a task, it must have been assigned a value previously in that task. The privatized variables and arrays do not carry their values beyond the end_tasks directive or pragma. Refer to the Exemplar Programming Guide for more information.

thread_private(namelist)

This Fortran directive causes the variables and arrays specified in namelist to be treated as being thread_private. thread_private data objects map to unique node_private addresses for each thread of a process. In C, thread_private is a storage class specifier. Refer to the Exemplar Programming Guide for more information.

thread_private_pointer(namelist)

This Fortran directive causes the compiler to place the (compiler-generated, hidden) pointers to the allocated objects (specified in namelist) in thread_private memory, regardless of the memory classes to which the respective objects are allocated.

This directive applies only to Fortran 90-style allocatable data objects used in HP Fortran 77 programs.


Exemplar Fortran 77 language extensions

This section describes the extensions that are supported in the Exemplar Fortran 77 compiler. See the HP FORTRAN/9000 Programmer's Reference for information on the extensions available in the standard HP Fortran 77 compiler.

INTEGER*8

The INTEGER*8 data type allocates storage for 8-byte integer data.

INTEGER*8 constants

You can specify an INTEGER*8 constant by adding the K suffix after the constant value. Using the K suffix is the only way to specify an INTEGER*8 constant; the command-line option -I8 does not imply INTEGER*8 constants.

LOGICAL*8

The LOGICAL*8 data type allocates storage for 8-byte logical data.

TASK COMMON

Exemplar Fortran supports Cray TASK COMMON blocks. A program should already be running multiple threads before calling a subroutine that contains a TASK COMMON block.

Variables in a TASK COMMON block are stored in a thread-private COMMON block (each thread has its own thread-local copy of the TASK COMMON block).

The TASK COMMON statement creates these blocks and has the form:

TASK COMMON /cbn/nlist[,/cbn/nlist]...

where

cbn

is a symbolic name for a TASK COMMON block. Unnamed TASK COMMON blocks are not allowed.

nlist

is a list of variable names, array names, and array declarators. These variables cannot appear in a DATA statement, but otherwise can be used like any variables in COMMON storage.

All occurrences of the TASK COMMON block must be declared TASK COMMON; a COMMON block cannot be declared both COMMON and TASK COMMON. TASK COMMON blocks can be declared only in functions, subprograms and BLOCK DATA subprograms.

Using TASK COMMON is the same as using a COMMON block that is specified in the namelist of a thread_private(namelist) directive.


Exemplar Fortran 77 intrinsics

Table 6 describes the intrinsics in Exemplar Fortran 77 that support INTEGER*8 data.

Table 6 Intrinsic functions

Entry point   Description                       Specific intrinsic
------------  --------------------------------  ---------------------------------------------
BTEST_8       Bit test of an integer value      LOGICAL(8) function BKTEST(I,POS)
                                                    INTEGER(8) :: I,POS

FTN_KQNINT    Nearest integer                   INTEGER(8) function KIQNNT(A)
                                                    REAL(16) :: A

FTN_KSIGN     Absolute value of A times the     INTEGER(8) function KISIGN(A,B)
              sign of B                             INTEGER(8) :: A,B

FTN_KZEXT_B1  Zero extend                       INTEGER(8) function KZEXT(A)
                                                    LOGICAL(8) :: A

IBCLR_8       Clear a bit to zero               INTEGER(8) function KIBCLR(I,POS)
                                                    INTEGER(8) :: I,POS

IBITS_8       Extract a sequence of bits        INTEGER(8) function KIBITS(I,POS,LEN)
                                                    INTEGER(8) :: I,POS,LEN

IBSET_8       Set a bit to one                  INTEGER(8) function KIBSET(I,POS)
                                                    INTEGER(8) :: I,POS

ISHFT_8       Logical shift                     INTEGER(8) function KISHFT(I,SHIFT)
                                                    INTEGER(8) :: I,SHIFT

ISHFTC_8      Circular shift of rightmost bits  INTEGER(8) function KISHFTC(I,SHIFT,SIZE)
                                                    INTEGER(8) :: I,SHIFT
                                                    INTEGER(8),OPTIONAL :: SIZE

KABS          Integer absolute value            INTEGER(8) function KIABS(A)
                                                    INTEGER(8) :: A

KDIM          Positive difference               INTEGER(8) function KIDIM(X,Y)
                                                    INTEGER(8) :: X,Y

KIDNINT       Nearest integer                   INTEGER(8) function KIDNNT(A)
                                                    DOUBLE PRECISION :: A

KININT        Nearest integer                   INTEGER(8) function KNINT(A)
                                                    REAL :: A

KMOD          Remainder function                INTEGER(8) function KMOD(A,P)
                                                    INTEGER(8) :: A,P

MVBITS_8      Copy a sequence of bits from one  subroutine KMVBITS(FROM,FROMPOS,LEN,TO,TOPOS)
              data object to another                INTEGER(8) :: FROM,TO
                                                    INTEGER :: FROMPOS, TOPOS, LEN
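
For readers more familiar with C, the bit-manipulation entries in Table 6 can be read against the following sketch. It models the standard Fortran definitions of BTEST, IBSET, IBCLR, and IBITS on 64-bit values, with bit 0 as the least significant bit. The function names (btest64 and so on) are hypothetical and are not part of any Exemplar library. The sketch uses an unsigned 64-bit type to keep the shifts well defined in C, whereas INTEGER*8 is signed. It illustrates what the intrinsics compute, not how the Exemplar compiler implements them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C models of the standard Fortran bit intrinsics applied
   to 64-bit values. Bit 0 is the least significant bit. */

/* BTEST: returns 1 if bit pos of i is set, 0 otherwise. */
int btest64(uint64_t i, unsigned pos)
{
    assert(pos < 64);
    return (int)((i >> pos) & 1u);
}

/* IBSET: returns i with bit pos set to one. */
uint64_t ibset64(uint64_t i, unsigned pos)
{
    assert(pos < 64);
    return i | ((uint64_t)1 << pos);
}

/* IBCLR: returns i with bit pos cleared to zero. */
uint64_t ibclr64(uint64_t i, unsigned pos)
{
    assert(pos < 64);
    return i & ~((uint64_t)1 << pos);
}

/* IBITS: extracts len bits of i starting at bit pos, right-justified. */
uint64_t ibits64(uint64_t i, unsigned pos, unsigned len)
{
    uint64_t mask;

    assert(pos + len <= 64);
    if (len == 0)
        return 0;
    /* A shift by 64 is undefined in C, so handle the full-width case. */
    mask = (len == 64) ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);
    return (i >> pos) & mask;
}
```

For example, ibits64(0xF0, 4, 4) extracts the four bits starting at bit 4 and returns 0xF.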


Exemplar Fortran 77 equivalences

Fortran 77's EQUIVALENCE statement allows you to associate variables so that they share the same storage space. In the standard HP Fortran 77 compiler, equivalences are placed in static storage. In the Exemplar Fortran 77 compiler, however, equivalences are stored on the stack because the -Wc,-local_equivs option is used by default.


Predefined symbols

The items listed in this section are predefined and have special meanings.

NOTE: "__" indicates two adjacent underscore characters. There is no space between these characters. If a space is added, the compiler does not recognize the variable as a predefined symbol.

__HP_CXD_SPP=1

This symbol (which has two leading underscores) is always defined when using the Exemplar compilers. The preprocessor (cpp) predefines this symbol so that code can be conditionalized based on whether a file is being compiled using the Exemplar compilers.

_REENTRANT=1

This symbol (which has one leading underscore) is predefined for use by the include files. When it is predefined, reentrant versions of libc routines are called. When _REENTRANT is not predefined, some libc routines that are not reentrant are called. Calling a non-reentrant routine from within a parallel region is an error.

The compiler predefines this symbol if +Oparallel is specified with either +O3 or +O4.
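
The following sketch shows one way C code might be conditionalized on these two symbols. The function names are hypothetical and chosen only for illustration. The sketch assumes nothing beyond the symbols being defined or undefined, as described above.

```c
#include <stdio.h>

/* Returns 1 when the file is compiled by an Exemplar compiler
   (__HP_CXD_SPP is predefined), 0 otherwise. */
int built_with_exemplar(void)
{
#ifdef __HP_CXD_SPP
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when _REENTRANT is defined, meaning the include files
   select reentrant versions of libc routines; 0 otherwise. */
int reentrant_libc_selected(void)
{
#ifdef _REENTRANT
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of how the file was built. */
void report_build(void)
{
    printf("Exemplar compiler: %s, reentrant libc: %s\n",
           built_with_exemplar() ? "yes" : "no",
           reentrant_libc_selected() ? "yes" : "no");
}
```

A check such as reentrant_libc_selected() could be used to guard code that calls libc routines from within a parallel region.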


Large files support

The SPP-UX operating system and the Exemplar compilers support large files. A large file is a file that is greater than 2^31 - 1 bytes in size (approximately 2 gigabytes).
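
The limit of 2^31 - 1 bytes is the largest value that a signed 32-bit integer can hold, so the size of a large file must be kept in a wider type. The sketch below illustrates the threshold in C. The macro and function names are hypothetical, and the example says nothing about which SPP-UX interfaces accept 64-bit sizes.

```c
#include <stdint.h>

/* 2^31 - 1: the largest size that is not a large file. */
#define LARGE_FILE_THRESHOLD INT64_C(2147483647)

/* Returns 1 if a file of the given size counts as a large file,
   that is, if it is greater than 2^31 - 1 bytes; 0 otherwise. */
int is_large_file(int64_t size_in_bytes)
{
    return size_in_bytes > LARGE_FILE_THRESHOLD;
}
```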

Several SPP-UX utilities have been modified to function properly on large files. See the largefiles(1m) man page for information on the modified utilities and on compiler support for large files.


Thread-based parallelism

This section discusses the various methods you can use to create a parallel executable. A parallel executable does not necessarily execute in parallel; however, when a parallel executable is run, the proper parallel environment is always set up--regardless of whether the executable runs serially or in parallel.

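There are three ways to create a parallel executable (an executable in which the parallel flag is set): compile with the +Oparallel compiler option, specify linker options, or use the mpa utility. Each method is described later in this section.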

Applications that rely strictly on the message-passing model to achieve parallelism do not need the parallel flag set. Message-passing applications that use multilevel parallelism do, however, need the parallel flag set. For information on developing parallel applications that use message-passing, see the HP MPI Users' Guide or the HP PVM User's Guide.

See the section "Using the file utility" on page 74 for information on determining if a file is a parallel executable.

Using the +Oparallel compiler option

Using the +Oparallel option at +O3 and above allows the compiler to automatically parallelize loops that are profitable to parallelize. Also, because +Oexemplar_model is on by default, the compiler recognizes the parallelism-related directives and pragmas of the Exemplar programming model.

The Exemplar compilers find parallelism at the loop level and generate parallel code that will automatically run on as many processors as are available at runtime. Normally, these are all the processors of the subcomplex on which your program is running--unless you specify a smaller number of processors.

Automatic parallelization is useful for programs containing loops. You can use compiler directives or pragmas to improve on the automatic optimizations and to assist the compiler in locating additional opportunities for parallelization.
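
As a rough illustration of which loops are candidates, the sketch below contrasts a loop with no loop-carried dependence against one that has such a dependence. The function names are hypothetical, and the first loop assumes the three arrays do not overlap. Whether the compiler actually parallelizes a given loop also depends on its profitability analysis and on the options in effect.

```c
#include <stddef.h>

/* No loop-carried dependence: iteration i writes only a[i] and reads
   only b[i] and c[i], so iterations can run in any order.
   Assumes a, b, and c do not overlap. */
void vector_add(double *a, const double *b, const double *c, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which was
   assigned by iteration i - 1, so iterations must run in order. */
void running_sum(double *a, const double *b, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

In both loops the iteration count is known before the loop starts, and neither contains procedure calls or I/O. Only the dependence on a[i - 1] distinguishes the second loop from the first.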

For more information on using the +Oparallel option, refer to the section "+O[no]parallel" on page 20 or to the Exemplar Programming Guide.

Using linker options

Specifying the +parallel linker option sets the parallel flag in the ESOM auxiliary header. (See the section "SOM vs. ESOM" on page 62 for information on SOM and ESOM files.) When linking using the compiler driver, if the +Oparallel compiler option is used (at +O3 or above), the +parallel option is automatically passed to the linker.

If any of the object files being linked are already parallel, you do not need to specify +parallel. Also, if you already specified +min n, +max n (n is the number of processors to use), +tnode m (m is the maximum number of threads to allocate per hypernode) or +over, you do not need to specify +parallel. See the ld(1) man page for more information.

Using the mpa utility

The mpa (modify program attributes) utility allows you to set the parallel flag in the ESOM auxiliary header using the -parallel option. It also allows you to set the number of processors used when executing a parallel program (using -min n or -max n, where n is the number of processors to use). For information on other features, refer to the section "Using the mpa utility" on page 75 or to the mpa(1) man page.

