Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers, Chapter 7: Controlling optimization

Invoking command-line options

At each optimization level, you can turn specific optimizations on or off using the +O[no]optimization option. The optimization parameter is the name of a specific optimization. The optional prefix [no] disables the specified optimization.

The following sections describe the optimizations that are turned on or off, their defaults, and the optimization levels at which they may be used. In syntax descriptions, namelist represents a comma-separated list of names.

+O[no]aggressive

Optimization level: +O2, +O3, +O4

Default: +Onoaggressive

+O[no]aggressive enables or disables optimizations that can result in significant performance improvement, and can change a program's behavior. This includes the optimizations invoked by the following advanced options (these are discussed separately in this chapter):

  • +Osignedpointers (C and C++)

  • +Oentrysched

  • +Onofltacc

  • +Olibcalls

  • +Onoinitcheck

  • +Ovectorize

+O[no]all

Optimization level: all

Default: +Onoall

Equivalent option: +Oall option is equivalent to specifying +O4 +Oaggressive +Onolimit

+Oall performs maximum optimization, including aggressive optimizations and optimizations that can significantly increase compile time and memory usage.

+O[no]autopar

Optimization level: +O3, +O4 (+Oparallel must also be specified)

Default: +Oautopar

When used with +Oparallel option, +Oautopar causes the compiler to automatically parallelize loops that are safe to parallelize. A loop is considered safe to parallelize if its iteration count can be determined at runtime before loop invocation. It must also contain no loop-carried dependences, procedure calls, or I/O operations.

A loop-carried dependence exists when one iteration of a loop assigns a value to an address that is referenced or assigned on another iteration.
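As an illustration (the function names are hypothetical), the C sketch below contrasts a loop with no loop-carried dependence, which meets this condition, with a loop that has one and so would not be parallelized automatically.

```c
/* Independent iterations: a[i] depends only on b[i], so the
   iterations can execute in any order, or concurrently. */
void scale(double *a, const double *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which
   iteration i - 1 assigned, so the iterations must run in order. */
void running_sum(double *a, const double *b, int n)
{
    if (n <= 0)
        return;
    a[0] = b[0];
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```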

When used with +Oparallel, the +Onoautopar option causes the compiler to parallelize only those loops marked by the loop_parallel or prefer_parallel directives or pragmas. Because the compiler does not automatically find parallel tasks or regions, user-specified task and region parallelization is not affected by this option.

C pragmas and Fortran directives are used to improve the effect of automatic optimizations and to assist the compiler in locating additional opportunities for parallelization. See “Optimization directives and pragmas” for more information.

+O[no]conservative

Optimization level: +O2, +O3, +O4

Default: +Onoconservative

Equivalent option: +Oconservative is equivalent to +Onoaggressive

+O[no]conservative causes the optimizer to make or not make conservative assumptions about the code when optimizing. +Oconservative is useful when a program's coding style may not conform to the language standard. Specifying +Oconservative disables any optimizations that assume standard-compliant code.

+O[no]dataprefetch

Optimization level: +O2, +O3, +O4

Default: +Onodataprefetch

When +Odataprefetch is used, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. For cache lines containing data to be written, +Odataprefetch prefetches the cache lines so that they are valid for both read and write access. Data prefetch instructions are inserted only for data referenced within innermost loops using simple loop-varying addresses in a simple arithmetic progression. This option is available only for PA-RISC 2.0 targets.

The math library libm contains special prefetching versions of vector routines. If you have a PA-RISC 2.0 application containing operations on arrays larger than one megabyte in size, using +Ovectorize in conjunction with +Odataprefetch may substantially improve performance.

You can also use the +Odataprefetch option for applications that have high data cache miss overhead.
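The compiler inserts these prefetches itself; the C sketch below writes one by hand only to show what an inserted prefetch does. It assumes a GCC- or Clang-compatible compiler, because __builtin_prefetch is an extension of those compilers rather than of the HP compilers described here, and the prefetch distance PREFETCH_AHEAD is an arbitrary value chosen for illustration. The loop reads x[i] at addresses that form a simple arithmetic progression, the pattern described above as eligible.

```c
#include <stddef.h>

/* Assumed prefetch distance, in elements; a compiler would derive a
   suitable distance from cache-line size and memory latency. */
#define PREFETCH_AHEAD 16

/* Sums an array whose addresses advance by a constant stride.
   __builtin_prefetch (GCC/Clang extension) asks the hardware to start
   loading a future cache line early: second argument 0 means the data
   will be read (1 would request it for writing), third argument 1
   means low temporal locality. */
double sum_with_prefetch(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0, 1);
        sum += x[i];
    }
    return sum;
}
```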

+O[no]dynsel

Optimization level: +O3, +O4 (+Oparallel must also be specified)

Default: +Odynsel

When specified with +Oparallel, +Odynsel enables workload-based dynamic selection. For parallelizable loops whose iteration counts are known at compile time, +Odynsel causes the compiler to generate either a parallel or a serial version of the loop, depending on which is more profitable.

This optimization also causes the compiler to generate both parallel and serial versions of parallelizable loops whose iteration counts are unknown at compile time. At runtime, the loop's workload is compared to parallelization overhead, and the parallel version is run only if it is profitable to do so.

The +Onodynsel option disables dynamic selection and tells the compiler that it is profitable to parallelize all parallelizable loops. The dynsel directive and pragma are used to enable dynamic selection for specific loops, when +Onodynsel is in effect. See the section “Dynamic selection” for additional information.
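The run-time choice can be sketched in C as follows. This is a hand-written model of the idea, not the code the compiler generates: the threshold PARALLEL_THRESHOLD and all function names are hypothetical, and the parallel version simply splits the loop between two POSIX threads.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical break-even point: below this many iterations, the
   cost of starting a thread is assumed to outweigh the work saved. */
#define PARALLEL_THRESHOLD ((size_t)100000)

struct chunk {
    const double *x;
    size_t begin, end;
    double partial;
};

/* Loop body shared by both versions: sums x[begin..end). */
static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    double s = 0.0;
    for (size_t i = c->begin; i < c->end; i++)
        s += c->x[i];
    c->partial = s;
    return NULL;
}

/* Serial version of the loop. */
static double sum_serial(const double *x, size_t n)
{
    struct chunk c = {x, 0, n, 0.0};
    sum_chunk(&c);
    return c.partial;
}

/* Parallel version: the second half runs on a new thread. */
static double sum_parallel(const double *x, size_t n)
{
    struct chunk lo = {x, 0, n / 2, 0.0};
    struct chunk hi = {x, n / 2, n, 0.0};
    pthread_t t;
    if (pthread_create(&t, NULL, sum_chunk, &hi) != 0)
        return sum_serial(x, n);       /* no thread available */
    sum_chunk(&lo);
    pthread_join(t, NULL);
    return lo.partial + hi.partial;
}

/* Dynamic selection: n is known only at run time, so both versions
   exist and the workload picks one. *used_parallel reports which
   version was selected. */
double sum_dynsel(const double *x, size_t n, int *used_parallel)
{
    *used_parallel = n >= PARALLEL_THRESHOLD;
    return *used_parallel ? sum_parallel(x, n) : sum_serial(x, n);
}
```

Because the parallel version adds its two partial sums in a different order from the serial loop, floating-point results for general data can differ slightly between the two versions.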

+O[no]entrysched

Optimization level: +O1, +O2, +O3, +O4

Default: +Onoentrysched

+Oentrysched optimizes instruction scheduling on a procedure's entry and exit sequences by unwinding in the entry and exit regions. Use this option to increase the speed of an application.

+O[no]entrysched can also change the behavior of programs that perform exception handling or that handle asynchronous interrupts. The behavior of setjmp() and longjmp() is not affected.

+O[no]fail_safe

Optimization level: +O1, +O2, +O3, +O4

Default: +Ofail_safe

+Ofail_safe allows your compilations to continue when internal optimization errors are detected. When an error is encountered, this option issues a warning message and restarts the compilation at +O0. The +Ofail_safe option is disabled when you specify +Oparallel with +O3 or +O4 to compile with parallelization.

Using +Onofail_safe aborts your compilation when internal optimization errors are detected.

+O[no]fastaccess

Optimization level: +O0, +O1, +O2, +O3, +O4

Default: +Onofastaccess at +O0, +O1, +O2 and +O3;
+Ofastaccess at +O4

+Ofastaccess performs optimization for fast access to global data items. Use +Ofastaccess to improve execution speed at the expense of longer compile times.

+O[no]fltacc

Optimization level: +O2, +O3, +O4

Default: none (See Table 7-2 “+O[no]fltacc and floating-point optimizations”.)

+O[no]fltacc enables or disables optimizations that cause imprecise floating-point results.

+Ofltacc disables optimizations that cause imprecise floating-point results. Specifying +Ofltacc disables the generation of Fused Multiply-Add (FMA) instructions, as well as other floating-point optimizations. Use +Ofltacc if it is important that the compiler evaluate floating-point expressions in the order specified by the language standard.

+Onofltacc improves execution speed at the expense of floating-point precision. The +Onofltacc option allows the compiler to perform floating-point optimizations that are algebraically correct, but may result in numerical differences. These differences are generally insignificant. The +Onofltacc option also enables the optimizer to generate FMA instructions.
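A minimal C illustration of why reordering is restricted: the two functions below (hypothetical names) compute a + b + c in two algebraically equivalent orders, and for some inputs the results differ, because floating-point addition is not associative.

```c
/* Left-to-right order, as the C standard evaluates a + b + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* A reassociated order that an optimizer might choose when
   floating-point reordering is permitted. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

For example, with a = 1e20, b = -1e20, and c = 1.0, sum_left returns 1.0 while sum_right returns 0.0: adding 1.0 to -1e20 first loses the 1.0 entirely, since it is far smaller than the spacing between adjacent doubles near 1e20.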

If you optimize code at +O2 or higher and specify neither +Onofltacc nor +Ofltacc, the optimizer uses FMA instructions, but it does not perform floating-point optimizations that involve expression reordering. FMA is implemented by the PA-8x00 instructions FMPYFADD and FMPYNFADD and improves performance. Occasionally, these instructions produce results that differ in accuracy from results produced by code without FMA. In general, the differences are slight.

Table 7-2 “+O[no]fltacc and floating-point optimizations” presents a summary of the preceding information.

Table 7-2 +O[no]fltacc and floating-point optimizations

Option specified[1]        FMA optimizations    Other floating-point optimizations
+Ofltacc                   Disabled             Disabled
+Onofltacc                 Enabled              Enabled
Neither option specified   Enabled              Disabled

[1] +O[no]fltacc is only available at +O2 and above.

 

+O[no]global_ptrs_unique[=namelist]

Optimization level: +O2, +O3, +O4

Default: +Onoglobal_ptrs_unique

NOTE: This option is not available in Fortran or C++.

This C compiler option identifies unique global pointers so that the optimizer can generate more efficient code, for example by applying copy propagation and common subexpression elimination. A global pointer is unique if it does not alias with any variable in the entire program.

This option accepts a comma-separated list of unique global pointer variable names, represented by namelist in +O[no]global_ptrs_unique[=namelist]. If namelist is not specified, +Oglobal_ptrs_unique informs the compiler that all global pointers are unique, and +Onoglobal_ptrs_unique informs it that no global pointers are unique.

The example below states that no global pointers are unique, except a and b:

+Oglobal_ptrs_unique=a,b

The next example says that all global pointers are unique except a and b:

+Onoglobal_ptrs_unique=a,b

+O[no]info

Optimization level: +O0, +O1, +O2, +O3, +O4

Default: +Onoinfo

+Oinfo displays informational messages about the optimization process. This option can be used at all optimization levels, but is most useful at +O3 and +O4. For more information about this option, see Chapter 8 “Optimization Report”.

+O[no]initcheck

Optimization level: +O2, +O3, +O4

Default: unspecified

+O[no]initcheck performs an initialization check for the optimizer. The optimizer has three possible states that check for initialization: on, off, or unspecified.

  • When on (+Oinitcheck), the optimizer initializes to zero any local, scalar, and nonstatic variables that are uninitialized with respect to at least one path leading to a use of the variable.

  • When off (+Onoinitcheck), the optimizer issues warning messages when it discovers definitely uninitialized variables, but does not initialize them.

  • When unspecified, the optimizer initializes to zero any local, scalar, nonstatic variables that are definitely uninitialized with respect to all paths leading to a use of the variable.

+O[no]inline[=namelist]

Optimization level: +O3, +O4

Default: +Oinline

When +Oinline is specified without a name list, any function can be inlined. For successful inlining, follow the prototype definitions for function calls in the appropriate header files.

When specified with a name list, the named functions are important candidates for inlining. For example, the following statement indicates that inlining be strongly considered for foo and bar:

+Oinline=foo,bar +Onoinline

All other routines are not considered for inlining because +Onoinline is given.

NOTE: The Fortran 90 and aC++ compilers accept only +O[no]inline. No namelist values are accepted.

Use the +Onoinline[=namelist] option to exercise precise control over which subprograms are inlined. Use of this option is guided by knowledge of the frequency with which certain routines are called and may be warranted by code size concerns.

When this option is disabled with a name list, the compiler does not consider the specified routines as candidates for inlining. For example, the following statement indicates that inlining should not be considered for baz and x:

+Onoinline=baz,x

All other routines are considered for inlining because +Oinline is the default.

+Oinline_budget=n

Optimization level: +O3, +O4

Default: +Oinline_budget=100

In +Oinline_budget=n, n is an integer in the range 1 to 1000000 that specifies the level of aggressiveness, as follows:

  • n = 100: Default level of inlining.

  • n > 100: More aggressive inlining. The optimizer is less restricted by compilation time and code size when searching for eligible routines to inline.

  • n = 1: Inline only if it reduces code size.

The +Onolimit and +Osize options also affect inlining. Specifying the +Onolimit option implies specifying +Oinline_budget=200. The +Osize option implies +Oinline_budget=1. However, +Oinline_budget takes precedence over both of these options. This means that you can override the effects on inlining of the +Onolimit and +Osize options, by specifying the +Oinline_budget option on the same compile line.

+O[no]libcalls

Optimization level: +O0, +O1, +O2, +O3, +O4

Default: +Onolibcalls at +O0 and +O1;
+Olibcalls at +O2, +O3, and +O4

+Olibcalls increases the runtime performance of code that calls standard library routines in simple contexts. The +Olibcalls option expands the following library calls inline:

  • strcpy()

  • sqrt()

  • fabs()

  • alloca()

Inlining takes place only if the function call follows the prototype definition in the appropriate header file. A single call to printf() may be replaced by a series of calls to putchar(). Calls to sprintf() and strlen() may be optimized more effectively, including elimination of some calls producing unused results. Calls to setjmp() and longjmp() may be replaced by their equivalents _setjmp() and _longjmp(), which do not manipulate the process's signal mask.

Using the +Olibcalls option invokes millicode versions of frequently called math functions. Currently, there are millicode versions for the following functions:

acos, asin, atan, atan2, cos, exp, log, log10, pow, sin, tan

See the HP-UX Floating-Point Guide for the most up-to-date listing of the math library functions.

+Olibcalls also improves the performance of selected library routines (when you are not performing error checking for these routines). The calling code must not expect to access ERRNO after the function's return.

Using +Olibcalls with +Ofltacc gives different floating-point calculation results than those given using +Olibcalls without +Ofltacc.

+O[no]limit

Optimization level: +O2, +O3, +O4

Default: +Olimit

The +Olimit option suppresses optimizations that significantly increase compile-time or that can consume a considerable amount of memory.

The +Onolimit option allows optimizations to be performed regardless of their effects on compile time and memory usage. Specifying the +Onolimit option implies specifying +Oinline_budget=200. See the section “+Oinline_budget=n” for more information.

+O[no]loop_block

Optimization level: +O3, +O4

Default: +Onoloop_block

+O[no]loop_block enables or disables blocking of eligible loops for improved cache performance. The +Onoloop_block option disables both automatic and directive-specified loop blocking. For more information on loop blocking, see the section “Loop blocking”.
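Loop blocking can be sketched by hand in C. The example below transposes a matrix tile by tile; BLOCK is an assumed tile size and the function name is hypothetical. It shows the shape of the transformation, not the compiler's output.

```c
#include <stddef.h>

/* Assumed tile size; in practice it is chosen so that a tile of each
   array fits in the data cache at the same time. */
#define BLOCK 32

/* Transposes an n x n row-major matrix: dst[j][i] = src[i][j].
   Unblocked, the writes to dst stride through memory by n elements and
   touch a new cache line on nearly every iteration. Working on
   BLOCK x BLOCK tiles lets the cache lines of both arrays be reused
   before they are evicted. */
void transpose_blocked(double *dst, const double *src, size_t n)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK)
            for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```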

+O[no]loop_transform

Optimization level: +O3, +O4

Default: +Oloop_transform

+O[no]loop_transform enables or disables transformation of eligible loops for improved cache performance. The most important transformation is the interchange of nested loops to make the inner loop unit stride, resulting in fewer cache misses.

The other transformations affected by +O[no]loop_transform are loop distribution, loop blocking, loop fusion, loop unroll, and loop unroll and jam. See Chapter 3 “Optimization levels” for information on loop transformations.
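A minimal C sketch of loop interchange (function names are hypothetical). C stores arrays by row, so the rightmost index should vary fastest in the inner loop; in Fortran, which stores arrays by column, the roles of the two loops are reversed. Both functions compute the same result; only the loop order, and therefore the memory access stride, differs.

```c
#include <stddef.h>

/* Before interchange: the inner loop varies i, so consecutive
   iterations access elements n doubles apart (stride n). */
void scale_before(double *b, const double *a, size_t n)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            b[i * n + j] = 2.0 * a[i * n + j];
}

/* After interchange: the inner loop varies j, so consecutive
   iterations access adjacent elements (unit stride). */
void scale_after(double *b, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            b[i * n + j] = 2.0 * a[i * n + j];
}
```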

If you experience any problem while using +Oparallel, +Onoloop_transform may be a helpful option.

+O[no]loop_unroll[=unroll factor]

Optimization level: +O2, +O3, +O4

Default: +Oloop_unroll=4

+Oloop_unroll enables loop unrolling. When you use +Oloop_unroll, you can also suggest the unroll factor to control the code expansion. The default unroll factor is four, meaning that the loop body is replicated four times. By experimenting with different factors, you may improve the performance of your program. In some cases, the compiler uses its own unroll factor.

The +Onoloop_unroll option disables partial and complete unrolling. Loop unrolling improves efficiency by eliminating loop overhead, and can create opportunities for other optimizations, such as improved register use and more efficient scheduling. See the section “Loop unrolling” for more information on unrolling.
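The following C sketch shows by hand what unrolling by the default factor of four does to a simple loop. The function names are hypothetical, and a remainder loop handles iteration counts that are not a multiple of four.

```c
#include <stddef.h>

/* Original loop: the increment, compare, and branch run once per
   element. */
void axpy(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Unrolled by four: the body is replicated four times, so the loop
   overhead runs once per four elements. The second loop handles the
   last n % 4 elements. */
void axpy_unrolled4(float *y, const float *x, float a, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; i++)
        y[i] += a * x[i];
}
```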

+O[no]loop_unroll_jam

Optimization level: +O3, +O4

Default: +Onoloop_unroll_jam

The +O[no]loop_unroll_jam option enables or disables loop unrolling and jamming. The +Onoloop_unroll_jam option (the default) disables both automatic and directive-specified unroll and jam. Loop unrolling and jamming increases register exploitation. For more information on the unroll and jam optimization, see the section “Loop unroll and jam”.
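Unroll and jam can be sketched in C on a matrix-vector product. The outer loop is unrolled by two and the two resulting inner loops are fused, so each x[j] is loaded once and used for two rows. The names are hypothetical, and the unroll factor of two is chosen only to keep the example short.

```c
#include <stddef.h>

/* y = A * x for a row-major n x n matrix: original loop nest. */
void matvec(double *y, const double *a, const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j] * x[j];
        y[i] = s;
    }
}

/* Unroll and jam: outer loop unrolled by two, inner loops jammed.
   Each x[j] held in a register serves rows i and i + 1, and two
   independent sums keep more registers busy. */
void matvec_unroll_jam(double *y, const double *a, const double *x, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        double s0 = 0.0, s1 = 0.0;
        for (size_t j = 0; j < n; j++) {
            double xj = x[j];
            s0 += a[i * n + j] * xj;
            s1 += a[(i + 1) * n + j] * xj;
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
    for (; i < n; i++) {               /* leftover row when n is odd */
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j] * x[j];
        y[i] = s;
    }
}
```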

+O[no]moveflops

Optimization level: +O2, +O3, +O4

Default: +Omoveflops

+O[no]moveflops allows or disallows moving conditional floating-point instructions out of loops. The behavior of floating-point exception handling may be altered by this option.

Use +Onomoveflops if floating-point traps are enabled and you do not want the behavior of floating-point exceptions to be altered by the relocation of floating-point instructions.

+O[no]multiprocessor

Optimization level: +O2, +O3, +O4

Default: +Onomultiprocessor

Specifying the +Omultiprocessor option at +O2 and above tells the compiler to appropriately optimize several different processes on multiprocessor machines. The optimizations are those appropriate for executables and shared libraries.

Enabling this option incorrectly (such as on a uniprocessor machine) may cause performance problems.

Specifying +Onomultiprocessor (the default) disables the optimization of more than one process running on multiprocessor machines.

+O[no]parallel

Optimization level: +O3, +O4

Default: +Onoparallel

The +Onoparallel option is the default for all optimization levels. This option disables automatic and directive-specified parallelization.

If you compile one or more files in an application using +Oparallel, then the application must be linked (using the compiler driver) with the +Oparallel option to link in the proper start-up files and runtime support.

The +Oparallel option causes the compiler to:

  • Recognize the directives and pragmas that involve parallelism, such as begin_tasks, loop_parallel, and prefer_parallel

  • Look for opportunities for parallel execution in loops

The following methods are used to specify the number of processors used in executing your parallel programs:

  • loop_parallel(max_threads=m) directive and pragma

  • prefer_parallel(max_threads=m) directive and pragma

    For a description of these directives and pragmas, see Chapter 9 “Parallel programming techniques” and Chapter 12 “Parallel synchronization”. These pragmas are not available in the HP aC++ compiler.

  • MP_NUMBER_OF_THREADS environment variable, which is read at runtime by your program. If this variable is set to some positive integer n, your program executes on n processors. n must be less than or equal to the number of processors in the system where the program is executing.
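How the HP runtime validates MP_NUMBER_OF_THREADS is not described here. As a hypothetical sketch of the rule stated above, the C function below accepts the variable's value only if it is a positive integer no larger than the processor count, and otherwise falls back to using all processors. Both the function name thread_count and the fallback policy are assumptions.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: value is the string returned by getenv (or
   NULL if the variable is unset); nprocs is the number of processors.
   A positive integer n <= nprocs is used as given; anything else
   (unset, empty, malformed, zero, negative, or too large) falls back
   to nprocs. The fallback is an assumption, not documented HP
   runtime behavior. */
long thread_count(const char *value, long nprocs)
{
    if (value == NULL || *value == '\0')
        return nprocs;

    char *end;
    errno = 0;
    long n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n <= 0 || n > nprocs)
        return nprocs;
    return n;
}
```

A caller might pass getenv("MP_NUMBER_OF_THREADS") and a processor count obtained from the system, for example sysconf(_SC_NPROCESSORS_ONLN) where that widely supported, non-POSIX extension is available.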

The +Oparallel option is valid only at optimization level +O3 and above. For information on parallelization, see the section “Levels of parallelism”.

Using the +Oparallel option disables +Ofail_safe, which is enabled by default. See the section +O[no]fail_safe for more information.

+O[no]parmsoverlap

Optimization level: +O2, +O3, +O4

Default (Fortran): +Onoparmsoverlap

Default (C/C++): +Oparmsoverlap

+Oparmsoverlap causes the optimizer to assume that the actual arguments of function calls overlap in memory.
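The C sketch below shows why the assumption matters. shift_add (a hypothetical name) gives different results when dst and src overlap, because each iteration reads a value that the previous iteration stored. Under +Oparmsoverlap the optimizer must preserve that behavior; an assumption of non-overlapping arguments, comparable to the C99 restrict qualifier, would let it reorder the loads and stores.

```c
#include <stddef.h>

/* Adds src[i] into dst[i + 1]. If dst and src overlap, a value
   written by one iteration is read back by the next, so the loads
   cannot be moved ahead of the stores. */
void shift_add(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i + 1] += src[i];
}
```

For example, calling shift_add(a, a, 4) with a = {1, 0, 0, 0, 0} propagates the 1 through every element, producing {1, 1, 1, 1, 1}, whereas reading all of src before writing dst would produce {1, 1, 0, 0, 0}.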

+O[no]pipeline

Optimization level: +O2, +O3, +O4

Default: +Opipeline

+O[no]pipeline enables or disables software pipelining. If program size is more important than execution speed, use +Onopipeline.

Software pipelining is particularly useful for loops containing arithmetic operations on REAL or REAL*8 variables in Fortran or on float or double variables in C and C++.
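Software pipelining overlaps operations from successive iterations. The C sketch below does this by hand for a simple loop: the load for iteration i + 1 is issued before the arithmetic and store for iteration i, with a prologue and an epilogue at the ends. The names are hypothetical, the code is an illustration rather than compiler output, and it assumes that y and x do not overlap.

```c
#include <stddef.h>

/* Original loop: each iteration loads x[i], computes, and stores,
   and the arithmetic must wait for its own load. */
void affine(double *y, const double *x, double a, double b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + b;
}

/* Hand-pipelined version (assumes y and x do not overlap): the load
   for the next iteration is issued alongside the work for the current
   one, so its latency can be hidden on hardware that overlaps
   operations. */
void affine_pipelined(double *y, const double *x, double a, double b, size_t n)
{
    if (n == 0)
        return;
    double cur = x[0];                 /* prologue: first load */
    for (size_t i = 0; i + 1 < n; i++) {
        double next = x[i + 1];        /* load for iteration i + 1 */
        y[i] = a * cur + b;            /* compute and store iteration i */
        cur = next;
    }
    y[n - 1] = a * cur + b;            /* epilogue: last element */
}
```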

+O[no]procelim

Optimization level: +O0, +O1, +O2, +O3, +O4

Default: +Onoprocelim at +O0, +O1, +O2, +O3;
+Oprocelim at +O4

When +Oprocelim is specified, procedures not referenced by the application are eliminated from the output executable file. The +Oprocelim option reduces the size of the executable file, especially when optimizing at +O3 and +O4, at which inlining may have removed all of the calls to some routines.

When +Onoprocelim is specified, procedures not referenced by the application are not eliminated from the output executable file.

If the +Oall option is enabled, the +Oprocelim option is enabled.

+O[no]ptrs_ansi

Optimization level: +O2, +O3, +O4

Default: +Onoptrs_ansi

The +Optrs_ansi option makes the following two assumptions, which the more aggressive +Optrs_strongly_typed does not:

  • int *p is assumed to point to an int field of a struct or union.

  • char * is assumed to point to any type of object.

NOTE: This option is not available in C++.

When both +Optrs_ansi and +Optrs_strongly_typed are specified, +Optrs_ansi takes precedence.

+O[no]ptrs_strongly_typed

Optimization level: +O2, +O3, +O4

Default: +Onoptrs_strongly_typed

Use the C compiler option +Optrs_strongly_typed when pointers are type-safe. The optimizer can use this information to generate more efficient code.

NOTE: This option is not available in C++.

Type-safe (strongly typed) pointers are declared to point to a specific type and point only to objects of that type. For example, a pointer declared as a pointer to an int is considered type-safe if that pointer points to an object of type int only.

Based on the type-safe concept, the optimizer builds a set of groups based on object types. A given group includes all the objects of the same type.

In type-inferred aliasing, any pointer of a type in a given group (of objects of the same type) can only point to any object from the same group. It cannot point to a typed object from any other group.

Type casting to a different type violates type-inferred aliasing rules. Dynamic casting is, however, allowed, as shown in Example 41.

Data type interaction

The optimizer generally spills all global data from registers to memory before any modification to global variables or any loads through pointers. However, the optimizer can generate more efficient code if it knows how various data types interact.

Consider the following example:

int *p;
float *q;
int a, b, c;
float d, e, f;

void foo(void)
{
    int i;
    for (i = 1; i < 10; i++) {
        d = e;
        *p = ...;
        e = d + f;
        f = *q;
    }
}

With +Optrs_strongly_typed turned on, the pointers p and q are assumed to be disjoint because the types they point to are different. Without type-inferred aliasing, the store through *p is assumed to invalidate all the definitions, so d and f must be reloaded from memory for the statement e = d + f. With type-inferred aliasing, the optimizer can propagate the copies of d and f, thus avoiding two loads and two stores.

Use this option for any application involving pointers, where those pointers are type-safe. To specify that only a subset of types is type-safe, use the ptrs_strongly_typed pragma. The compiler issues warnings for any incompatible pointer assignments that may violate the type-inferred aliasing rules discussed in the section “C aliasing options”.

Unsafe type cast

Any type cast to a different type violates type-inferred aliasing rules. Do not use +Optrs_strongly_typed with code that has these "unsafe" type casts. Use the no_ptrs_strongly_typed pragma to prevent the application of type-inferred aliasing to the unsafe type casts.

struct foo {
    int a;
    int b;
} *P;

struct bar {
    float a;
    int b;
    float c;
} *q;

P = (struct foo *) q;   /* Incompatible pointer assignment through type cast */

Generally applying type aliasing

Dynamic casting is allowed with +Optrs_strongly_typed or +Optrs_ansi. A pointer dereference is called a dynamic cast if a cast is applied on the pointer to a different type.

In the example below, type-inferred aliasing is generally applied on P, not just to the particular dereference. Type-aliasing is applied to any other dereferences of P.

struct s {
    short int a;
    short int b;
    int c;
} *P;

*(int *)P = 0;

For more information about type aliasing, see the section “C aliasing options”.

+O[no]ptrs_to_globals[=namelist]

Optimization level: +O2, +O3, +O4

Default: +Optrs_to_globals

By default, global variables are conservatively assumed to be modified anywhere in the program. Use the C compiler option +Onoptrs_to_globals to specify which global variables are not modified through pointers. This allows the optimizer to make the program run more efficiently by incorporating copy propagation and common subexpression elimination.

NOTE: This option is not available in C++.

This option is used to specify all global variables that are not modified using pointers, or to specify a comma-separated list of global variables that are not modified using pointers.

The on state for this option disables some optimizations, such as aggressive optimizations on the program's global symbols.

For example, use the command-line option +Onoptrs_to_globals=a,b,c to specify global variables a, b, and c to not be accessible through pointers. The result (shown below) is that no pointer can access these global variables. The optimizer performs copy propagation and constant folding because storing to *p does not modify a or b.

int a, b, c;
float *p;

void foo(void)
{
    a = 10;
    b = 20;
    *p = 1.0;
    c = a + b;
}

If all global variables are unique, use the +Onoptrs_to_globals option without listing the global variables (that is, without using namelist).

In the example below, the address of b is taken. This means b is accessed indirectly through the pointer. You can still use +Onoptrs_to_globals as:

+Onoptrs_to_globals +Optrs_to_globals=b

int b, c;
int *p = &b;   /* b's address is taken, so b may be modified through p */

For more information about type aliasing, see the section “C aliasing options”.

+O[no]regreassoc

Optimization level: +O2, +O3, +O4

Default: +Oregreassoc

+O[no]regreassoc enables or disables register reassociation. This is a technique for folding and eliminating integer arithmetic operations within loops, especially those used for array address computations.

This optimization provides a code-improving transformation supplementing loop-invariant code motion and strength reduction. Additionally, when performed in conjunction with software pipelining, register reassociation can also yield significant performance improvement.
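The kind of integer address arithmetic this affects can be sketched in C. In the first function (names are hypothetical) every iteration computes i * cols + k; the second shows a form that reassociation, loop-invariant code motion, and strength reduction together can reach, in which the index simply advances by cols each time. This is a hand-written model, not the optimizer's actual output.

```c
#include <stddef.h>

/* Sums column k of a row-major matrix with `cols` columns. As
   written, each iteration spends a multiply and an add just to form
   the index i * cols + k. */
double column_sum(const double *a, size_t rows, size_t cols, size_t k)
{
    double s = 0.0;
    for (size_t i = 0; i < rows; i++)
        s += a[i * cols + k];
    return s;
}

/* After the transformations: the loop-invariant term k is folded into
   the starting index once, and the multiply by i becomes a running
   addition of cols. */
double column_sum_reduced(const double *a, size_t rows, size_t cols, size_t k)
{
    double s = 0.0;
    size_t idx = k;
    for (size_t i = 0; i < rows; i++, idx += cols)
        s += a[idx];
    return s;
}
```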

+O[no]report[=report_type]

Optimization level: +O3, +O4

Default: +Onoreport

+Oreport[=report_type] specifies the contents of the Optimization Report. Values of report_type and the Optimization Reports they produce are shown in Table 7-3 “Optimization Report contents”.

Table 7-3 Optimization Report contents

report_type value                 Report contents
all                               Loop Report and Privatization Table
loop                              Loop Report
private                           Loop Report and Privatization Table
report_type not given (default)   Loop Report

 

The Loop Report gives information on optimizations performed on loops and calls. Using +Oreport (without =report_type) also produces the Loop Report.

The Privatization Table provides information on loop variables that are privatized by the compiler.

+Oreport[=report_type] is active only at +O3 and above. The +Onoreport option does not accept any of the report_type values. For more information about the Optimization Report, see Chapter 8 “Optimization Report”.

+Oinfo also displays information on the various optimizations being performed by the compilers. +Oinfo is used at any optimization level, but is most useful at +O3 and above. The default at all optimization levels is +Onoinfo.

+O[no]sharedgra

Optimization level: +O2, +O3, +O4

Default: +Osharedgra

The +Onosharedgra option disables global register allocation for shared-memory variables that are visible to multiple threads. This option may help if a variable shared among parallel threads is causing wrong answers. See the section “Global register allocation (GRA)” for more information.

Global register allocation (+Osharedgra) is enabled by default at optimization level +O2 and higher.

+O[no]signedpointers

Optimization level: +O2, +O3, +O4

Default: +Onosignedpointers

NOTE: This option is not available in the HP Fortran 90 compiler.

The C and C++ option +O[no]signedpointers requests that the compiler perform or not perform optimizations related to treating pointers as signed quantities. This helps improve application runtime speed. Applications that allocate shared memory and that compare a pointer to shared memory with a pointer to private memory may run incorrectly if this optimization is enabled.

+O[no]size

Optimization level: +O2, +O3, +O4

Default: +Onosize

The +Osize option suppresses optimizations that significantly increase code size. Specifying +Osize implies specifying +Oinline_budget=1. See the section “+Oinline_budget=n” for additional information.

The +Onosize option does not prevent optimizations that can increase code size.

+O[no]static_prediction

Optimization level: +O0, +O1, +O2, +O3, +O4

Default: +Onostatic_prediction

+Ostatic_prediction turns on static branch prediction for PA-RISC 2.0 targets. Use +Ostatic_prediction to better optimize large programs with poor instruction locality, such as operating system and database code.

PA-RISC 2.0 predicts the direction of conditional branches in one of two ways:

  • Dynamic branch prediction uses a hardware history mechanism to predict future executions of a branch from its last three executions. It is transparent and quite effective, unless the hardware buffers involved are overwhelmed by a large program with poor locality.

  • Static branch prediction, when enabled, predicts each branch based on implicit hints encoded in the branch instruction itself. Static branch prediction is intended for large programs with poor locality, for which the small dynamic hardware facility is inadequate.

+O[no]vectorize

Optimization level: +O3, +O4

Default: +Onovectorize

+Ovectorize allows the compiler to replace certain loops with calls to vector routines. Use +Ovectorize to increase the execution speed of loops.

NOTE: This option is not available in the HP aC++ compiler.

When +Onovectorize is specified, loops are not replaced with calls to vector routines.

Because the +Ovectorize option may change the order of floating-point operations in an application, it may also change the results of those operations slightly. See the HP-UX Floating-Point Guide for more information.

The math library contains special prefetching versions of vector routines. If you have a PA-RISC 2.0 application containing operations on large arrays (larger than one megabyte), using +Ovectorize in conjunction with +Odataprefetch may improve performance.

+Ovectorize is also included as part of the +Oaggressive and +Oall options.

+O[no]volatile

Optimization level: +O1, +O2, +O3, +O4

Default: +Onovolatile

NOTE: This option is not available in the HP Fortran 90 compiler.

The C and C++ option +Ovolatile implies that memory references to global variables cannot be removed during optimization.

The +Onovolatile option indicates that no global variables are of volatile class. This means that references to global variables can be removed during optimization.

Use this option to control the volatile semantics for all global variables.
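In standard C, the same guarantee is requested for a single variable with the volatile qualifier; +Ovolatile, as described above, applies it to all globals. The C sketch below (names are hypothetical) shows the classic case: a global flag set by a signal handler and polled in a loop. Without volatile semantics, the optimizer could keep the flag in a register, and the loop might never observe the change.

```c
#include <signal.h>

/* Global flag changed outside the normal flow of control. Because it
   is volatile, every test in the polling loop reloads it from memory. */
static volatile sig_atomic_t interrupted = 0;

static void on_signal(int sig)
{
    (void)sig;
    interrupted = 1;
}

/* Installs the handler, raises the signal, then polls the flag.
   raise() runs the handler before returning, so the loop exits at
   once; it stands in for real code that waits for an asynchronous
   event. Returns -1 if the handler cannot be installed. */
int wait_for_interrupt(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);
    while (!interrupted)
        ;                              /* poll the volatile flag */
    return (int)interrupted;
}
```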

+O[no]whole_program_mode

Optimization level: +O4

Default: +Onowhole_program_mode

Use +Owhole_program_mode to improve performance. Use it only when you are certain that only the files compiled with +Owhole_program_mode directly access any globals that are defined in these files.

NOTE: This option is not available in the HP Fortran 90 or aC++ compilers.

+Owhole_program_mode enables the assertion that only the files that are compiled with this option directly reference any global variables and procedures that are defined in these files. In other words, this option asserts that there are no unseen accesses to the globals.

When this assertion is in effect, the optimizer can hold global variables in registers longer and delete inlined or cloned global procedures.

All files compiled with +Owhole_program_mode must also be compiled with +O4. If any of the files were compiled with +O4, but were not compiled with +Owhole_program_mode, the linker disables the assertion for all files in the program.

The default, +Onowhole_program_mode, disables the assertion noted above.

+tm target

Optimization level: +O0, +O1, +O2, +O3, +O4

Default target value: corresponds to the machine on which you invoke the compiler.

This option specifies the target machine architecture for which compilation is to be performed. Using this option causes the compiler to perform architecture-specific optimizations.

target takes one of the following values:

  • K8000 to specify K-Class servers using PA-8000 processors

  • V2000 to specify V2000 servers

  • V2200 to specify V2200 servers

  • V2250 to specify V2250 servers

This option is valid at all optimization levels.

Using the +tm target option implies +DA and +DS settings as described in Table 7-4 “+tm target and +DA/+DS”. +DAarchitecture causes the compiler to generate code for the architecture specified by architecture. +DSmodel causes the compiler to use the instruction scheduler tuned to model. See the f90(1), aCC(1), or cc(1) man page for more information about the +DA and +DS options.

Table 7-4 +tm target and +DA/+DS

target value specified   +DAarchitecture implied   +DSmodel implied
K8000                    2.0                       2.0
V2000                    2.0                       2.0
V2200                    2.0                       2.0
V2250                    2.0                       2.0

 

If you specify +DA or +DS on the compiler command line, your setting takes precedence over the setting implied by +tm target.

© Hewlett-Packard Development Company, L.P.