HP-UX Systems: HP aC++/HP C Programmer's Guide > Chapter 2 Command-Line Options

Code Optimizing Options


Optimization options can be used to improve the execution speed of programs compiled with the HP compiler.

To use optimization, first specify the appropriate basic optimization level (+O1, +O2, +O3, or +O4) on the command line followed by one or more finer or more precise options when necessary.

For more information and examples, refer to Chapter 7 “Optimizing HP aC++ Programs”.

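This section discusses the following topics:

  • Basic Optimization Level Options

  • Additional Optimization Options for Finer Control

  • Advanced +Ooptimization Options

  • Profile-Based Optimization Options

  • Displaying Optimization Information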

Basic Optimization Level Options

The following options allow you to specify the basic level of optimization:

-O

-O

The -O option invokes the optimizer to perform level 2 optimization. This option is equivalent to the +O2 option.

Example:

This command compiles prog.C and optimizes at level 2.

aCC -O prog.C

+O0

+O0

Use +O0 for fastest compile time or with simple programs. No optimizations are performed.

Example:

This command compiles prog.C at level 0, that is, without optimization.

aCC +O0 prog.C

+O1

+O1

The +O1 option performs level 1 optimization only. This includes branch optimization, dead code elimination, faster register allocation, instruction scheduling, and peephole optimization. This is the default optimization level.

Example:

This command compiles prog.C and optimizes at level 1.

aCC +O1 prog.C

+O2

+O2

The +O2 option performs level 2 optimization. This includes level 1 optimizations plus optimizations performed over entire functions in a single file.

Example:

This command compiles prog.C and optimizes at level 2.

aCC +O2 prog.C

+O3

+O3

The +O3 option performs level 3 optimization. This includes level 2 optimizations plus full optimization across all subprograms within a single file.

Example:

This command compiles prog.C and optimizes at level 3.

aCC +O3 prog.C

+O4

+O4

The +O4 option performs level 4 optimization. This includes level 3 optimizations plus full optimizations across the entire application program. In the absence of +Oprofile=use, the compiler emits a warning and the optimization level drops to +O3. In that case, the defaults that depend on the optimization level are the defaults for +O3.

When you link a program, the compiler brings all modules that were compiled at optimization level 4 into virtual memory at the same time. Depending on the size and number of the modules, compiling at +O4 can consume a large amount of virtual memory. If you are linking a large program that was compiled with the +O4 option, you may notice a system slowdown. In the worst case, you may see an error indicating that you have run out of memory.

Example:

This command compiles prog.C and optimizes at level 4.

aCC +O4 prog.C

If you run out of memory when compiling at +O4 optimization, there are several things you can do:

  • Compile at +O4 only those modules that need to be compiled at optimization level 4, and compile the remaining modules at a lower level.

  • If you still run out of memory, increase the per-process data size limit. Run the System Administrator Manager (SAM) to increase the maxdsiz process parameter from 64 MB to 128 MB. This procedure provides the process with additional data space.

  • If increasing the per-process data size limit does not solve the problem, increase the system swap space.

    Refer to the System Administration Tasks manual for more information.

Object Files Generated at Optimization Level 4

Object files generated by the compiler at optimization level 4, called intermediate object files, are intended to be temporary files. These object files contain an intermediate representation of the user code in a format that is designed for advanced optimizations. Therefore, Hewlett-Packard reserves the right to change the format of these files without prior notice. There is no guarantee that intermediate object files will be compatible from one revision of the compiler to the next. The compiler will issue an error message and terminate when an incompatible intermediate file is detected.

Additional Optimization Options for Finer Control

The following additional optimization options provide finer control:

+cond_rodata

+cond_rodata

This option allows more data to be placed in a read-only section. Normally, data with initializers that contain relocations are not placed in read-only data sections. This option enables the linker to compute the proper section for initialized constant data.

NOTE: This option requires the linker patch, PHSS_31849. This option is available in HP aC++ compiler version A.06.* only.

+ES[no]lit

+ES[no]lit

The +ES[no]lit option places [does not place] string literals and const-qualified variables that do not require load-time or run-time initialization in the read-only data section. This is the same as using the +Olit option.

This option is deprecated and may not be supported in future releases. Instead, use +Olit=all in place of +ESlit and +Olit=none in place of +ESnolit.

-ipo

The -ipo option enables interprocedural optimizations across files. The object file produced using this option contains intermediate code (IELF file). At link time, ld automatically invokes the interprocedural optimizer (u2comp), if any of the input object files is an IELF file.

For optimization levels +O0 and +O1, this option is silently ignored.

The -ipo option is implicitly invoked with the +O4 and +Ofaster options to match current behavior (+O4 ==> +O3 -ipo). This option is incompatible with debugging options. This restriction will be removed in a future release.

+[no]nrv

+[no]nrv

or

-Wc,-nrv_optimization,[off|on]

The +[no]nrv option enables [disables] the named return value (NRV) optimization. By default it is disabled.

The NRV optimization eliminates a copy-constructor call by allocating a local object of a function directly in the caller’s context if that object is always returned by the function.

Example:

struct A {
    A(A const&);   // copy constructor
};

A f(A const& x) {
    A a(x);
    return a;      // Will not call the copy constructor if the
}                  // optimization is enabled.

This optimization will not be performed if the copy-constructor was not declared by the programmer. Note that although this optimization is allowed by the ISO/ANSI C++ standard, it may have noticeable side effects.

Example:

aCC -Wc,-nrv_optimization,on app.C
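The "noticeable side effects" mentioned above come from the copy constructor being skipped. The following is a minimal sketch, not taken from the guide, that makes the effect observable by counting copy-constructor calls. The names Counted, make_copy, and copies_for_one_call are hypothetical. The exact count depends on the compiler, the language level, and whether NRV is applied, so no single value is claimed here.

```cpp
#include <cassert>

// Hypothetical demo type: counts copy-constructor calls so that the effect
// of the NRV optimization can be observed.
struct Counted {
    static int copies;
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(Counted const& other) : value(other.value) { ++copies; }
};
int Counted::copies = 0;

// Same shape as f() above: a local object is copy-constructed from the
// argument and is always the object returned.
Counted make_copy(Counted const& x) {
    Counted a(x);   // this copy-constructor call always happens
    return a;       // a further call happens here only if NRV is not applied
}

// Returns how many copy-constructor calls a single make_copy() call caused.
int copies_for_one_call() {
    Counted::copies = 0;
    Counted src(42);
    Counted dst = make_copy(src);
    assert(dst.value == 42);
    return Counted::copies;
}
```

If the copy constructor has side effects that the program relies on (logging, reference counting, and so on), a lower count with the optimization enabled is exactly the behavior change to watch for.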

+O[no]clone

+O[no]clone

Cloning is controlled by a list-free option +O[no]clone analogous to +O[no]inline. It is on by default with +O3 and +O4, and can be disabled.

The +O[no]clone option influences cloning both into and out of the functions it governs.

Example:

In the following examples, +Onoclone applies to the function foo, and directs that foo itself should not be cloned and that calls from foo (bar) should not be redirected to clones.

$ cc -c +Oprofile=use +O4 foo.c +Onoclone

$ cc -c +Oprofile=use +O4 bar.c

+O[no]failsafe

+O[no]failsafe

The +O[no]failsafe option enables [disables] failsafe optimization. When a compilation fails at the current optimization level, +Ofailsafe automatically restarts the compilation at +O2 (for specific high-level optimizer errors at +O3 or +O4) or at +O0.

The default is +Ofailsafe.

+O[no]all

+O[no]all

Use the +Oall option to obtain the best possible performance. This option should be used with stable, well-structured code. These optimizations give you the fastest code, but are riskier than the default optimizations.

You can use +Oall at optimization levels 2, 3, and 4. The default is +Onoall.

This option is deprecated and may not be supported in future releases. Instead you can use +Ofaster. +O4 +Onolimit +Oaggressive is approximately equivalent to +Oall.

+O[no]aggressive

+O[no]aggressive

The +Oaggressive option enables aggressive optimizations. The +Onoaggressive option disables aggressive optimizations.

By default, aggressive optimizations are turned off. The +Oaggressive option is approximately equivalent to +Osignedpointers +Olibcalls +Onoinitcheck +Ofltacc=relaxed.

NOTE: This option is deprecated and may not be supported in future releases. Instead, use the +Ofast option.

+O[no]conservative

+O[no]conservative

The +O[no]conservative option is deprecated and may not be supported in future releases. +Oconservative is approximately equivalent to +Oparmsoverlap +Onomoveflops.

The default is +Onoconservative.

+O[no]limit

+O[no]limit

The +Olimit option suppresses optimizations that significantly increase compile time or that consume a lot of memory.

The +Onolimit option allows optimizations to be performed regardless of their effect on compile time or memory consumption.

Use +Onolimit at all optimization levels.

Usage:

+O[no]limit=level

The defined values of level are:

default

Based on tuning heuristics, the optimizer will spend a reasonable amount of time processing large procedures. This is the default option.

min

For large procedures, the optimizer will avoid non-linear time optimizations. This option is a synonym for +Olimit.

none

The optimizer will fully optimize large procedures, possibly resulting in significantly increased compile time. This option is a synonym for +Onolimit.

Example:

To remove optimization time restrictions at the second, third, or fourth optimization levels, use +Onolimit as follows:

aCC <opt level> +Onolimit sourcefile.C

+O[no]ptrs_ansi

+O[no]ptrs_ansi

The default is +Onoptrs_ansi.

+Optrs_ansi is synonymous with +Otype_safety=ansi.

+Onoptrs_ansi is synonymous with +Otype_safety=off.

NOTE: This option is supported in aC++ C-mode only. A warning is displayed in C++ when this option is used.

+O[no]ptrs_strongly_typed

+O[no]ptrs_strongly_typed

The default is +Onoptrs_strongly_typed.

+Optrs_strongly_typed is synonymous with +Otype_safety=strong.

+Onoptrs_strongly_typed is synonymous with +Otype_safety=off.

NOTE: This option is supported in aC++ C-mode only. A warning is displayed in C++ when this option is used.

+O[no]ptrs_to_globals(list)

+O[no]ptrs_to_globals(list)

The +O[no]ptrs_to_globals option tells the optimizer that global variables are [are not] accessed through pointers. The default is +Onoptrs_to_globals.

+O[no]size

+O[no]size

While most optimizations reduce code size, the +Osize option suppresses those few optimizations that significantly increase code size. The +Onosize option enables code-expanding optimizations.

Use +Osize at all optimization levels. The default is +Onosize.

Advanced +Ooptimization Options

Advanced optimization options provide additional control for special situations.

+O[no]cross_region_addressing

+O[no]cross_region_addressing

The +O[no]cross_region_addressing option enables [disables] the use of cross-region addressing. Cross-region addressing is required if a pointer, such as an array base, points to a different region than the data being addressed due to an offset that results in a cross-over into another region. Standard conforming applications do not require the use of cross-region addressing.

The default is +Onocross_region_addressing.

NOTE: Using this option may result in reduced runtime performance.

+O[no]datalayout

+O[no]datalayout

The +O[no]datalayout option enables [disables] profile-driven layout of global and static data items to improve cache memory utilization. This option is currently enabled if +Oprofile=use (dynamic profile feedback) is specified.

The default, in the absence of +Oprofile=use, is +Onodatalayout.

+O[no]dataprefetch

+O[no]dataprefetch

When +Odataprefetch is enabled, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions are inserted only for data structures referenced within innermost loops using simple loop varying addresses (that is, in a simple arithmetic progression).

Use this option for applications that have high data cache miss overhead. The default is +Onodataprefetch.

+Odataprefetch is equivalent to +Odataprefetch=indirect. +Onodataprefetch is equivalent to +Odataprefetch=none.

Usage:

+Odataprefetch=kind

The defined values for kind are:

direct

Enables the generation of data prefetch instructions for the benefit of direct memory accesses, but not indirect memory accesses.

indirect

Enables the generation of data prefetch instructions for the benefit of both direct and indirect memory accesses. This is the default at optimization levels +O2 and above.

none

Disables the generation of data prefetch instructions. This is the default at optimization levels +O1 and below.
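To make the direct/indirect distinction concrete, here is a small sketch of the two access patterns. The function names are hypothetical and the code does not issue any prefetch instructions itself. It only shows the kind of innermost loop each setting is described as targeting.

```cpp
#include <cassert>

// Direct access: element addresses advance by a fixed stride each iteration
// (a simple arithmetic progression).
long sum_direct(const int* a, int n) {
    assert(n >= 0);
    long total = 0;
    for (int i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}

// Indirect access: each element address depends on a value loaded from idx,
// so it cannot be predicted from the loop counter alone.
long sum_indirect(const int* a, const int* idx, int n) {
    assert(n >= 0);
    long total = 0;
    for (int i = 0; i < n; ++i) {
        total += a[idx[i]];
    }
    return total;
}
```

Under the descriptions above, a loop like sum_direct is a candidate with either direct or indirect, while the a[idx[i]] reference in sum_indirect is a candidate only with indirect.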

+O[no]extern

+O[no]extern

Use the +O[no]extern option at optimization levels 0, 1, 2, 3, or 4. The default is +Oextern with no name list.

+Oextern is equivalent to -Bextern.

+Onoextern is equivalent to -Bprotected.

NOTE: This option is deprecated and may not be supported in the future releases.

+O[no]fltacc

+O[no]fltacc=level

The +O[no]fltacc option disables [enables] floating-point optimizations that can result in numerical differences. Any option other than +Ofltacc=strict also generates Fused Multiply-Add (FMA) instructions. FMA instructions can improve performance of floating-point applications.

If you specify neither +Ofltacc nor +Onofltacc, less optimization is performed than for +Onofltacc. If you specify neither option, the optimizer generates FMA instructions but does not perform any expression-reordering optimizations.

Specifying +Ofltacc ensures the same result as in unoptimized code (+O0).

Usage:

+Ofltacc=level

The defined values for level are:

default

Allows contractions, such as fused multiply-add (FMA), but disallows any other floating-point optimization that can result in numerical differences.

limited

Like default, but also allows floating-point optimizations which may affect the generation and propagation of infinities, NaNs, and the sign of zero.

relaxed

In addition to the optimizations allowed by limited, permits optimizations, such as reordering of expressions, even if parenthesized, that may affect rounding error. This is the same as +Onofltacc.

strict

Disallows any floating-point optimization that can result in numerical differences. This is the same as +Ofltacc.

All options except +Ofltacc=strict option allow the compiler to make transformations which are algebraically correct, but which may slightly affect the result of computations due to the inherent imperfection of computer floating-point arithmetic. For many programs, the results obtained with these options are adequately similar to those obtained without the optimization.

For applications in which round-off error has been carefully studied, and the order of computation carefully crafted to control error, these options may be unsatisfactory. To ensure the same result as in unoptimized code, use +Ofltacc.

Example:

All the options, except +Ofltacc=strict, allow the compiler to replace a division by a multiplication using the reciprocal. For example, the following code:

for (int j=1;j<5;j++)
   a[j] = b[j] / x;

is transformed as follows (note that x is invariant in the loop):

x_inv = 1.0/x;
for (int j=1;j<5;j++)
   a[j] = b[j] * x_inv;

Since multiplication is considerably faster than division, the optimized program runs faster.
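The speedup comes at the cost of possible rounding differences, because 1.0/x is itself rounded before it is multiplied. The following sketch, with a hypothetical helper name, measures the largest relative difference between the two forms for a given divisor. It assumes IEEE 754 double arithmetic and a compiler that does not itself apply the transformation to this code.

```cpp
#include <cassert>
#include <cmath>

// Returns the largest relative difference between b[j] / x and
// b[j] * (1.0 / x) over n elements. A result of 0.0 means the two forms
// agreed exactly for every element.
double max_reciprocal_error(const double* b, int n, double x) {
    assert(n >= 0 && x != 0.0);
    const double x_inv = 1.0 / x;            // hoisted reciprocal, as in the transformed loop
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {
        const double by_div = b[j] / x;      // original form
        const double by_mul = b[j] * x_inv;  // transformed form
        if (by_div != 0.0) {
            const double rel = std::fabs(by_div - by_mul) / std::fabs(by_div);
            if (rel > worst) {
                worst = rel;
            }
        }
    }
    return worst;
}
```

When x is a power of two, such as 4.0, the reciprocal is exact and the two forms agree. For a divisor such as 3.0 the results may differ in the last bit or so, which is the kind of discrepancy that +Ofltacc=strict rules out.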

+Ofrequently_called

+Ofrequently_called=function1[,function2...]

The named functions are assumed to be frequently called. This option overrides any information in a profile database.

+Ofrequently_called:filename

The file indicated by filename contains a list of functions, separated by spaces or newlines. These functions are assumed to be frequently called. This option overrides any information in a profile database.

+O[no]initcheck

+O[no]initcheck

The initialization checking feature of the optimizer has three possible states: on, off, or unspecified. When on (+Oinitcheck), the optimizer initializes to zero any local, non-static variables that are uninitialized with respect to at least one path leading to a use of the variable. When off (+Onoinitcheck), the optimizer does not initialize uninitialized variables, but issues warning messages when it discovers them.

When unspecified, the optimizer initializes to zero any local, non-static variables that are definitely uninitialized with respect to all paths leading to a use of the variable.

Use +Oinitcheck at optimization level 2 or above.
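As an illustration, the pattern this option is concerned with is a local variable that is assigned on only some of the paths that reach its use. The sketch below uses a hypothetical function and shows the problematic shape in a comment and the portable fix in code. Initializing explicitly in the source avoids depending on the option at all.

```cpp
#include <cassert>

// Problematic shape (do not use): scale is uninitialized when mode != 1.
//
//     int scale;
//     if (mode == 1) scale = 10;
//     return value * scale;
//
// Portable fix: give the variable a defined value on every path.
int scaled(int value, int mode) {
    assert(mode == 0 || mode == 1);  // modes supported by this sketch
    int scale = 1;                   // explicit default covers the mode != 1 path
    if (mode == 1) {
        scale = 10;
    }
    return value * scale;
}
```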

+O[no]inline

+O[no]inline

The +Oinline option indicates that any function can be inlined by the optimizer. +Onoinline disables inlining of functions by the optimizer. This option does not affect functions inlined at the source code level.

Use +Onoinline at optimization levels 2, 3, and 4.

The default is +Oinline at optimization levels 3 and 4.

Usage:

+O[no]inline=function1[,function2...]

Enables [disables] optimizer inlining for the named functions.

+O[no]inline:filename

The file indicated by filename should contain a list of function names, separated by commas or newlines. Optimizer inlining is enabled [disabled] for the named functions.

+Oinlinebudget

+Oinlinebudget=n

The +Oinlinebudget option controls the compile time budget for the inliner. A lower number causes the inliner to consider fewer candidates for inlining, while a higher number leads it to consider more candidates. The inlining candidates are ordered in priority order based on the inliner’s heuristics, so this does not affect the most important candidates.

The +Oinlinebudget option controls the aggressiveness of inlining according to the value you specify for n where n is an integer in the range 1 - 1000000 that specifies the level of aggressiveness, as follows:

n = 100

Default compile time budget.

n > 100

Allows the inliner to consider more candidates and increase compile time.

n < 100

Considers fewer candidates to reduce compile time for the inliner.

The +Onolimit and +Osize options also affect inlining. Specifying the +Onolimit option has the same effect as specifying +Oinlinebudget=200. The +Osize option has the same effect as +Oinlinebudget=1.

NOTE: The +Oinlinebudget option takes precedence over both of these options. This means that you can override the effect of +Onolimit or +Osize option on inlining by specifying the +Oinlinebudget option on the same compile line.

Use this option at optimization level 2 or higher.

The default is +Oinlinebudget=100.

+Olit

+Olit=kind

The +Olit option places the data items that do not require load-time or run-time initialization in a read-only data section. +Olit=all is the default.

The defined values for kind are:

all

All string literals and all const-qualified variables that do not require load-time or run-time initialization will be placed in a read-only data section. +Olit=all replaces the deprecated +ESlit option.

const

All string literals appearing in a context where const char * is legal, and all const-qualified variables that do not require load-time or run-time initialization will be placed in a read-only data section. +Olit=const is mapped to +Olit=all with a warning except in C mode. +Olit=const replaces the deprecated +ESconstlit option in C.

none

No constants are placed in a read-only data section. +Olit=none replaces the deprecated +ESnolit option.

+Ointeger_overflow

+Ointeger_overflow=kind

To provide the best runtime performance, the compiler assumes that runtime integer arithmetic expressions that arise in certain contexts do not overflow (that is, do not produce values that are too high or too low to represent). This applies both to expressions that are present in user code and to expressions that the compiler constructs itself. Note that if an integer arithmetic overflow assumption is violated, runtime behavior is undefined.

The defined values of kind are:

aggressive

Allows the compiler to make a broad set of assumptions that integer arithmetic expressions do not overflow.

conservative

Directs the compiler to make fewer assumptions that integer arithmetic expressions do not overflow.

moderate

This is the same as +Ointeger_overflow=aggressive, except that linear function test replacement (LFTR) optimization is not performed.
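Whatever setting is chosen, code that never overflows is unaffected by these assumptions. The following sketch shows two common ways to keep signed arithmetic in range. The helper names are hypothetical, and the first function assumes that long long is wider than int, as it is on common platforms.

```cpp
#include <cassert>
#include <climits>

// Counts the integers in the closed range [first, last]. A loop written as
// `for (int i = first; i <= last; ++i)` would overflow i when
// last == INT_MAX; computing in a wider type avoids that.
long long count_inclusive(int first, int last) {
    if (first > last) {
        return 0;
    }
    return static_cast<long long>(last) - static_cast<long long>(first) + 1;
}

// Adds two ints without ever performing an overflowing addition.
// Returns true and stores the sum in *out only when it is representable.
bool checked_add(int a, int b, int* out) {
    assert(out != 0);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   // the mathematical sum does not fit in an int
    }
    *out = a + b;
    return true;
}
```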

+Olevel

+Olevel=name1[,name2,...,nameN]

The +Olevel option lowers optimization to the specified level for one or more named functions.

level can be 0, 1, 2, 3, or 4.

The name parameters are names of functions in the module being compiled. Use this option when one or more functions do not optimize well or properly. This option must be used with a basic +Olevel or -O option. Note that currently only the C++ mangled name of the function is allowed for name.

This option works like the OPT_LEVEL pragma. The option overrides the pragma for the specified functions. As with the pragma, you can only lower the level of optimization; you cannot raise it above the level specified by a basic +Olevel or -O option. To avoid confusion, it is best to use either this option or the OPT_LEVEL pragma rather than both.

You can use this option at optimization levels 1, 2, 3, and 4. The default is to optimize all functions at the level specified by the basic +Olevel or -O option.

Examples:

  • The following command optimizes all functions at level 3, except for the functions myfunc1 and myfunc2, which it optimizes at level 1.

    aCC +O3 +O1=myfunc1,myfunc2 funcs.c main.c

  • The following command optimizes all functions at level 2, except for the functions myfunc1 and myfunc2, which it optimizes at level 0.

    aCC -O +O0=myfunc1,myfunc2 funcs.c main.c

+O[no]libcalls

+O[no]libcalls

The +O[no]libcalls option is deprecated and may not be supported in future releases. On Itanium®-based platforms, including a system header file will cause the functions declared therein to be eligible for libcalls transformations, regardless of the state of +O[no]libcalls.

The default is +Onolibcalls. Use +O[no]libcalls at any optimization level.

+O[no]loop_transform

+O[no]loop_transform

This option transforms [does not transform] eligible loops for improved cache and other performance. This option can be used at optimization levels 2, 3 and 4.

The default is +Oloop_transform.

+O[no]loop_unroll

+O[no]loop_unroll [=unroll_factor]

The +O[no]loop_unroll option enables [disables] loop unrolling. This optimization can occur at optimization levels 2, 3, and 4. The default is +Oloop_unroll. The default unroll factor is 4, that is, four copies of the loop body. The unroll_factor controls code expansion.
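For readers unfamiliar with the transformation, the following hand-written sketch shows what unrolling by a factor of 4 amounts to at the source level. The function names are hypothetical, and the code the compiler actually generates may differ.

```cpp
#include <cassert>

// Reference loop: one element per iteration.
long sum_rolled(const int* a, int n) {
    assert(n >= 0);
    long s = 0;
    for (int i = 0; i < n; ++i) {
        s += a[i];
    }
    return s;
}

// The same loop unrolled by 4: four copies of the body per iteration,
// followed by a cleanup loop for the 0 to 3 leftover elements.
long sum_unrolled4(const int* a, int n) {
    assert(n >= 0);
    long s = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; ++i) {
        s += a[i];
    }
    return s;
}
```

The unrolled version executes fewer branch and counter updates per element but is larger, which is the code expansion that the unroll factor controls.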

+O[no]loop_unroll_jam

+O[no]loop_unroll_jam

The +O[no]loop_unroll_jam option enables [disables] loop unrolling and jamming. Loop unrolling and jamming increases register exploitation.

The default is +Onoloop_unroll_jam at optimization levels 3 and 4 only.
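Unroll and jam applies to loop nests: the outer loop is unrolled and the resulting copies of the inner loop are fused ("jammed") into one. The sketch below does this by hand for a matrix-vector product with an outer unroll factor of 2. The function names and the factor are illustrative assumptions, not a description of what the compiler emits.

```cpp
#include <cassert>

// Reference: y[i] = sum over j of m[i * cols + j] * x[j] (row-major matrix).
void matvec(const double* m, const double* x, double* y, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    for (int i = 0; i < rows; ++i) {
        double s = 0.0;
        for (int j = 0; j < cols; ++j) {
            s += m[i * cols + j] * x[j];
        }
        y[i] = s;
    }
}

// Outer loop unrolled by 2 and the two inner loops jammed together, so each
// x[j] is loaded once and reused for two rows.
void matvec_unroll_jam2(const double* m, const double* x, double* y,
                        int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    int i = 0;
    for (; i + 1 < rows; i += 2) {
        double s0 = 0.0;
        double s1 = 0.0;
        for (int j = 0; j < cols; ++j) {
            const double xj = x[j];
            s0 += m[i * cols + j] * xj;
            s1 += m[(i + 1) * cols + j] * xj;
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
    for (; i < rows; ++i) {   // cleanup for an odd number of rows
        double s = 0.0;
        for (int j = 0; j < cols; ++j) {
            s += m[i * cols + j] * x[j];
        }
        y[i] = s;
    }
}
```

Keeping x[j] and both partial sums live across the fused body is the increased register exploitation referred to above.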

+O[no]moveflops

+O[no]moveflops

The +Onomoveflops option is approximately equivalent to +Ofltacc=strict +Ofenvaccess. The default is +Omoveflops.

This option is deprecated and may not be supported in future releases.

+O[no]openmp

+O[no]openmp

The +Oopenmp option causes the OpenMP directives to be honored. This option is effective at any optimization level. Non-OpenMP parallelization directives are ignored with warnings. +Onoopenmp requests that OpenMP directives be silently ignored. If neither +Oopenmp nor +Onoopenmp is specified, OpenMP directives will be ignored with warnings.

The OpenMP specification is available at http://www.openmp.org/specs.

OpenMP programs require the libomp and libcps runtime support libraries to be present on both the compilation and runtime systems. The compiler driver automatically includes them when linking.

It is recommended that you use the -N option when linking OpenMP programs to avoid exhausting memory when running with large numbers of threads.

NOTE: HP aC++ version A.06.00 does not support C++ constructs in OpenMP. Use the +legacy_v5 option to use this option.

+opts

+opts filename

The file indicated by filename contains a list of options that are processed as if they had been specified on the command line at the point of the +opts option.

+O[no]parminit

+O[no]parminit

The +O[no]parminit option enables [disables] automatic initialization to non-NaT of unspecified function parameters at call sites. This is useful in preventing NaT values in parameter registers. The default is +Onoparminit.

+O[no]parmsoverlap

+O[no]parmsoverlap

The +Onoparmsoverlap option optimizes with the assumption that on entry to a function each of that function’s pointer-typed formals points to memory that is accessed only through that formal or through copies of that formal made within the function. For example, that memory must not be accessed through a different formal, and that formal must not point to a global that is accessed by name within the function or any of its calls.

Use +Onoparmsoverlap if C/C++ programs have been literally translated from FORTRAN programs.

The default is +Oparmsoverlap.
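The following sketch shows a call that violates the +Onoparmsoverlap assumption: the two pointer formals refer to overlapping memory. The function names are hypothetical. Under standard C++ semantics, which correspond to the default +Oparmsoverlap, each store is visible to the next load and the result is well defined.

```cpp
#include <cassert>

// Writes src[i] + 1 into dst[i] for i in [0, n). Nothing in the signature
// tells the compiler whether dst and src overlap.
void add_one(int* dst, const int* src, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i] + 1;
    }
}

// Calls add_one with overlapping formals (dst is buf + 1, src is buf).
// Executed in order, each store feeds the next load, so buf becomes
// {0, 1, 2, 3}. Returns the last element.
int chained_last() {
    int buf[4] = {0, 0, 0, 0};
    add_one(buf + 1, buf, 3);
    return buf[3];
}
```

If add_one were compiled under the no-overlap assumption, the optimizer would be free to load several src elements before storing, and a call like the one in chained_last could produce a different result. Code with such calls should keep the default.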

+O[no]procelim

+O[no]procelim

The +O[no]procelim option enables [disables] the elimination of dead procedure code and, in some cases, unreferenced data.

Use this option when linking an executable file, to remove functions not referenced by the application. You can also use this option when building a shared library to remove functions not exported and not referenced from within the shared library. This may be especially useful when functions have been inlined.

NOTE: Any function having symbolic debug information associated with it is not removed.

The default is +Onoprocelim at optimization levels 0 and 1; at levels 2, 3 and 4, the default is +Oprocelim.

+O[no]promote_indirect_calls

+O[no]promote_indirect_calls

The +O[no]promote_indirect_calls option uses profile data from profile-based optimization and other information to determine the most likely target of indirect calls and promotes them to direct calls. Indirect calls occur with pointers to functions and virtual calls.

In all cases the optimized code tests to make sure the direct call is being taken and if not, executes the indirect call. If +Oinline is in effect, the optimizer may also inline the promoted calls.

+Opromote_indirect_calls is only effective with profile-based optimization.

NOTE: The optimizer tries to determine the most likely target of indirect calls. If the profile data is incomplete or ambiguous, the optimizer may not select the best target. If this happens, your code’s performance may decrease.

This option can be used at optimization levels 3 and 4. At +O3, it is only effective if indirect calls from functions within a file are mostly to target functions within the same file. This is because +O3 optimizes only within a file, whereas +O4 optimizes across files.

The default is +Opromote_indirect_calls at optimization level 3 and above.

+Onopromote_indirect_calls will be the default at optimization level 2 and below.
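The guarded direct call described above can be pictured with a hand-written equivalent. In this sketch the handler names are hypothetical, and the choice of likely_handler as the hot target stands in for what profile data would tell the optimizer.

```cpp
#include <cassert>

typedef int (*Handler)(int);

int likely_handler(int x) { return x + 1; }   // hypothetical frequent target
int other_handler(int x)  { return x * 2; }   // hypothetical infrequent target

// Source-level picture of a promoted indirect call: test whether the pointer
// matches the predicted target, call it directly if so, and otherwise fall
// back to the original indirect call.
int dispatch(Handler h, int x) {
    assert(h != 0);
    if (h == &likely_handler) {
        return likely_handler(x);   // direct call, now also a candidate for inlining
    }
    return h(x);                    // original indirect call
}
```

When the prediction is wrong most of the time, every call pays for the extra comparison without benefiting from the direct call, which is the performance risk the NOTE above warns about.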

+Orarely_called

+Orarely_called=function1[,function2...]

The named functions are assumed to be rarely called. This option overrides any information in a profile database.

+Orarely_called:filename

The file indicated by filename contains a list of functions, separated by spaces or newlines. These functions are assumed to be rarely called. This option overrides any information in a profile database.

+O[no]recovery

+O[no]recovery

The +O[no]recovery option generates [does not generate] recovery code for control speculation. The default is +Orecovery. For code that writes to uncacheable memory that may not be properly identified as volatile, the +Orecovery option reduces the risk of incorrect behavior.

NOTE: A program that uses signal handlers to catch signals raised by memory accesses may not behave correctly under +Onorecovery.

+O[no]signedpointers

+O[no]signedpointers

The +Osignedpointers option treats pointers in Boolean comparisons (for example, <, <=, >, >=) as signed quantities. Applications that allocate shared memory and that compare a pointer to shared memory with a pointer to private memory may run incorrectly if this optimization is enabled.

The default is +Onosignedpointers.

NOTE: This option is supported in aC++ C-mode only. A warning is displayed in C++ when this option is used.

+Oshortdata

+Oshortdata[=size]

All objects of size bytes or smaller are placed in the short data area, and references to such data assume it resides in the short data area. The value of size must be a decimal number between 8 and 4,194,304 (4 MB).

If no size is specified, all data is placed in the short data area. The default is +Oshortdata=8.

NOTE: Specifying a size that is too large, or omitting the optional size (possibly through +Ofast), may cause various linker fix-up errors if there is more than 4 MB of short data.

+O[no]store_ordering

+O[no]store_ordering

The +O[no]store_ordering option preserves [does not preserve] the original program order for stores to memory that is visible to multiple threads. This does not imply strong ordering. The default is +Onostore_ordering.

+Otype_safety

+Otype_safety=kind

The +Otype_safety option controls type-based aliasing assumptions.

The defined values for kind are:

off

The default. Specifies that aliasing can occur freely across types.

limited

Code follows ANSI aliasing rules. Unnamed objects should be treated as if they had an unknown type.

ansi

Code follows ANSI aliasing rules. Unnamed objects should be treated the same as named objects.

strong

Code follows ANSI aliasing rules, except that accesses through lvalues of a character type are not permitted to touch objects of other types and field addresses are not to be taken.

The default is +Otype_safety=off.
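Code that reinterprets the bytes of one type as another is where these settings matter most. The sketch below shows a way to read the bits of a float that does not access an object through an lvalue of an unrelated type, so it does not depend on the +Otype_safety setting. The function name is hypothetical, and the sketch assumes that float and unsigned int have the same size.

```cpp
#include <cassert>
#include <cstring>

// Returns the object representation of f as an unsigned integer.
// std::memcpy copies bytes, so the float object is never read through an
// unsigned int lvalue.
unsigned int float_bits(float f) {
    assert(sizeof(float) == sizeof(unsigned int));  // assumption of this sketch
    unsigned int bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Pattern to avoid when type-based aliasing assumptions are enabled:
//
//     unsigned int bits = *reinterpret_cast<unsigned int*>(&f);
//
// It reads a float object through an unsigned int lvalue, which the ANSI
// aliasing rules do not permit.
```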

+Ounroll_factor

+Ounroll_factor=n

The +Ounroll_factor option applies the unroll factor to all loops in the current translation unit. You can apply an unroll factor which you think is best for the given loop or apply no unrolling factor to the loop. If this option is not specified, the compiler uses its own heuristics to determine the best unroll factor for the inner loop.

A user specified unroll factor will override the default unroll factor applied by the compiler.

Specifying n=1 will prevent the compiler from unrolling the loop.

Specifying n=0 allows the compiler to use its own heuristics to apply the unroll factor.

NOTE: This option will be ignored if it is placed in a loop other than the innermost loop.

+O[no]volatile

+O[no]volatile

The +Ovolatile option implies that memory references to global variables are volatile and cannot be removed during optimization. The +Onovolatile option implies that all globals are not of volatile class. This means that references to global variables can be removed during optimization.

Use this option to control the volatile semantics for all global variables.

Use +Ovolatile at all optimization levels. The default is +Onovolatile.

NOTE: The +Ovolatile option is not recommended. Instead, use the C/C++ Standard volatile qualifiers.
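As a sketch of the recommended alternative, the following code qualifies only the one global that needs volatile semantics instead of applying them to every global with +Ovolatile. The variable and function names are hypothetical.

```cpp
#include <cassert>
#include <csignal>

// A flag that may be changed outside normal control flow, for example by a
// signal handler. The volatile qualifier forces every read in the loop below
// to go to memory.
volatile std::sig_atomic_t stop_requested = 0;

void request_stop() {
    stop_requested = 1;
}

// Polls the flag at most max_polls times. Returns the number of polls that
// completed before the flag was seen set (max_polls if it never was).
int poll_until_stop(int max_polls) {
    assert(max_polls >= 0);
    int polls = 0;
    while (polls < max_polls && !stop_requested) {
        ++polls;
    }
    return polls;
}
```

Other globals in the same program remain eligible for the usual optimizations, which is the advantage over the option.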

+O[no]whole_program_mode

+O[no]whole_program_mode

The +O[no]whole_program_mode option enables [disables] the assertion that only those files that are compiled with this option directly reference any global variables and procedures that are defined in these files. In other words, this option asserts that there are no unseen accesses to the globals.

When this assertion is in effect, the optimizer can hold global variables in registers longer and delete inlined or cloned global procedures. This option is in effect only at +O4 level of optimization.

All files compiled with +Owhole_program_mode must also be compiled with +O4. If any of the files were compiled with +O4 but were not compiled with +Owhole_program_mode, the linker disables the assertion for all files in the program.

The default is +Onowhole_program_mode which disables the assertion.

Use this option to improve performance, but only when you are certain that only the files compiled with +Owhole_program_mode directly access any globals that are defined in these files.

Profile-Based Optimization Options

Profile-based optimization is a set of performance-improving code transformations based on the run-time characteristics of your application.

+Oprofile

+Oprofile=[use|collect]

The +Oprofile=collect option instructs the compiler to instrument the object code for collecting run-time profile data. The profiling information can then be used by the linker to perform profile-based optimization.

+Oprofile=use[:filename] causes the compiler to look for a profile database file. This option overrides the FLOW_DATA environment variable.

After compiling and linking with +Oprofile=collect, run the resultant program using representative input data to collect execution profile data. Finally, recompile with the +Oprofile=use option to perform profile-based optimization.

Profile data is stored in flow.data by default.

Example:

aCC +Oprofile=collect -O -o prog.pbo prog.C

The above command compiles prog.C with optimization, prepares the object code for data collection, and creates the executable file prog.pbo. Running prog.pbo collects runtime information in the file flow.data in preparation for optimization with +Oprofile=use.

+Oprofile=collect[:<qualifiers>]

<qualifiers> is a comma-separated list of profile collection qualifiers.

Supported profile collection qualifiers:

arc

Enables collection of arc counts.

dcache

Enables collection of data cache misses.

stride

Enables collection of stride data.

all

Enables collection of all types of profile data. This is equivalent to +Oprofile=collect:arc,stride. This is the default.

This option merely enables the application for collection of the various forms of profiling data.

The environment variable PBO_DATA_TYPE controls the type of data collected at runtime. It may be set to one of the following values, which must be consistent with the +Oprofile=collect qualifiers used to create the application:

arc-stride

Collects stride and/or arc counts. This is the default if PBO_DATA_TYPE is not set.

dcache

Collects data cache miss metrics.

NOTE: Data cache miss metrics cannot be collected during the same run of an application as stride and/or arc data.

Displaying Optimization Information

The +O[no]info option displays informational messages about the optimization process.

+O[no]info

+O[no]info

The +O[no]info option displays messages about the optimization process. This option may be helpful in understanding what optimizations are occurring. You can use the option at levels 0-4.

The default is +Onoinfo at levels 0-4.

© Hewlett-Packard Development Company, L.P.