NOTE: See the Compiling & Running HP C Programs section of the HP C Online Help for a quick reference of all HP C general use compiler options and pragmas.
Summary of Major Optimization Levels
Supporting Optimization Options
Enabling Basic Optimization
Enabling Different Levels of Optimization
Changing the Aggressiveness of Optimizations
Enabling Only Conservative Optimizations
Enabling Aggressive Optimizations
Removing Compilation Time Limits When Optimizing
Limiting the Size of Optimized Code
Specifying Maximum Optimization
Combining Optimization Parameters
Summary of Optimization Parameters
Profile-Based Optimization
Controlling Specific Optimizer Features
Using Advanced Optimization Options
Level 1 Optimizations
Level 2 Optimizations
Level 3 Optimizations
Level 4 Optimizations
Guidelines for Using the Optimizer
Optimizer Assumptions
Optimizer Pragmas
Aliasing Options
Improving Shared Library Performance
Improving Compile and Link Times
The HP C optimizer transforms programs so machine resources are used more efficiently. The optimizer can dramatically improve application run-time speed. HP C performs only minimal optimizations unless you specify otherwise. You activate additional optimizations using HP C command-line options and pragmas.
There are four major levels of optimization: levels 1, 2, 3, and 4. Level 4 optimization can produce the fastest executable code. Level 4 is a superset of the other levels.
Optimization levels can be expressed as either:
+Olevel
or
-Olevel
Additional parameters enable you to control the size of the executable program, compile time, and aggressiveness of the optimizations performed.
Compile time memory and CPU usage increase with each higher level of optimization due to the increasingly complex analysis that must be performed. You can control the trade-offs between compile-time penalties and code performance by choosing the level of optimization you desire.
Generally, the optimizer is not used during code development. It is
used when compiling production-level code for benchmarking and general
use.
Summary of Major Optimization Levels
Table 7: HP C Major Optimization Options summarizes
the major optimization options of HP C:
cc -O sourcefile.c
Basic optimizations do not change the behavior of ANSI C standard-conforming code. They reduce execution time while increasing compile time and link time only moderately.
cc +O1 sourcefile.c
Level 1 optimization compiles quickly, but still provides some run-time speedup.
cc +O2 sourcefile.c
Level 2 (equivalent to -O) takes more time to compile, but produces greatly improved run-time speed.
cc +O3 sourcefile.c
Level 3 does full optimization of all subprograms within a single file.
cc +O4 sourcefile.c
Level 4 can potentially produce the greatest improvements in speed by performing optimizations across multiple object files. Level 4 does optimizations at link time, so compiles will be faster, but links will be longer.
Depending on the size and number of the modules, compiling at level 4 can consume a large amount of virtual memory. Level 4 may consume roughly 1.25 megabytes per 1000 lines of non-commented source. When you use level 4 on a large application, it is a good idea to increase the system swap space. For information on increasing system swap space, see the book Managing Systems and Workgroups.
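As a rough worked example of the figure above, the following sketch turns the 1.25 megabytes per 1000 lines rule of thumb into a small C helper. The function name estimate_level4_mb and the macro name are hypothetical, and the result is only an order-of-magnitude guide, not a guarantee of actual compiler memory use.

```c
/* Rule-of-thumb constant taken from the text above:
   roughly 1.25 megabytes per 1000 non-commented source lines. */
#define MB_PER_1000_LINES 1.25

/* Hypothetical helper: estimate the virtual memory, in megabytes,
   that a level 4 compile of `lines` non-commented source lines
   may consume. */
double estimate_level4_mb(long lines)
{
    return ((double)lines / 1000.0) * MB_PER_1000_LINES;
}
```

For instance, an application of 200,000 non-commented lines works out to about 250 megabytes under this estimate, which is the kind of figure to compare against available swap space.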
Object files generated by the compiler at optimization level 4, called
intermediate object files (IELF), are intended to be temporary files. These
object files contain an intermediate representation of the user code
in a format that is designed for advanced optimizations. Hewlett-Packard
reserves the right to change the format of these files
without prior notice. There is no guarantee that intermediate object
files will be compatible from one revision of the compiler to the
next. HP C will issue an error message and terminate when an
incompatible intermediate file is generated. For this reason, you should always
recompile if you want to optimize at +O4,
to ensure compatibility and integrity of your optimized applications.
Use the +Olit=none +Ofltacc=strict options at optimization level 2, 3, or 4 if you are not sure whether your code conforms to standards. These options provide more safety.
Use the +Ofast option at optimization level 2, 3, or
4 for best performance when you are willing to risk changes to the behavior
of your programs.
cc +O2 +Olit=none +Ofltacc=strict sourcefile.c
or:
cc +O3 +Olit=none +Ofltacc=strict sourcefile.c
or:
cc +O4 +Olit=none +Ofltacc=strict sourcefile.c
Conservative optimizations are optimizations that do not change the behavior of code, in most cases, even if the code does not conform to standards.
Use the conservative optimizations provided with level 2, 3, and 4 when
your code is non-ANSI.
cc +O2 +Ofast sourcefile.c
or:
cc +O3 +Ofast sourcefile.c
or:
cc +O4 +Ofast sourcefile.c
Aggressive optimizations are new optimizations or are optimizations that can change the behavior of programs. These optimizations may do any of the following:
cc +O2 +Onolimit sourcefile.c
or:
cc +O3 +Onolimit sourcefile.c
or:
cc +O4 +Onolimit sourcefile.c
By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4. Use this option if longer compile times and greater virtual memory use are acceptable because you want additional optimizations to be performed.
cc +O2 +Osize sourcefile.c
or:
cc +O3 +Osize sourcefile.c
or:
cc +O4 +Osize sourcefile.c
Most optimizations improve execution speed and decrease executable code size. A few optimizations significantly increase code size to gain execution speed. The +Osize option disables these code-expanding optimizations.
Use this option if you have limited main memory, swap space, or disk
space.
cc +Oall
The +Oall option performs the maximum optimization.
Use +Oall with stable, well-structured, ANSI-conforming code. These types of optimizations give you the fastest code, but are riskier than the default optimizations.
You can use +Oall at optimization levels 2, 3, and 4. The default is +Onoall.
The +Oall option by itself (specified without the +O2,
+O3, or +O4 options) implements +O4 +Ofast.
This performs aggressive optimizations with unrestricted
compile time at the highest level of optimization.
For example, to specify conservative optimizations at level 2 and disable code-expanding optimizations, use:
cc +O2 +Olit=none +Ofltacc=strict +Osize sourcefile.c
+Olimit and +Osize can be used with either +Ofast or +Olit=none +Ofltacc=strict.
You cannot use +Ofast +Ofltacc=relaxed with
+Olit=none +Ofltacc=strict.
| Option | What It Does | Level of Opt |
|---|---|---|
| +O[no]aggressive | NOTE: This option will not be supported in future releases of the HP C compiler. To obtain improved results, please use the +Ofast option. | 2, 3, 4 |
| +O[no]all | NOTE: This option will not be supported in future releases of the HP C compiler. To obtain improved results, please use +Ofast +O4 option in place of +O[no]all. | 4 |
| +O[no]conservative | NOTE: This option will not be supported in future releases of the HP C compiler. To obtain improved results, please use the +Olit=none +Ofltacc=strict +Oconservative options in place of +O[no]conservative. | 2, 3, 4 |
| +O[no]info | +Oinfo displays informational messages about the optimization process. This option supports the core optimization levels, and therefore, can be used at levels 0-4. The default is +Onoinfo. | 0, 1, 2, 3, 4 |
| +O[no]limit | The +Olimit option suppresses optimizations that significantly increase compile-time or that can consume a lot of memory. The +Onolimit option allows optimizations to be performed regardless of their effect on compile-time or memory usage. The default is +Olimit. | 2, 3, 4 |
| +O[no]size | The +O[no]size option disables [enables] optimizations that greatly expand code size at +O2 and above. Most optimizations improve code speed and simultaneously reduce code size. However, some optimizations may greatly increase code size. Loop unrolling is one such optimization, and is disabled when using +Osize. This option also disables inlining, and may help reduce instruction cache misses. | 2, 3, 4 |
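To make the code-size trade-off concrete, here is a minimal sketch of loop unrolling written by hand in C. It only illustrates the general technique; the function names are hypothetical and the code the HP C optimizer actually generates may look quite different.

```c
/* Straightforward loop: one addition and one loop test per element. */
long sum_rolled(const int *a, int n)
{
    long s = 0;
    int i;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by four: fewer loop tests and branches per
   element, at the cost of a larger loop body plus a remainder loop. */
long sum_unrolled4(const int *a, int n)
{
    long s = 0;
    int i = 0;
    for (; i + 3 < n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s += a[i];
    return s;
}
```

Both functions return the same result; the unrolled version simply occupies more instructions, which is the kind of expansion that +Osize is described as suppressing.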
There are three steps involved in performing this optimization:
Invoke profile-based optimization through HP C by using any level of optimization and the +I and +P options on the cc command line.
When you use PBO, compile times are faster and link times are slower
because code generation happens at link time.
cc -Aa +I -O -c sample.c            Compile for instrumentation.
cc -o sample.exe +I -O sample.o     Link to make instrumented executable.
The first command line uses the -O option to perform level 2 optimization and instruments the code. The -c option in the first command line suppresses linking and creates an intermediate object file called sample.o. The .o file can be used later in the optimization phase, avoiding a second compile.
The second command line uses the -o option to link sample.o
into sample.exe. The +I option instruments sample.exe
with data collection code. Note that instrumented programs run slower than
non-instrumented programs. Only use instrumented code to collect statistics
for profile-based optimization.
sample.exe < input.file1            Collect execution profile data.
sample.exe < input.file2
This step creates and logs the profile statistics to a file, by default called flow.data. You can use this data collection file to store the statistics from multiple test runs of different programs that you may have instrumented.
cc -o sample.exe +P -O sample.o
An alternative to this procedure is to recompile the source file in the optimization step:
cc -o sample.exe +I -O sample.c     instrumentation
sample.exe < input.file1            data collection
cc -o sample.exe +P -O sample.c     optimization
You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.
You can use the FLOW_DATA environment variable to specify the name of the profile data file with either the +I or +P options. You can use the +df command-line option to specify the name of the profile data file with the +P option.
The +df option takes precedence over the FLOW_DATA environment variable.
In the following example, the FLOW_DATA environment variable is set to override the flow.data file name. The profile data is stored instead in /users/profiles/prog.data.
% setenv FLOW_DATA /users/profiles/prog.data
% cc -Aa -c +I +O3 sample.c
% cc -o sample.exe +I +O3 sample.o
% sample.exe < input.file1
% cc -o sample.exe +P +O3 sample.o
In the next example, the +df option uses /users/profiles/prog.data to override the flow.data file name.
% cc -Aa -c +I +O3 sample.c
% cc -o sample.exe +I +O3 sample.o
% sample.exe < input.file1
% mv flow.data /users/profiles/prog.data
% cc -o sample.exe +df /users/profiles/prog.data +P +O3 sample.o
Care must be taken when maintaining different versions of the executable file because the instrumented program file name is used as the key identifier when storing execution profile data in the data file.
The optimizer must know what this key identifier name is in order to find the execution profile data. By default, the key identifier name used to retrieve the profile data is the instrumented program file name used to run the program for data collection.
At each level, you can turn on and off specific optimizations using the +O[no]optimization option. The optimization parameter is the name of a specific optimization technique. The optional prefix [no] disables the specified optimization.
Below is a list of advanced optimizer options, followed by detailed information on each option:
Default: All functions are optimized at the level specified by the ordinary +Olevel option.
This option lowers optimization to the specified level for one or more named functions. level can be 0, 1, 2, 3, or 4. The name parameters are names of functions in the module being compiled. Use this option when one or more functions do not optimize well or properly. It must be used with an ordinary +Olevel option.
This option works the same as the OPT_LEVEL pragma described under Optimizer
Control Pragmas. This option overrides the OPT_LEVEL pragma for the
specified functions. As with the pragma, you can only lower the level of
optimization; you cannot raise it above the level specified in the ordinary
+Olevel option. To avoid confusion, it is best to use either this
option or the OPT_LEVEL pragma rather than both.
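The following command optimizes all functions at level 3, except for the functions myfunc1 and myfunc2, which it optimizes at level 1.
$ cc +O3 +O1=myfunc1,myfunc2 funcs.c main.c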
The following command optimizes all functions at level 2, except for the functions myfunc1 and myfunc2, which it optimizes at level 0.
$ cc -O +O0=myfunc1,myfunc2 funcs.c main.c
+O[no]cxlimitedrange
Optimization level:
Default: +Onocxlimitedrange
+O[no]cxlimitedrange enables [disables] the use
of floating point math in the compilation unit.
This is equivalent to the CX_LIMITED_RANGE pragma
except that it applies to a compilation unit as
opposed to a declaration or statement.
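In standard C99, the CX_LIMITED_RANGE pragma tells the implementation that the usual mathematical formulas for complex multiply, divide, and absolute value are acceptable, without extra handling of overflow or infinities. Assuming this option follows the same semantics, the sketch below shows what the "usual formula" means for multiplication. The struct and function names are hypothetical and exist only for illustration.

```c
/* Hypothetical minimal complex type, used only for illustration. */
struct cplx {
    double re;
    double im;
};

/* Usual mathematical formula for complex multiplication:
   (a + bi)(c + di) = (ac - bd) + (ad + bc)i
   No special handling of overflow, infinities, or NaNs is done. */
struct cplx cplx_mul_limited(struct cplx x, struct cplx y)
{
    struct cplx r;
    r.re = x.re * y.re - x.im * y.im;
    r.im = x.re * y.im + x.im * y.re;
    return r;
}
```

For example, multiplying 1 + 2i by 3 + 4i with this formula gives -5 + 10i. The formula is fast, but intermediate products can overflow or lose accuracy for extreme operands, which is why limited-range arithmetic is off by default.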
+O[no]cross_region_addressing
Optimization level:
Default: +Onocross_region_addressing
+O[no]cross_region_addressing enables [disables]
the use of cross-region addressing.
Cross-region addressing is required if a pointer (such
as an array base) points to a different region than the
data being addressed. This is usually due to an offset
which results in a cross-over into another region.
Standard-conforming applications do not require using
cross-region addressing.
+Odataprefetch=[none|direct|indirect]
Default: +Odataprefetch=indirect
When +Odataprefetch is enabled, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions will be inserted only for data structures referenced within innermost loops using simple loop varying addresses (that is, in a simple arithmetic progression).
+Odataprefetch=direct inserts prefetches for loads and stores that have inductive addresses. The prefetches are inserted such that the latency to main memory is covered (i.e., assuming the data is not in any cache), and are given the appropriate cache hint for the data type being accessed. Prefetches are inserted for integer and floating-point loads, and floating-point stores, which are on heavily-executed paths through the loop.
The compiler attempts to minimize the overhead of prefetching using a number of techniques, which may involve unrolling the loop.
+Odataprefetch and +Odataprefetch=indirect
direct the compiler to insert prefetches for indirectly-accessed
data within loops.
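The difference between the two access patterns described above can be sketched with two small loops. This is only an illustration of which loops fall into each category; the function names are hypothetical, and whether prefetches are actually inserted depends on the compiler's own analysis.

```c
/* Direct (inductive) access: the address of a[i] advances by a fixed
   stride each iteration, i.e. a simple arithmetic progression. */
double sum_direct(const double *a, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Indirect access: idx[i] is read with an inductive address, but the
   address of a[idx[i]] depends on a value loaded from memory. */
double sum_indirect(const double *a, const int *idx, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++)
        s += a[idx[i]];
    return s;
}
```

In the first loop the next addresses are known in advance, so a prefetch can be issued well ahead of the load. In the second loop the target address is only known after idx[i] has been loaded, which is the case the indirect setting addresses.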
+O[no]dataprefetch
Default: +Odataprefetch
When +Odataprefetch is enabled, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions will be inserted only for data structures referenced within innermost loops using simple loop varying addresses (that is, in a simple arithmetic progression).
The default, +Odataprefetch is the same as +Odataprefetch=indirect.
It directs the compiler to insert prefetches for indirectly-accessed
data within loops.
Default: +O2 +Onolimit +Ofltacc=relaxed +FPD +DSnative +Oshortdata alias
+Ofast initiates a combination of compilation options for optimum execution speed at build times. The compiler options expanded when using +Ofast are: +O2 +Olibcalls +Onolimit +Ofltacc=relaxed +FPD +DSnative +Oshortdata. It is equivalent to the -fast option.
+Ofast is safe for most applications, but its use may result in higher compile time or incorrect output for code that requires strict floating point accuracy.
In addition to the optimizations performed at +O2, +Ofast:
Default: +O2 +Onolimit +Ofltacc=relaxed +FPD +DSnative +Oshortdata alias
+Ofaster selects the +Ofast option at optimization level +O2.
Must be used with +P or else the optimization level
will drop to +O3.
+O[no]fenv_access
Optimization level:
Default: +Onofenv_access
+O[no]fenv_access informs the compiler that a program accesses [does not access] the floating point environment to test flags or run under non-default modes. If it knows that a program does not access the floating point environment, the compiler is allowed to perform certain optimizations that it otherwise may not perform. These include global common subexpression elimination, code motion, or constant folding.
Using +Ofenv_access is
equivalent to adding #pragma STDC FENV_ACCESS ON at the beginning of each
source file to be compiled.
The +Onofltacc option allows the compiler to perform floating-point optimizations that are algebraically correct but that may result in numerical differences. For example, this option may change the order of expression evaluation as such: If a, b, and c are floating-point variables, the expressions (a + b) + c and a + (b + c) may give slightly different results due to rounding. In general, these differences will be insignificant.
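The effect of reordering can be seen with a deliberately extreme input. The sketch below evaluates both groupings explicitly; the function names are hypothetical, and the values are chosen to exaggerate a difference that is normally much smaller.

```c
/* Evaluates (a + b) + c exactly as written. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c) exactly as written. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e30, b = -1e30, and c = 1.0, sum_left yields 1.0 because a and b cancel first, while sum_right yields 0.0 because adding 1.0 to -1e30 is lost to rounding before the cancellation. Code that depends on one grouping should not allow the optimizer to reorder such expressions.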
The +Onofltacc option also enables the optimizer to generate fused multiply-add (FMA) instructions, the FMPYFADD and FMPYNFADD. These instructions improve performance but occasionally produce results that may differ from results produced by code without FMA instructions. In general, the differences are slight.
Specifying +Ofltacc disables the generation of FMA instructions as well as some other floating-point optimizations. Use +Ofltacc if it is important that the compiler evaluate floating-point expressions as it does in unoptimized code. The +Ofltacc option does not allow any optimizations that change the order of expression evaluation and therefore may affect the result.
If you are optimizing code at level 2 or higher and do not specify +Onofltacc or +Ofltacc, the optimizer will use FMA instructions, but will not perform floating-point optimizations that involve expression reordering or other optimizations that potentially impact numerical stability.
The table below identifies the different actions taken by the optimizer according to whether you specify +Ofltacc, +Onofltacc, or neither option.
| Optimization Options | Expression Reordering? | FMA? |
|---|---|---|
| +O2 | No | Yes |
| +O2 +Ofltacc | No | No |
| +O2 +Onofltacc | Yes | Yes |
Default: +Ofltacc=default
+Ofltacc controls the level of floating point optimizations that the compiler may perform so that the expected accuracy of floating-point computation is not violated. The following are defined values for +Ofltacc:
Default: In the absence of dynamically obtained profile information, the frequency with which a function is called is unknown.
+Ofrequently_called specifies that the functions listed in the given file are to be assumed to be frequently called within the application. This is independent of +P: +Ofrequently_called overrides any dynamically obtained profile information.
The file indicated by filename contains a list of
function names, separated by spaces or newlines.
+Ofrequently_called=function1[,function2]*
Optimization levels: 2, 3, 4
Default: In the absence of dynamically obtained profile information,
the frequency with which a function is called is unknown.
+Ofrequently_called specifies that the listed functions
are to be assumed to be frequently called within the
application. This is independent of +P: +Ofrequently_called
overrides any dynamically obtained profile information.
+O[no]info
Optimization levels: 0, 1, 2, 3, 4
Default: +Onoinfo
Provide [do not provide] feedback information about the
optimization process. This option is most useful at
optimization levels 3 and 4.
+O[no]initcheck
Optimization levels: 2, 3, 4
Default: unspecified
The initialization checking feature of the optimizer has three possible states: on, off, or unspecified. When on (+Oinitcheck), the optimizer initializes to zero any local, scalar, non-static variables that are uninitialized with respect to at least one path leading to a use of the variable.
When off (+Onoinitcheck), the optimizer issues warning messages when it discovers definitely uninitialized variables, but does not initialize them.
When unspecified, the optimizer initializes to zero any local, scalar, non-static variables that are definitely uninitialized with respect to all paths leading to a use of the variable.
Use +Oinitcheck to look for variables in a program that may
not be initialized.
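The distinction between "uninitialized on at least one path" and "definitely uninitialized on all paths" can be illustrated with two small functions. This is a sketch with hypothetical names; both functions are intentionally flawed to show what the feature looks for.

```c
/* x is set on one path (flag != 0) but not on the other, so it is
   uninitialized with respect to at least one path leading to its use. */
int maybe_uninit(int flag)
{
    int x;
    if (flag)
        x = 5;
    return x;   /* indeterminate value when flag == 0 */
}

/* y is never set, so it is definitely uninitialized with respect to
   all paths leading to its use. */
int definitely_uninit(void)
{
    int y;
    return y;   /* always reads an indeterminate value */
}
```

According to the description above, +Oinitcheck would zero-initialize both x and y, the unspecified state would zero-initialize only y, and +Onoinitcheck would warn about y without initializing it.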
+O[no]inline: filename
Optimization levels: 3, 4
Default: +Oinline
When +Oinline is specified without a filename, any function can be inlined. For inlining to be successful, follow prototype definitions for function calls in the appropriate header file.
When specified with a filename, the named functions are important candidates for inlining. For example, saying
+Oinline=foo,bar +Onoinline
indicates that inlining be strongly considered for foo and bar; all other routines will not be considered for inlining, since +Onoinline is given.
When this option is disabled with a filename, the compiler will not consider the specified routines as candidates for inlining. For example, saying
+Onoinline=baz,x
indicates that inlining should not be considered for baz and x; all other routines will be considered for inlining, since +Oinline is the default.
The +Onoinline option disables inlining for all functions or a specific list of functions.
Use this option when you need to precisely control which subprograms
are inlined.
+O[no]inline=symbol[,symbol]*
Optimization levels: 3, 4
Default: +Oinline
When +Oinline is specified without a symbol list, any function can be inlined. For inlining to be successful, follow prototype definitions for function calls in the appropriate header file.
When specified with a symbol list, the named functions are important candidates for inlining.
When this option is disabled with a symbol list, the compiler will not consider the specified routines as candidates for inlining.
The +Onoinline option disables inlining for all functions or a specific list of functions.
Use this option when you need to precisely control which subprograms
are inlined.
+Oinline_budget=n
Optimization levels: 3, 4
Default: +Oinline_budget=100
where n is an integer in the range 1 - 1000000 that specifies the level of aggressiveness, as follows:
Note, however, that the +Oinline_budget=n option takes
precedence over both of these options. This means that you can override
the effect of +Onolimit or +Osize option on inlining
by specifying the +Oinline_budget=n option on the same
compile line.
+O[no]libcalls
Optimization levels: 0, 1, 2, 3, 4
Default: +O[no]libcalls
Use the +Olibcalls option to increase the runtime performance of code which calls standard library routines in simple contexts.
The +Olibcalls option expands the following library calls inline:
A single call to printf() may be replaced by a series of calls to putchar(). Calls to sprintf() and strlen() may be optimized more effectively, including elimination of some calls producing unused results. Calls to setjmp() and longjmp() may be replaced by their equivalents _setjmp() and _longjmp(), which do not manipulate the process's signal mask.
Use +Olibcalls to improve the performance of selected library routines only when you are not performing error checking for these routines.
Using +Olibcalls with +Ofltacc will give different floating point calculation results than those given using +Ofltacc without +Olibcalls.
The +Olibcalls option replaces the obsolete -J option.
+Olibcalls=[all|default|none]
Optimization levels: 0, 1, 2, 3, 4
Default: +Olibcalls=default
Control the use of low-call-overhead versions of select library routines. No error checking is done; that is, errno(2) is not set. This optimization can occur at optimization levels 0, 1, 2, 3, and 4. The defined values for level are:
Default: +Onolibmerrno
Enable [disable] support for errno in libm functions.
+O[no]limit
Optimization levels: 2, 3, 4
Default: +Olimit
The +O[no]limit option controls the amount of compile-time spent performing optimization. By default, the compiler limits the time spent optimizing large programs at +O2 and above; this is done to avoid non-linear compile times.
You can remove optimization time restrictions at +O2 and above by using the +Onolimit option. This lets you perform full optimization of large procedures, but may incur significant compile time increases for very large procedures.
If longer compile times are acceptable, +Onolimit can result in significant performance improvements.
To completely avoid non-linear compile times,
you can limit the amount of time spent optimizing code
by using +Olimit.
+Olimit=[default|min|max]
Optimization levels: 2, 3, 4
Default: +Olimit=default
The +Olimit=[default|min|max] option controls the amount of compile-time spent performing optimization. The defined values are:
The +O[no]limit option controls the amount of compile-time spent performing optimization. By default, the compiler limits the time spent optimizing large programs at +O2 and above; this is done to avoid non-linear compile times.
You can remove optimization time restrictions at +O2 and above by using the +Olimit=none option. This lets you perform full optimization of large procedures, but may incur significant compile time increases for very large procedures.
To completely avoid non-linear compile times,
you can limit the amount of time spent optimizing code
by using +Olimit=min.
+Olit=[all|const|none]
Optimization levels:
Default: +Olit=all
Controls which data items are placed in the read-only data section. The defined values for level are:
Default: +Oparminit
Enable [disable] automatic initialization of
unspecified function parameters at call sites to zero.
This is useful for preventing NaT values in parameter
registers.
+O[no]parmsoverlap
Optimization levels: 1, 2, 3, 4
Default: +Oparmsoverlap
Optimize with the assumption that
the actual arguments of function calls overlap in memory.
Use +Onoparmsoverlap if C programs have been literally translated
from HP Fortran programs.
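The following sketch shows why the overlap assumption matters. The function name add_one is hypothetical. Nothing in the function itself says whether out and in refer to the same storage, so by default the optimizer has to allow for it.

```c
/* Stores in[i] + 1 into out[i] for each of the n elements.
   The two pointers are allowed to refer to overlapping storage. */
void add_one(int *out, const int *in, int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = in[i] + 1;
}
```

Given int a[4] = {0, 0, 0, 0}, the call add_one(a + 1, a, 3) passes overlapping arguments: each store feeds the next load, so a ends up as {0, 1, 2, 3}. If the compiler were told that arguments never overlap, it would be free to load several elements of in before storing, which gives a different result for this call. Code translated literally from Fortran typically never makes such a call, which is the situation +Onoparmsoverlap is intended for.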
+O[no]procelim
Optimization levels: 0, 1, 2, 3, 4
Default: +Onoprocelim at levels 0-3, +Oprocelim at level 4
When +Oprocelim is specified, procedures that are not referenced by the application are eliminated from the output executable file. The +Oprocelim option reduces the size of the executable file, especially when optimizing at levels 3 and 4, at which inlining may have removed all of the calls to some routines.
When you specify +Onoprocelim, procedures that are not referenced by the application are not eliminated from the output executable file.
The default is +Onoprocelim at levels 0-3, and +Oprocelim at level 4.
If the +Oall option is enabled, the +Oprocelim option
is enabled.
+Oprofile=use:filename
Optimization levels: 0, 1, 2, 3, 4
Default: Name of PBO file
Specify the filename as the name of the
previously collected profile database (PBO) file.
+Oprofile=collect
Optimization levels: 0, 1, 2, 3, 4
Default:
Instrument the application for profile-based
optimization. This is the same as the +I
option.
+O[no]promote_indirect_calls
Optimization levels: 3, 4 and profile-based optimization
Default: +Onopromote_indirect_calls
This option uses profile data from profile-based optimization and other information to determine the most likely target of indirect calls and promotes them to direct calls. In all cases the optimized code tests to make sure the direct call is being taken and, if not, executes the indirect call. If +Oinline is in effect, the optimizer may also inline the promoted calls. This option can only be used with profile-based optimization, described in Profile-Based Optimization.
The optimizer tries to determine the most likely target of indirect calls. If the profile data is incomplete or ambiguous, the optimizer may not select the best target. If this happens, your code's performance may decrease.
At +O3, this option is only effective if indirect calls from functions
within a file are mostly to target functions within the same file. This
is because +O3 optimizes only within a file whereas +O4 optimizes across
files.
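The transformation described in this section can be sketched at the source level as follows. This is a hand-written illustration, not compiler output; all names are hypothetical, and handle_common stands in for whatever target the profile data identifies as most likely.

```c
typedef int (*handler_fn)(int);

int handle_common(int x) { return x + 1; }  /* hypothetical likely target */
int handle_rare(int x)   { return x * 2; }  /* hypothetical unlikely target */

/* Original form: every call goes through the function pointer. */
int dispatch_indirect(handler_fn fp, int x)
{
    return fp(x);
}

/* Promoted form: test whether the pointer is the likely target and,
   if so, call it directly (making it a candidate for inlining);
   otherwise fall back to the indirect call. */
int dispatch_promoted(handler_fn fp, int x)
{
    if (fp == handle_common)
        return handle_common(x);
    return fp(x);
}
```

Both versions return the same results for any target. The promoted version only pays off when the guessed target is correct most of the time, which is why incomplete or ambiguous profile data can reduce performance.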
+O[no]ptrs_ansi
Optimization levels: 2, 3, 4
Default: +Onoptrs_ansi
Use +Optrs_ansi to make the following two assumptions, which the more aggressive +Optrs_strongly_typed does not make:
Default: +Onoptrs_ansi
Tell the optimizer whether global variables are
modified [are not modified] through pointers.
+O[no]ptrs_strongly_typed
Optimization levels: 2, 3, 4
Default: +Onoptrs_strongly_typed
Use +Optrs_strongly_typed when pointers are type-safe. The optimizer can use this information to generate more efficient code.
Type-safe (that is, strongly-typed) pointers are pointers to a specific type that only point to objects of that type, and not to objects of any other type. For example, a pointer declared as a pointer to an int is considered type-safe if that pointer points to an object only of type int, but not to objects of any other type.
Based on the type-safe concept, a set of groups are built based on object types. A given group includes all the objects of the same type.
Type-inferred aliasing means that a pointer of a type in a given group (of objects of the same type) can only point to an object from the same group; it cannot point to a typed object from any other group.
For more information about type aliasing, see Aliasing Options.
Type casting to a different type violates type-inferred aliasing rules. See Example 2 below.
Dynamic casting is allowed. See Example 3 below.
For more details, see Aliasing Options.
Example 1: How Data Types Interact
The optimizer generally spills all global data from registers to memory before any modification to global variables or any loads through pointers. However, you can instruct the optimizer on how data types interact so it can generate more efficient code.
If you have the following:
1 int *p;
2 float *q;
3 int a,b,c,i;
4 float d,e,f;
5 void foo()
6 {
7 for (i=1;i<10;i++) {
8 d=e;
9 *p=b;
10 e=d+f;
11 f=*q;
12 }
13 }
With +Optrs_strongly_typed turned on, the pointers p
and q will be assumed to be disjoint because the types they point
to are different types. Without type-inferred aliasing, *p is
assumed to invalidate all the definitions. So, the uses of d and
f on line 10 have to be loaded from memory. With type-inferred aliasing,
the optimizer can propagate the copy of d and f and thus
avoid two loads and two stores.
This option can be used for any application involving the use of pointers, where those pointers are type safe. To specify when a subset of types are type-safe, use the [NO]PTRS_STRONGLY_TYPED pragma. The compiler issues warnings for any incompatible pointer assignments that may violate the type-inferred aliasing rules discussed in Aliasing Options.
Example 2: Unsafe Type Cast
Any type cast to a different type violates type-inferred aliasing rules. Do not use +Optrs_strongly_typed with code that has these unsafe type casts. Use the [NO]PTRS_STRONGLY_TYPED pragma to prevent the application of type-inferred aliasing to the unsafe type casts.
struct foo{
int a;
int b;
} *P;
struct bar {
float a;
int b;
float c;
} *q;
P = (struct foo *) q;
/* Incompatible pointer assignment
through type cast */
Example 3: Generally Applying Type Aliasing
Dynamic cast is allowed with +Optrs_strongly_typed or +Optrs_ansi. A pointer dereference is called dynamic cast if a cast is applied on the pointer to a different type.
In the example below, type-inferred aliasing is applied on P generally, not just to the particular dereference. Type-aliasing will be applied to any other dereferences of P.
struct s {
short int a;
short int b;
int c;
} *P;
* (int *)P = 0;
For more information about type aliasing, see Aliasing
Options.
Default: In the absence of dynamically obtained profile information, the frequency with which a function is called is unknown.
+Orarely_called specifies that the functions listed in the given file are to be assumed to be rarely called within the application. This is independent of +P: +Orarely_called overrides any dynamically obtained profile information.
The file indicated by filename contains a list of
function names, separated by spaces or newlines.
+O[no]rarely_called=function1[,function2]*
Optimization levels: 2, 3, 4
Default: In the absence of dynamically obtained
profile information, the frequency with which that function
is called is unknown.
+Orarely_called specifies that a function list is
assumed to be rarely called within the application. This is
independent of +P: +Orarely_called overrides
any dynamically obtained profile information.
+O[no]recovery
Optimization levels: 2, 3, 4
Default: +Onorecovery
+O[no]recovery generates [does not generate] recovery code for control speculation. When this option is enabled, each control speculative load has a matching control speculative check instruction, inserted by the compiler at the original position of the load. A block of recovery code is then inserted at the label specified by the check instruction.
If +Onorecovery is specified, no
control speculative checks or recovery blocks are inserted.
Instead, the operating system handles the recovery. The advantage of using
+Onorecovery is that code size is smaller,
both along the critical path, due to the lack of check instructions,
and overall, due to the lack of recovery blocks.
+Oreusedir=directory
Optimization levels: 4 or with profile-based optimization
Default: no reuse of object files
This option specifies a directory where the linker can save object files created from intermediate object files when using +O4 or profile-based optimization. It reduces link time by not recompiling intermediate object files when they don't need to be.
When you compile with +I, +P, or +O4, the compiler generates intermediate code in the object file. Otherwise, the compiler generates regular object code in the object file. When you link, the linker first compiles the intermediate object code to regular object code, then links the object code. With this option you can reduce link time on subsequent links by avoiding recompiling intermediate object files that have already been compiled to regular object code and have not changed.
Note that when you do change a source file or command line options and
recompile, a new intermediate object file will be created and compiled
to regular object code in the specified directory. The previous object
file in the directory will not be removed. You should periodically remove
this directory since old object files cannot be reused and will not be
automatically removed.
+Oshortdata
Optimization levels: 0, 1, 2, 3, 4
Default: +Oshortdata=8 (specifies a size of eight bytes in the short data area)
Controls the size of objects placed in the short data area. All objects of n bytes or smaller are placed in the short data area. Valid values of n are 0, or a decimal number between 8 and 4,194,304 bytes (4MB).
Default: +Onosignedpointers
Perform [or do not perform] optimizations related to treating pointers as signed quantities. Applications that allocate shared memory and that compare a pointer to shared memory with a pointer to private memory may run incorrectly if this optimization is enabled.
Use +Osignedpointers to improve application run-time speed.
+O[no]store_ordering
Optimization levels: 1, 2, 3, 4
Default: +Onostore_ordering
+O[no]store_ordering preserves (does not preserve)
the original program order for stores to memory that may not be
visible to multiple threads. This does not, however,
imply strong ordering.
+O[no]type_safety=[off|limited|ansi|strong]
Optimization levels: 1, 2, 3, 4
Default: +Otype_safety=off, +Onotype_safety
Enable [disable] aliasing across types. The following are +O[no]type_safety values:
Default: +Onovolatile
The +Ovolatile=qualifier1[,qualifier2...] option has the effect of applying the specified qualifiers to all uses of volatile in the source. If used in conjunction with +Ovolatile, the qualifiers also apply to the implicit volatile declarations of global variables. The defined values for qualifier include the following:
Default: +Onowhole_program_mode
The +Owhole_program_mode option enables the assertion that only the files that are compiled with this option directly reference any global variables and procedures that are defined in these files. In other words, this option asserts that there are no unseen accesses to the globals.
When this assertion is in effect, the optimizer can hold global variables in registers longer and delete inlined or cloned global procedures.
All files compiled with +Owhole_program_mode must also be compiled with +O4. If any of the files were compiled with +O4 but were not compiled with +Owhole_program_mode, the linker disables the assertion for all files in the program.
The default, +Onowhole_program_mode, disables the assertion.
Use this option to increase performance speed, but only when you are
certain that only the files compiled with +Owhole_program_mode
directly access any globals that are defined in these files.
if(a) {
.
.
.
statement 1
} else {
goto L1;
}
statement 2
L1:
becomes:
if(!a) {
goto L1;
}
statement 1
statement 2
L1:
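As a rough check that the two forms behave the same, the transformation can be written as a pair of C functions. This is only a sketch: the function names and the trace counter that stands in for statement 1 and statement 2 are hypothetical.

```c
/* Original form: the else branch only jumps to L1. */
int before_branch_opt(int a)
{
    int trace = 0;
    if (a) {
        trace += 1;     /* statement 1 */
    } else {
        goto L1;
    }
    trace += 10;        /* statement 2 */
L1:
    return trace;
}

/* Optimized form: the condition is inverted and the else branch is gone. */
int after_branch_opt(int a)
{
    int trace = 0;
    if (!a) {
        goto L1;
    }
    trace += 1;         /* statement 1 */
    trace += 10;        /* statement 2 */
L1:
    return trace;
}
```

Both functions return the same value for every input, which is the property the optimizer relies on when it removes the extra branch.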
For example, the code:
if(0) {
a = 1;
} else {
a = 2;
}
becomes:
a = 2;
This module performs the following:
LDW -52(0,30),r1
ADDI 3,r1,r31 ;interlock with load of r1
LDI 10,r19
becomes:
LDW -52(0,sp),r1
LDI 10,r19
ADDI 3,r1,r31 ;use of r1 is now separated from load
For example, the code:
LDI 32,r3
AND r1,r3,r2
COMIB,= 0,r2,L1
becomes:
BB,>= r1, 26, L1
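The mask value 32 and the bit position 26 in this example describe the same bit. A minimal C sketch of the correspondence follows, assuming the PA-RISC convention that bit 0 is the most significant bit of a 32-bit word; the helper names are hypothetical.

```c
#include <stdint.h>

/* Return bit pos of word, numbering bits from 0 (most significant)
 * to 31 (least significant), as assumed for PA-RISC. */
uint32_t pa_bit(uint32_t word, unsigned pos)
{
    return (word >> (31u - pos)) & 1u;
}

/* Original sequence: mask with 32, then branch if the result is 0. */
int branch_taken_mask(uint32_t r1)
{
    return (r1 & 32u) == 0;
}

/* Optimized sequence: branch if bit 26 is 0. */
int branch_taken_bit(uint32_t r1)
{
    return pa_bit(r1, 26) == 0;
}
```

Since 31 - 26 = 5 and 2 to the power 5 is 32, both functions test the same bit and agree for every value of r1.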
You can help the optimizer understand when certain variables are heavily used within a function by declaring these variables with the register qualifier. Only the first 10 register-qualified variables encountered in the source are honored, so choose the ten most important variables for best effect.
The coloring register allocator may override your choices and promote to a register a variable not declared register over one that is, based on estimated speed improvements.
The following code shows the type of optimization the coloring register allocation module performs. The code:
LDI 2,r104
COPY r104,r103
LDO 5(r103),r106
COPY r106,r105
LDO 10(r105),r107
becomes:
LDI 2,r25
LDO 5(r25),r26
LDO 10(r26),r31
For example, the code:
for (i=0; i<25; i++) {
r[i] = i * k;
}
becomes:
t1 = 0;
for (i=0; i<25; i++) {
r[i] = t1;
t1 += k;
}
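The two loops can be compared directly. In this sketch the function names and the COUNT macro are hypothetical; the loop bodies are taken from the example.

```c
#define COUNT 25

/* Original form: one multiplication per iteration. */
void fill_multiply(int r[COUNT], int k)
{
    int i;
    for (i = 0; i < COUNT; i++) {
        r[i] = i * k;
    }
}

/* Strength-reduced form: the multiplication becomes a running sum. */
void fill_add(int r[COUNT], int k)
{
    int i;
    int t1 = 0;
    for (i = 0; i < COUNT; i++) {
        r[i] = t1;
        t1 += k;
    }
}
```

Both functions store 0, k, 2k, and so on into r, so the addition in the second form replaces the multiplication without changing the result.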
For example, the code:
a = x + y + z;
b = x + y + w;
becomes:
t1 = x + y;
a = t1 + z;
b = t1 + w;
A = 10;
B = A + 5;
C = 4 * B;
can be replaced by:
A = 10;
B = 15;
C = 60;
For example, the code:
x = z;
for(i=0; i<10; i++)
{
a[i] = 4 * x + i;
}
becomes:
x = z;
t1 = 4 * x;
for(i=0; i<10; i++)
{
a[i] = t1 + i;
}
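A self-contained version of the same transformation is sketched below; the function names and the LEN macro are hypothetical. The expression 4 * x does not change inside the loop, so it can be computed once before the loop starts.

```c
#define LEN 10

/* Original form: 4 * x is recomputed on every iteration. */
void fill_recompute(int a[LEN], int z)
{
    int i;
    int x = z;
    for (i = 0; i < LEN; i++) {
        a[i] = 4 * x + i;
    }
}

/* After the invariant expression is moved out of the loop. */
void fill_hoisted(int a[LEN], int z)
{
    int i;
    int x = z;
    int t1 = 4 * x;     /* loop-invariant value, computed once */
    for (i = 0; i < LEN; i++) {
        a[i] = t1 + i;
    }
}
```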
For example, the following HP C code:
a = x + 23;
return a;
where a is a local variable, produces the following code for the unoptimized case:
LDO 23(r26),r1
STW r1,-52(0,sp)
LDW -52(0,sp),ret0
and this code for the optimized case:
LDO 23(r26),ret0
For example, the function:
f(int x)
{
int a,b,c;
a = 1;
b = 2;
c = x * b;
return c;
}
becomes:
f(int x)
{
int a,b,c;
b = 2;
c = x * b;
return c;
}
Within loops, the virtual memory address expression can be rearranged and separated into a loop varying term and a loop invariant term. Loop varying terms are those items whose values may change from one iteration of the loop to another. Loop invariant terms are those items whose values are constant throughout all iterations of the loop. The loop varying term corresponds to the difference in the virtual memory address associated with a particular array reference from one iteration of the loop to the next.
The register reassociation optimization dedicates a register to track the value of the virtual memory address expression for one or more array references in a loop and updates the register appropriately in each iteration of a loop.
The register is initialized outside the loop to the loop invariant portion of the virtual memory address expression and the register is incremented or decremented within the loop by the loop variant portion of the virtual memory address expression. On PA-RISC, the update of such a dedicated register can often be performed for free using the base-register modification capability of load and store instructions.
The net result is that array references in loops are converted into equivalent but more efficient pointer dereferences.
For example:
int a[10][20][30];
void example (void)
{
int i, j, k;
for (k = 0; k < 10; k++)
for (j = 0; j < 10; j++)
for (i = 0; i < 10; i++)
{
a[i][j][k] = 1;
}
}
after register reassociation is applied to the innermost loop becomes:
int a[10][20][30];
void example (void)
{
int i, j, k;
register int (*p)[20][30];
for (k = 0; k < 10; k++)
for (j = 0; j < 10; j++)
for (p = (int (*)[20][30]) &a[0][j][k], i = 0; i < 10; i++)
{
*(p++[0][0]) = 1;
}
}
In the above example, the compiler-generated temporary register variable, p,
strides through the array a in the innermost loop. This register
pointer variable is initialized outside the innermost loop and auto-incremented
within the innermost loop as a side-effect of the pointer dereference.
Register reassociation can often enable another loop optimization. After
performing the register reassociation optimization, the loop variable may
be needed only to control the iteration count of the loop. If this is the case,
the original loop variable can be eliminated altogether by using the PA-RISC
ADDIB and ADDB machine instructions to control the loop iteration count.
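The combined effect on the innermost loop can be sketched in C as follows. This is an illustration under stated assumptions, not compiler output: the second array flat, the macros, and the function names are hypothetical, and flat is assumed to have the same row-major layout as a. The pointer is initialized to the loop-invariant part of the address, advanced by the loop-varying part, and a countdown counter takes the place of the loop variable i.

```c
#define NI 10
#define NJ 20
#define NK 30

int a[NI][NJ][NK];          /* filled by the indexed loop */
int flat[NI * NJ * NK];     /* same layout, filled by the pointer loop */

/* Innermost loop as written in the source: indexed by i. */
void fill_indexed(int j, int k)
{
    int i;
    for (i = 0; i < NI; i++)
        a[i][j][k] = 1;
}

/* Innermost loop after reassociation, with i eliminated. */
void fill_pointer(int j, int k)
{
    int *p = flat + (j * NK + k);   /* loop-invariant part of the address */
    int count = NI;                 /* iteration count replaces i */

    for (;;) {
        *p = 1;
        if (--count == 0)           /* decrement, test, and branch in one step */
            break;
        p += NJ * NK;               /* loop-varying part: one step in i */
    }
}
```

The element a[i][j][k] corresponds to flat[(i * NJ + j) * NK + k], so after calling both functions with the same j and k the two arrays hold matching contents.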
Level 3 optimization produces faster run-time code than level 2 on code
that frequently calls small functions within a file. Level 3 links faster
than level 4.
/* Return the greatest common divisor of two positive integers, */
/* int1 and int2, computed using Euclid's algorithm. (Return 0 */
/* if either is not positive.) */
int gcd(int1,int2)
int int1;
int int2;
{
int inttemp;
if ( ( int1 <= 0 ) || ( int2 <= 0 ) ) {
return(0);
}
do {
if ( int1 < int2 ) {
inttemp = int1;
int1 = int2;
int2 = inttemp;
}
int1 = int1 - int2;
} while (int1 > 0);
return(int2);
}
main()
{
int xval,yval,gcdxy;
/* statements before call to gcd */
gcdxy = gcd(xval,yval);
/* statements after call to gcd */
}
After inlining, the source file looks like this:
main()
{
int xval,yval,gcdxy;
/* statements before inlined version of gcd */
{
int int1;
int int2;
int1 = xval;
int2 = yval;
{
int inttemp;
if ( ( int1 <= 0 ) || ( int2 <= 0 ) ) {
gcdxy = ( 0 );
goto AA003;
}
do {
if ( int1 < int2 ) {
inttemp = int1;
int1 = int2;
int2 = inttemp;
}
int1 = int1 - int2;
} while ( int1 > 0 );
gcdxy = ( int2 );
}
}
AA003 : ;
/* statements after inlined version of gcd */
}
Inlining substitutes function calls with copies of the function's object
code. Only functions that meet the optimizer's criteria are inlined. This
may result in slightly larger executable files. However, this increase
in size is offset by the elimination of time-consuming procedure calls
and procedure returns.
When gathering this information, the HP C compiler makes the following assumption: while inside a function, the only variables that can be accessed indirectly through a pointer or by another function call are:
OPT_LEVEL 0, 1, and 2 provide more control over optimization than the +O1 and +O2 compiler options. You use these pragmas to raise or lower optimization at a function level inside the source file, whereas the compiler options apply only to an entire source file. (OPT_LEVEL 3 and 4 can only be used at the beginning of the source file.)
Table 11: Optimization Level Precedence shows
the possible combinations of options and pragmas and the resulting optimization
levels. The level at which a function will be optimized is the lower of
the two values specified by the command line optimization level and the
optimization pragma in force.
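The precedence rule can be illustrated with a short source file. This is a sketch under several assumptions: the file is compiled with +O2, the pragma is written as #pragma OPT_LEVEL n outside any function body, and each pragma stays in effect until the next one. The function names are hypothetical. Compilers other than HP C are expected to ignore the pragmas, possibly with a warning.

```c
/* Assumed compile line (HP C): cc +O2 file.c */

#pragma OPT_LEVEL 1
/* Effective level: 1, the lower of +O2 and OPT_LEVEL 1. */
int sum_ints(const int *v, int n)
{
    int i;
    int total = 0;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

#pragma OPT_LEVEL 2
/* Effective level: 2, since the option and the pragma agree. */
int max_int(const int *v, int n)
{
    int i;
    int best = v[0];
    for (i = 1; i < n; i++)
        if (v[i] > best)
            best = v[i];
    return best;
}
```

If the same file were compiled with +O1 instead, both functions would be optimized at level 1, because the command-line level is then the lower value in each case.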
The values of OPTIMIZE and OPT_LEVEL are summarized
in Table 12: Optimizer Control Pragmas.
The NOINLINE pragma disables inlining for all functions or specified functionname(s).
The syntax for performing inlining is:
#pragma INLINE [functionname(1), ..., functionname(n)]
#pragma NOINLINE [functionname(1), ..., functionname(n)]
For example, to specify inlining of the two subprograms checkstat and getinput, use:
#pragma INLINE checkstat, getinput
To specify that an infrequently called routine should not be inlined when compiling at optimization level 3 or 4, use:
#pragma NOINLINE opendb
See also the related +O[no]inline optimization option.
The compiler gathers information about each function (such as information
about function calls, variables, parameters, and return values) and passes
this information to the optimizer.
#pragma FLOAT_TRAPS_ON { functionname,...functionname }
#pragma FLOAT_TRAPS_ON { _ALL }
For example:
#pragma FLOAT_TRAPS_ON xyz,abc
informs the compiler and optimizer that xyz and abc have floating-point traps turned on and therefore LICM optimization should not be performed.
HP_DEFINED_EXTERNAL can improve performance of shared libraries by reducing the overhead of calling shared library routines. You must be very careful using this pragma because incorrect use can result in incorrect and unpredictable behavior. See also the HP-UX Linker and Libraries User's Guide for more information on improving shared library performance.
WARNING: Do not use this pragma at function definitions, only at function calls. Specifying it at function definitions will result in incorrect behavior.
#pragma HP_DEFINED_EXTERNAL name1[, name2[, ...]]
where name1, name2, and so forth are names of functions in shared libraries.
NOTE: Use this pragma only on calls to functions in shared libraries.