HP C/HP-UX Online Help

Optimizing HP C Programs

NOTE: See the Compiling & Running HP C Programs section of the HP C Online Help for a quick reference of all HP C compiler options and pragmas. See the Parallel Options & Pragmas section of the HP C Online Help for detailed descriptions of options and pragmas for threads and parallel programming.

Summary of Major Optimization Levels
Supporting Optimization Options
Enabling Basic Optimization
Enabling Different Levels of Optimization
Changing the Aggressiveness of Optimizations
Enabling Only Conservative Optimizations
Enabling Aggressive Optimizations
Removing Compilation Time Limits When Optimizing
Limiting the Size of Optimized Code
Specifying Maximum Optimization
Combining Optimization Parameters
Summary of Optimization Parameters
Profile-Based Optimization
Controlling Specific Optimizer Features
Using Advanced Optimization Options
Level 1 Optimizations
Level 2 Optimizations
Level 3 Optimizations
Level 4 Optimizations
Guidelines for Using the Optimizer
Optimizer Assumptions
Optimizer Pragmas
Aliasing Options
Improving Shared Library Performance
Improving Compile and Link Times

The HP C optimizer transforms programs so machine resources are used more efficiently. The optimizer can dramatically improve application run-time speed. HP C performs only minimal optimizations unless you specify otherwise. You activate additional optimizations using HP C command-line options and pragmas.

There are four major levels of optimization: levels 1, 2, 3, and 4. Level 4 optimization can produce the fastest executable code. Level 4 is a superset of the other levels.

Additional parameters enable you to control the size of the executable program, compile time, and aggressiveness of the optimizations performed.

Compile time memory and CPU usage increase with each higher level of optimization due to the increasingly complex analysis that must be performed. You can control the trade-offs between compile-time penalties and code performance by choosing the level of optimization you desire.

Generally, the optimizer is not used during code development. It is used when compiling production-level code for benchmarking and general use.

Summary of Major Optimization Levels

Table 7: HP C Major Optimization Options summarizes the major optimization options of HP C:

Table 7: HP C Major Optimization Options

Option: +O0 (default)
Description: Constant folding and simple register assignment.
Benefits: Compiles fastest.

Option: +O1
Description: Level 0 optimizations plus instruction scheduling and optimizations that can be performed on small sections of code.
Benefits: Produces faster programs than level 0. Compiles faster than level 2.

Option: +O2 or -O
Description: Level 1 optimizations, plus optimizations performed over entire functions in a single file. Optimizes loops in order to reduce pipeline stalls. Performs scalar replacement, and analysis of data-flow, memory usage, loops and expressions.
Benefits: Can produce faster run-time code than level 1 if programs use loops extensively. Compiles faster than level 3. Loop-oriented floating point intensive applications may see run times reduced by 50%. Operating system and interactive applications that use the already optimized system libraries can achieve 30% to 50% additional improvement.

Option: +O3
Description: Level 2 optimizations, plus full optimization across all subprograms within a single file. Includes subprogram inlining.
Benefits: Can produce faster run-time code than level 2 on code that frequently calls small functions. Links faster than level 4.

Option: +O4
Description: Level 3 optimizations, plus full optimizations across the entire application program. Includes global and static variable optimization and inlining across the entire program. Optimizations are performed at link-time.
Benefits: Produces faster run-time code than level 3 if programs use many global variables or if there are many opportunities for inlining procedure calls.

Supporting Optimization Options

Table 8: Other Supporting Optimizations shows optimization options that support the core optimization levels. These optimizations are performed only when specifically invoked. They are available at all optimization levels.
 

Table 8: Other Supporting Optimizations

Option: +ESfic
Description: Generates object code with fast indirect calls. Only correct for programs not using shared libraries.
Benefits: Run-time code is faster.

Option: +ESlit
Description: Places string literals and constants defined with the ANSI C const type qualifier into read-only data storage. Storing to constants with this option will cause segmentation violations.
Benefits: Reduces memory requirements and improves run-time speed in multi-user applications. Can improve data-cache utilization.

Option: +I, +P
Description: Enables all profile-based optimizations. Uses execution profile data to identify the most frequently executed code paths. Repositions functions, basic blocks, and aids other optimizations according to these frequently executed paths.
Benefits: Improves code locality and cache hit rates. Improves efficiency of other optimizations. Benefits most applications, especially large applications with multiple compilation units. May be used at any optimization level.

Enabling Basic Optimization

To enable basic optimizations, use the -O option (equivalent to +O2), as follows:
cc -O sourcefile.c
Basic optimizations do not change the behavior of ANSI C standard-conforming code. They improve run-time speed while increasing compile time and link time by only a moderate amount.

Enabling Different Levels of Optimization

There may be times when you want more or less optimization than what is provided with the basic -O option.

Level 1 Optimization

To enable level 1 optimization, use the +O1 option, as follows:
cc +O1 sourcefile.c
Level 1 optimization compiles quickly, but still provides some run-time speed.

Level 2 Optimization

To enable level 2 optimization, use the +O2 option, as follows:
cc +O2 sourcefile.c
Level 2 (equivalent to -O) takes more time to compile, but produces greatly improved run-time speed.

Level 3 Optimization

To enable level 3 optimization, use the +O3 option, as follows:
cc +O3 sourcefile.c
Level 3 does full optimization of all subprograms within a single file.

Level 4 Optimization

To enable level 4 optimization, use the +O4 option, as follows:
cc +O4 sourcefile.c
Level 4 can potentially produce the greatest improvements in speed by performing optimizations across multiple object files. Level 4 does optimizations at link time, so compiles will be faster, but links will be longer.

Depending on the size and number of the modules, compiling at level 4 can consume a large amount of virtual memory. Level 4 may consume roughly 1.25 megabytes per 1000 lines of non-commented source. When you use level 4 on a large application, it is a good idea to increase the system swap space. For information on increasing system swap space, see the book Managing Systems and Workgroups.

Changing the Aggressiveness of Optimizations

At each level of optimization, you can control the aggressiveness of the optimizations performed.

Use the +Oconservative option at optimization level 2, 3, or 4 if you are not sure if your code conforms to standards. This option provides more safety.

Use the +Oaggressive option at optimization level 2, 3, or 4 for best performance when you are willing to risk changes to the behavior of your programs. Using the +Oaggressive option can cause your program to have compilation or run-time problems that require troubleshooting.

Enabling Only Conservative Optimizations

You can enable conservative optimizations at the second, third, or fourth optimization levels by using the +Oconservative option, as follows:
cc +O2 +Oconservative sourcefile.c
or:
cc +O3 +Oconservative sourcefile.c
or:
cc +O4 +Oconservative sourcefile.c
Conservative optimizations are optimizations that do not change the behavior of code, in most cases, even if the code does not conform to standards.

Use the conservative optimizations provided with level 2, 3, and 4 when your code is non-ANSI.

Enabling Aggressive Optimizations

To enable aggressive optimizations at the second, third, or fourth optimization levels, use the +Oaggressive option as follows:
cc +O2 +Oaggressive sourcefile.c
or:
cc +O3 +Oaggressive sourcefile.c
or:
cc +O4 +Oaggressive sourcefile.c
Aggressive optimizations are new optimizations or optimizations that can change the behavior of programs.

Use aggressive optimizations with stable, well-structured, ANSI-conforming code. These types of optimizations give you faster code, but are riskier than the default optimizations.

Removing Compilation Time Limits When Optimizing

You can remove optimization time restrictions at the second, third, or fourth optimization levels by using the +Onolimit option as follows:
cc +O2 +Onolimit sourcefile.c
or:
cc +O3 +Onolimit sourcefile.c
or:
cc +O4 +Onolimit sourcefile.c
By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4. Use this option if longer compile times and greater virtual memory use are acceptable because you want additional optimizations to be performed.

Limiting the Size of Optimized Code

You can disable optimizations that expand code size at the second, third, and fourth optimization levels by using the +Osize option, as follows:
cc +O2 +Osize sourcefile.c
or:
cc +O3 +Osize sourcefile.c
or:
cc +O4 +Osize sourcefile.c
Most optimizations improve execution speed and decrease executable code size. A few optimizations significantly increase code size to gain execution speed. The +Osize option disables these code-expanding optimizations.

Use this option if you have limited main memory, swap space, or disk space.

Specifying Maximum Optimization

To get maximum optimization, use:
cc +Oall sourcefile.c
The +Oall option performs the maximum optimization.

Use +Oall with stable, well-structured, ANSI-conforming code. These types of optimizations give you the fastest code, but are riskier than the default optimizations.

You can use +Oall at optimization levels 2, 3, and 4. The default is +Onoall.

The +Oall option by itself (specified without the +O2, +O3, or +O4 options) combines the +O4 +Oaggressive +Onolimit options. This combination performs aggressive optimizations with unrestricted compile time at the highest level of optimization.

Combining Optimization Parameters

You can combine optimization parameters that affect code size, compile-time, and the aggressiveness of the optimizations with a level of optimization.

For example, to specify conservative optimizations at level 2 and disable code-expanding optimizations, use:

cc +O2 +Oconservative +Osize sourcefile.c
+Olimit and +Osize can be used with either +Oaggressive or +Oconservative.

You cannot use +Oaggressive with +Oconservative.

Summary of Optimization Parameters

Table 9: HP C Optimization Parameters summarizes the HP C optimization parameters:
 

Table 9: HP C Optimization Parameters

Option: +O[no]aggressive
What It Does: The +O[no]aggressive option enables optimizations that can result in significant performance improvement, but that can change a program's behavior. These optimizations include newly released optimizations and the optimizations invoked by the following advanced optimization options:
  • +Osignedpointers
  • +Oregionsched
  • +Oentrysched
  • +Onofltacc
  • +Olibcalls
  • +Onoinitcheck
  • +Ovectorize
See Controlling Specific Optimizer Features for details about advanced optimization options. The default is +Onoaggressive.
Level of Opt: 2, 3, 4

Option: +O[no]all
What It Does: The +Oall option performs maximum optimization, including aggressive optimizations and optimizations that can significantly increase compile time and memory usage. The default is +Onoall.
Level of Opt: 4

Option: +O[no]conservative
What It Does: The +O[no]conservative option causes the optimizer to make conservative assumptions about the code when optimizing it. Use +Oconservative when conservative assumptions are necessary due to the coding style, as with non-standard conforming programs. The +Oconservative option relaxes the optimizer's assumptions about the target program. The default is +Onoconservative.
Level of Opt: 2, 3, 4

Option: +O[no]info
What It Does: +Oinfo displays informational messages about the optimization process. This option supports the core optimization levels, and therefore, can be used at levels 0-4. The default is +Onoinfo.
Level of Opt: 0, 1, 2, 3, 4

Option: +O[no]limit
What It Does: The +Olimit option suppresses optimizations that significantly increase compile-time or that can consume a lot of memory. The +Onolimit option allows optimizations to be performed regardless of their effect on compile-time or memory usage. The default is +Olimit.
Level of Opt: 2, 3, 4

Option: +O[no]size
What It Does: The +Osize option suppresses optimizations that significantly increase code size. The +Onosize option does not prevent optimizations that can increase code size. The default is +Onosize.
Level of Opt: 2, 3, 4

Option: +O[no]clone
What It Does: This option allows the user to turn on [off] the cloning facility of the optimizer. Cloning is on by default. It is mainly provided for users who may see a lot of cloning adversely affecting the performance of their code. If inlining is turned off, cloning is turned off by default. You cannot specify +Onoinline +Oclone.
Level of Opt: 3, 4

Option: +O[no]memory[=malloc]
What It Does: This option enables [disables] memory optimizations. Specifying malloc in the list will enable [disable] optimizations which consolidate memory allocation procedure calls. This option is disabled by default. It is incompatible with +Oopenmp and +Oparallel, and is ignored when these options are in effect.
Level of Opt: 3, 4

Profile-Based Optimization

Profile-based optimization (PBO) is a set of performance-improving code transformations based on the run-time characteristics of your application.

There are three steps involved in performing this optimization:

  1. Instrumentation - Insert data collection code into the object program.
  2. Data Collection - Run the program with representative data to collect execution profile statistics.
  3. Optimization - Generate optimized code based on the profile data.
Invoke profile-based optimization through HP C by using any level of optimization and the +I and +P options on the cc command line.

When you use PBO, compile times are faster and link times are slower because code generation happens at link time.

Instrumenting the Code

To instrument your program, use the +I option as follows:
cc -Aa +I -O -c sample.c          Compile for instrumentation.
cc -o sample.exe +I -O sample.o   Link to make instrumented executable.
The first command line uses the -O option to perform level 2 optimization and instruments the code. The -c option in the first command line suppresses linking and creates an intermediate object file called sample.o. The .o file can be used later in the optimization phase, avoiding a second compile.

The second command line uses the -o option to link sample.o into sample.exe. The +I option instruments sample.exe with data collection code. Note that instrumented programs run slower than non-instrumented programs. Only use instrumented code to collect statistics for profile-based optimization.

Collecting Data for Profiling

To collect execution profile statistics, run your instrumented program with representative data as follows:
sample.exe < input.file1          Collect execution profile data.
sample.exe < input.file2
This step creates and logs the profile statistics to a file, by default called flow.data. You can use this data collection file to store the statistics from multiple test runs of different programs that you may have instrumented.

Performing Profile-Based Optimization

To optimize the program based on the previously collected run-time profile statistics, relink the program as follows:
cc -o sample.exe +P -O sample.o
An alternative to this procedure is to recompile the source file in the optimization step:
cc -o sample.exe +I -O sample.c     instrumentation
sample.exe < input.file1            data collection
cc -o sample.exe +P -O sample.c     optimization

Maintaining Profile Data Files

Profile-based optimization stores execution profile data in a disk file. By default, this file is called flow.data and is located in your current working directory.

You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.

You can use the FLOW_DATA environment variable to specify the name of the profile data file with either the +I or +P options. You can use the +df command-line option to specify the name of the profile data file with the +P option.

The +df option takes precedence over the FLOW_DATA environment variable.

In the following example, the FLOW_DATA environment variable is set to override the flow.data file name. The profile data is stored instead in /users/profiles/prog.data.

% setenv FLOW_DATA /users/profiles/prog.data
% cc -Aa -c +I +O3 sample.c
% cc -o sample.exe +I +O3 sample.o
% sample.exe < input.file1
% cc -o sample.exe +P +O3 sample.o
In the next example, the +df option uses /users/profiles/prog.data to override the flow.data file name.
% cc -Aa -c +I +O3 sample.c
% cc -o sample.exe +I +O3 sample.o
% sample.exe < input.file1
% mv flow.data /users/profiles/prog.data
% cc -o sample.exe +df /users/profiles/prog.data +P +O3 sample.o

Maintaining Instrumented and Optimized Program Files

You can maintain both instrumented and optimized versions of a program. You might keep an instrumented version of the program on hand for development use, and several optimized versions on hand for performance testing and program distribution.

Care must be taken when maintaining different versions of the executable file because the instrumented program file name is used as the key identifier when storing execution profile data in the data file.

The optimizer must know what this key identifier name is in order to find the execution profile data. By default, the key identifier name used to retrieve the profile data is the instrumented program file name used to run the program for data collection.

When you optimize a program file and the optimized program file name is different from the instrumented program file name, you must use the +pgm option. Specify the instrumented program file name with this option. The optimizer uses this value as the key identifier to retrieve execution profile data.

In the following example, the instrumented program file name is sample.inst. The optimized program file name is sample.opt. The +pgm name option is used to pass the instrumented program name to the optimizer:

% cc -Aa -c +I +O3 sample.c
% cc -o sample.inst +I +O3 sample.o
% sample.inst < input.file1
% cc -o sample.opt +P +O3 +pgm sample.inst sample.o

Profile-Based Optimization Notes

For more information on profile-based optimization, see the HP-UX Linker and Libraries Online User Guide.

+Oprofile, option for Profile Based Optimization

This release of the HP C compiler provides the flexibility of choosing to generate PA-RISC machine code (SOMs) directly instead of the compiler's intermediate code (ISOMs) during the compilation phase itself.

The existing behavior of the compiler has been to generate intermediate code when PBO options (+I, +P) are used, with final code generation happening during the link phase unless +Oreusedir= is used. At this stage, the linker calls ucomp. An obvious disadvantage is that even when a single file is changed, code generation for all other files happens during the link phase. This makes the overall compile-link time significantly higher. As an enhancement to the current behavior, the compiler generates PA-RISC machine code (SOM) whenever the newly introduced PBO options are used. Code generation then does not need to happen during the link phase, because the compiler itself has already converted the intermediate code (ISOM) into machine code (SOM) by calling ucomp.

The following lists the newly introduced PBO options:

The above new options correspond to (though building SOMs instead of ISOMs):

As seen above, the behavior of the new +Oprofile options is equivalent to that of the existing PBO options, except that whenever +Oprofile is used the compiler calls ucomp to convert intermediate code into machine code. Performing PBO as before is unchanged: there is no behavior change when +I/+P and any other old options are used on the command line. The cc driver calls ld to generate ISOMs. The options +pgm and -tu work with the new options.


NOTE 
  • The new options can be used only with -c (compile only); otherwise, the optimization is performed as before.
  • The new options are available only at optimization levels below +O4; at +O4, +I or +P is used.
  • Mixing old and new options on the same command line is disabled. For example, +Oprofile is incompatible with +I, +P, or +df on the same command line.

Controlling Specific Optimizer Features

Most of the time, specifying optimization level 1, 2, 3, or 4 should provide you with the control over the optimizer that you need. Additional parameters are provided when you require a finer level of control.

At each level, you can turn on and off specific optimizations using the +O[no]optimization option. The optimization parameter is the name of a specific optimization technique. The optional prefix [no] disables the specified optimization.

Below is a list of advanced optimizer options, followed by detailed information on each option:

+Olevel=name1[,name2,...nameN]

Optimization levels: 1, 2, 3, 4

Default: All functions are optimized at the level specified by the ordinary +Olevel option.

This option lowers optimization to the specified level for one or more named functions. level can be 0, 1, 2, 3, or 4. The name parameters are names of functions in the module being compiled. Use this option when one or more functions do not optimize well or properly. It must be used with an ordinary +Olevel option.

This option works the same as the OPT_LEVEL pragma described under Optimizer Control Pragmas. This option overrides the OPT_LEVEL pragma for the specified functions. As with the pragma, you can only lower the level of optimization; you cannot raise it above the level specified in the ordinary +Olevel option. To avoid confusion, it is best to use either this option or the OPT_LEVEL pragma rather than both.

Examples

The following command optimizes all functions at level 3, except for the functions myfunc1 and myfunc2, which it optimizes at level 1.

$ cc +O3 +O1=myfunc1,myfunc2 funcs.c main.c

The following command optimizes all functions at level 2, except for the functions myfunc1 and myfunc2, which it optimizes at level 0.

$ cc -O +O0=myfunc1,myfunc2 funcs.c main.c

+O[no]autopar

See +O[no]autopar.

+O[no]dataprefetch

Default: +Onodataprefetch

When +Odataprefetch is enabled, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions will be inserted only for data structures referenced within innermost loops using simple loop varying addresses (that is, in a simple arithmetic progression). It is only available for PA-RISC 2.0 targets.

The math library contains special prefetching versions of vector routines. If you have a PA-RISC 2.0 application that contains operations on arrays larger than 1 megabyte in size, using +Ovectorize in conjunction with +Odataprefetch may improve performance substantially.

Use this option for applications that have high data cache miss overhead.

+O[no]dynsel

See +O[no]dynsel.

+O[no]entrysched

Optimization levels: 1, 2, 3, 4

Default: +Onoentrysched

The +Oentrysched option optimizes instruction scheduling on a procedure's entry and exit sequences. Enabling this option can speed up an application. The option has undefined behavior for applications which handle asynchronous interrupts by examining the sigcontext values of caller stack operands. The option affects unwinding in the entry and exit regions.

At optimization level +O2 and higher (using data flow information), save and restore operations become more efficient.

This option can change the behavior of programs that perform stack unwind-based exception handling or asynchronous interrupt handling. The behavior of setjmp() and longjmp() is not affected.

+O[no]extern[=name1,name2,...nameN]

Optimization levels: 0, 1, 2, 3, 4

Default: +Oextern

This option is available in the LP64 data model only.

The +O[no]extern option allows you to specify which accesses to symbols in an executable or shared library (a load module) can be optimized. Use of +Onoextern creates code that cannot be included in a shared library. Use +Onoextern only to build executables. Only internal symbols (defined in the load module) can be optimized.

If +Onoextern is specified without a name list, the compiler assumes that no symbols are external to the load module being compiled, and any symbol can be optimized. If +Oextern is specified without a name list, the compiler assumes that all symbols are external to the load module being compiled and thus cannot be optimized; this is the default.

If +Oextern is specified with a name list, the compiler treats the specified symbols as external even if +Onoextern without a name list is in effect. The following example indicates that foo and bar are to eventually be imported from another load module (for example, a shared library); all other functions and data items will not be external, since +Onoextern is specified.

+Oextern=foo,bar +Onoextern
When +Onoextern is specified with a name list, the compiler treats the specified symbols as internal even if +Oextern without a name list is in effect. The following example indicates that references to baz and x may be optimized for access in the local load module. All other symbols will be subject to resolution to another load module since +Oextern is the default.
+Onoextern=baz,x
Use this option to precisely control which symbols' accesses may be optimized. Knowledge of the shared libraries used by an application, or the exported interface of a shared library, is required. See also the HP_DEFINED_EXTERNAL pragma. The default is +Oextern with no name list.

+O[no]fail_safe

Optimization levels: 1, 2, 3, 4

Default: +Ofail_safe

The +Ofail_safe option allows compilations with internal optimization errors to continue by issuing a warning message and restarting the compilation at +O0.

You can use +Onofail_safe at optimization levels 1, 2, 3, or 4 when you want the internal optimization errors to abort your build.

This option is disabled when compiling for parallelization.

+O[no]fastaccess

Optimization levels: 0, 1, 2, 3, 4

Default: +Onofastaccess at optimization levels 0, 1, 2 and 3, +Ofastaccess at optimization level 4

The +Ofastaccess option optimizes for fast access to global data items.

Use +Ofastaccess to improve execution speed at the expense of longer compile times.

+O[no]fltacc

Optimization levels: 2, 3, 4

The +Onofltacc option allows the compiler to perform floating-point optimizations that are algebraically correct but that may result in numerical differences. For example, this option may change the order of expression evaluation: if a, b, and c are floating-point variables, the expressions (a + b) + c and a + (b + c) may give slightly different results due to rounding. In general, these differences will be insignificant.

The +Onofltacc option also enables the optimizer to generate fused multiply-add (FMA) instructions, the FMPYFADD and FMPYNFADD. These instructions improve performance but occasionally produce results that may differ from results produced by code without FMA instructions. In general, the differences are slight. FMA instructions are only available on PA-RISC 2.0 systems.

Specifying +Ofltacc disables the generation of FMA instructions as well as some other floating-point optimizations. Use +Ofltacc if it is important that the compiler evaluate floating-point expressions as it does in unoptimized code. The +Ofltacc option does not allow any optimizations that change the order of expression evaluation and therefore may affect the result.

If you are optimizing code at level 2 or higher and do not specify +Onofltacc or +Ofltacc, the optimizer will use FMA instructions, but will not perform floating-point optimizations that involve expression reordering or other optimizations that potentially impact numerical stability.

The table below identifies the different actions taken by the optimizer according to whether you specify +Ofltacc, +Onofltacc, or neither option.

Optimization Options    Expression Reordering?    FMA?
+O2                     No                        Yes
+O2 +Ofltacc            No                        No
+O2 +Onofltacc          Yes                       Yes
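The rounding effect of regrouping can be reproduced by writing both groupings out by hand. The following is a minimal sketch, not taken from the HP documentation: the function names sum_left and sum_right are hypothetical, and the code assumes IEEE 754 double-precision arithmetic. The volatile temporaries are there only to force each partial sum to be rounded to double before it is used.

```c
/* (a + b) + c: the partial sum a + b is formed first. */
double sum_left(double a, double b, double c)
{
    volatile double partial = a + b;   /* rounded to double here */
    return partial + c;
}

/* a + (b + c): the partial sum b + c is formed first. */
double sum_right(double a, double b, double c)
{
    volatile double partial = b + c;   /* rounded to double here */
    return a + partial;
}
```

With a = 1e16, b = -1e16, and c = 1.0, sum_left returns 1.0 while sum_right returns 0.0, because 1.0 is lost to rounding when it is added to -1e16. Code whose results depend on a particular grouping is a candidate for +Ofltacc.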

+O[no]global_ptrs_unique[=name1,name2,...nameN]

Optimization levels: 2, 3, 4

Default: +Onoglobal_ptrs_unique

Use this option to identify unique global pointers, so that the optimizer can generate more efficient code in the presence of unique pointers, for example by using copy propagation and common sub-expression elimination. A global pointer is unique if it does not alias with any variable in the entire program.

This option supports a comma-separated list of unique global pointer variable names.
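To see why uniqueness matters, consider a loop that reads through a global pointer on every iteration. The sketch below is illustrative only: g_step, add_step, and add_step_hoisted are hypothetical names, and the second function shows by hand the kind of rewrite that is only safe when the pointer does not alias the data being modified. It is not a description of what the HP optimizer actually emits.

```c
int *g_step;   /* hypothetical global pointer */

/* Reloads *g_step on every iteration, which is always correct. */
void add_step(int *data, int n)
{
    int i;
    for (i = 0; i < n; i++)
        data[i] += *g_step;
}

/* Loads *g_step once before the loop. This matches add_step only
   when g_step does not point into data. */
void add_step_hoisted(int *data, int n)
{
    int i;
    int step = *g_step;
    for (i = 0; i < n; i++)
        data[i] += step;
}
```

When g_step points at a separate variable, both functions produce the same array. When g_step points at data[0], the first store changes the value being read and the two functions diverge, so a pointer used that way should not be named in the option's list.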

+O[no]initcheck

Optimization levels: 2, 3, 4

Default: unspecified

The initialization checking feature of the optimizer has three possible states: on, off, or unspecified. When on (+Oinitcheck), the optimizer initializes to zero any local, scalar, non-static variables that are uninitialized with respect to at least one path leading to a use of the variable.

When off (+Onoinitcheck), the optimizer issues warning messages when it discovers definitely uninitialized variables, but does not initialize them.

When unspecified, the optimizer initializes to zero any local, scalar, non-static variables that are definitely uninitialized with respect to all paths leading to a use of the variable.

Use +Oinitcheck to look for variables in a program that may not be initialized.

+O[no]inline[=name1, name2,...nameN]

Optimization levels: 3, 4

Default: +Oinline

When +Oinline is specified without a name list, any function can be inlined. For inlining to be successful, follow prototype definitions for function calls in the appropriate header file.

When specified with a name list, the named functions are important candidates for inlining. For example, saying

+Oinline=foo,bar +Onoinline
indicates that inlining be strongly considered for foo and bar; all other routines will not be considered for inlining, since +Onoinline is given.

When this option is disabled with a name list, the compiler will not consider the specified routines as candidates for inlining. For example, saying

+Onoinline=baz,x
indicates that inlining should not be considered for baz and x; all other routines will be considered for inlining, since +Oinline is the default.

The +Onoinline option disables inlining for all functions or a specific list of functions.

Use this option when you need to precisely control which subprograms are inlined.
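As an illustration of the kind of routine a name list might target, the sketch below uses a small helper called from a loop. The function names square and sum_of_squares are hypothetical and chosen only for this example.

```c
/* Hypothetical small helper: a typical inlining candidate, because
   the call overhead is large relative to the work done. */
static int square(int x)
{
    return x * x;
}

/* Calls square() once per iteration. */
int sum_of_squares(int n)
{
    int i;
    int total = 0;
    for (i = 1; i <= n; i++)
        total += square(i);
    return total;
}
```

Assuming this code lives in a file named funcs.c, a command line following the syntax above, such as cc +O3 +Oinline=square +Onoinline funcs.c, would ask the optimizer to strongly consider square for inlining and to leave all other routines alone.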

+Oinline_budget=n

Optimization levels: 3, 4

Default: +Oinline_budget=100

where n is an integer in the range 1 - 1000000 that specifies the level of aggressiveness, as follows:

The +Onolimit and +Osize options also affect inlining. Specifying the +Onolimit option has the same effect as specifying +Oinline_budget=200. The +Osize option has the same effect as +Oinline_budget=1.

Note, however, that the +Oinline_budget=n option takes precedence over both of these options. This means that you can override the effect of the +Onolimit or +Osize option on inlining by specifying the +Oinline_budget=n option on the same compile line.

+O[no]libcalls

Optimization levels: 0, 1, 2, 3, 4

Default: +Onolibcalls

Use the +Olibcalls option to increase the runtime performance of code which calls standard library routines in simple contexts. The +Olibcalls option expands the following library calls inline:

Inlining will take place only if the function call follows the prototype definition in the appropriate header file. Fast subprogram linkage is also emitted to tuned millicode versions of the math library functions sin, cos, tan, atan2, log, pow, asin, acos, atan, exp, and log10. (See the HP-UX Floating-Point Guide for the most up-to-date listing of the math library functions.) The calling code must not expect to access errno after the function's return.

A single call to printf() may be replaced by a series of calls to putchar(). Calls to sprintf() and strlen() may be optimized more effectively, including elimination of some calls producing unused results. Calls to setjmp() and longjmp() may be replaced by their equivalents _setjmp() and _longjmp(), which do not manipulate the process's signal mask.

Use +Olibcalls to improve the performance of selected library routines only when you are not performing error checking for these routines.

Using +Olibcalls with +Ofltacc will give different floating point calculation results than those given using +Ofltacc without +Olibcalls.

The +Olibcalls option replaces the obsolete -J option.

+Olit=[all|const|none]

The +Olit option specifies the type of data items placed in the read-only data section.

+Olit can take the values all, const, and none.

+Olit=all places all string literals and all const-qualified variables that do not require load-time or run-time initialization in the read-only data section.

+Olit=const places all string literals appearing in a context where const char * is legal, and all const-qualified variables that do not require load-time or run-time initialization in the read-only data section.

If +Olit=none is specified, no constants are placed in the read-only data section.
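The practical consequence of placing literals in read-only data is that a program must not write through a pointer to a string literal. The following is a minimal sketch of the safe pattern, copying the literal into a writable buffer before changing it. The names greeting and copy_upper are hypothetical and are used only for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* A string literal reached through a const-qualified pointer. Under
   +Olit=all or +Olit=const it may live in the read-only data section,
   so the program must never write to it. */
static const char *const greeting = "hello";

/* Copies src into a caller-supplied writable buffer, converting each
   character to upper case. Returns the number of characters copied,
   not counting the terminating null. */
size_t copy_upper(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    if (dst_size == 0)
        return 0;
    while (src[i] != '\0' && i + 1 < dst_size) {
        dst[i] = (char)toupper((unsigned char)src[i]);
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

A caller would declare a local array such as char buf[16] and call copy_upper(buf, sizeof buf, greeting); the literal itself is never modified, so the code behaves the same whichever +Olit value is in effect.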

+O[no]loop_block

See +O[no]loop_block.

+O[no]loop_transform

Optimization levels: 3, 4

Default: +Oloop_transform

The +O[no]loop_transform option enables [disables] transformation of eligible loops for improved cache performance. The most important transformation is the reordering of nested loops to make the inner loop unit stride, resulting in fewer cache misses.

+Onoloop_transform may be a helpful option if you experience any problem while using +Oparallel.
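The loop reordering described above can be pictured at the source level. The following is a sketch only, not the optimizer's actual output; the names ROWS, COLS, fill_matrix, sum_strided, and sum_unit_stride are hypothetical. Because C stores arrays in row-major order, the second function's inner loop touches consecutive memory locations.

```c
#include <stddef.h>

#define ROWS 64
#define COLS 64

/* Fills the matrix with the values 0, 1, 2, ... in row-major order. */
void fill_matrix(int m[ROWS][COLS])
{
    size_t i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            m[i][j] = (int)(i * COLS + j);
}

/* Column-first traversal: the inner loop steps through memory with a
   stride of COLS elements. */
long sum_strided(int m[ROWS][COLS])
{
    long total = 0;
    size_t i, j;

    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}

/* Interchanged loops: the inner loop now walks consecutive elements
   (unit stride), the form the loop transformation aims for. */
long sum_unit_stride(int m[ROWS][COLS])
{
    long total = 0;
    size_t i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}
```

Both functions return the same sum; only the order of memory accesses differs.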

+O[no]loop_unroll[=unroll factor]

Optimization levels: 2, 3, 4

Default: +Oloop_unroll

The +Oloop_unroll option turns on loop unrolling. When you use +Oloop_unroll, you can also use the unroll factor to control the code expansion. The default unroll factor is 4, that is, four copies of the loop body. By experimenting with different factors, you may improve the performance of your program.
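To make "four copies of the loop body" concrete, the following sketch unrolls a summation loop by hand with a factor of 4. It only illustrates the idea; the code the optimizer generates may differ, and the names sum_rolled and sum_unrolled4 are hypothetical.

```c
#include <stddef.h>

/* Straightforward loop: one element per iteration. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by a factor of 4: four copies of the loop body per
   iteration, followed by a cleanup loop for the leftover elements. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 3 < n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++)      /* remainder when n is not a multiple of 4 */
        total += v[i];
    return total;
}
```

The unrolled version executes fewer loop-control branches at the cost of more code, which is the trade-off the unroll factor controls.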

+O[no]loop_unroll_jam

See +O[no]loop_unroll_jam.

+O[no]moveflops

Optimization levels: 2, 3, 4

Default: +Omoveflops

Allows [or disallows] moving conditional floating-point instructions out of loops. The +Onomoveflops option replaces the obsolete +OE option. The behavior of floating-point exception handling may be altered by this option.

Use +Onomoveflops if floating-point traps are enabled and you do not want the behavior of floating-point exceptions to be altered by the relocation of floating-point instructions.

+O[no]multiprocessor

Optimization levels: 2, 3, 4

Default: +Onomultiprocessor

If +Omultiprocessor is specified, the compiler performs optimizations appropriate for executables or shared libraries that run in several different processes on multiprocessor machines.

If you enable this option inappropriately (for example, for an executable that runs only on a uniprocessor system), performance may be degraded.

+O[no]parallel

See +O[no]parallel.

+O[no]parallel_env

Need to add information on this option.

+O[no]parmsoverlap

Optimization levels: 2, 3, 4

Default: +Oparmsoverlap

The +Oparmsoverlap option optimizes with the assumption that the actual arguments of function calls overlap in memory.

The +Onoparmsoverlap option replaces the obsolete +Om1 option.

Use +Onoparmsoverlap if C programs have been literally translated from FORTRAN programs.
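The sketch below shows why the overlap assumption matters. It uses a hypothetical function, add_first, whose result depends on whether its two pointer arguments refer to the same memory. When the arguments overlap, src[0] changes during the loop, so the compiler must reload it on every iteration; only when overlap is ruled out can that load be moved out of the loop.

```c
/* Adds the first element of src to every element of dst.
   If src and dst overlap, src[0] may change while the loop runs. */
void add_first(int *dst, const int *src, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = dst[i] + src[0];
}
```

With separate arrays a = {1, 2, 3} and b = {10}, add_first(a, b, 3) yields {11, 12, 13}. With overlapping arguments, add_first(a, a, 3) yields {2, 4, 5} under normal C semantics, because a[0] becomes 2 in the first iteration. Code that makes calls of the second kind is not a candidate for +Onoparmsoverlap.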

+O[no]pipeline

Optimization levels: 2, 3, 4

Default: +Opipeline

Enables [or disables] software pipelining. The +Onopipeline option replaces the obsolete +Os option.

Use +Onopipeline to conserve code space.

+O[no]procelim

Optimization levels: 0, 1, 2, 3, 4

Default: +Onoprocelim at levels 0-3, +Oprocelim at level 4

When +Oprocelim is specified, procedures that are not referenced by the application are eliminated from the output executable file. The +Oprocelim option reduces the size of the executable file, especially when optimizing at levels 3 and 4, at which inlining may have removed all of the calls to some routines.

When you specify +Onoprocelim, procedures that are not referenced by the application are not eliminated from the output executable file.

The default is +Onoprocelim at levels 0-3, and +Oprocelim at level 4.

If the +Oall option is enabled, the +Oprocelim option is enabled.

+O[no]promote_indirect_calls

Optimization levels: 3, 4 and profile-based optimization

Default: +Onopromote_indirect_calls

This option uses profile data from profile-based optimization and other information to determine the most likely target of indirect calls and promotes them to direct calls. In all cases the optimized code tests to make sure the direct call is being taken and, if not, executes the indirect call. If +Oinline is in effect, the optimizer may also inline the promoted calls. This option can only be used with profile-based optimization, described in Profile-Based Optimization.

The optimizer tries to determine the most likely target of indirect calls. If the profile data is incomplete or ambiguous, the optimizer may not select the best target. If this happens, your code's performance may decrease.

At +O3, this option is only effective if indirect calls from functions within a file are mostly to target functions within the same file. This is because +O3 optimizes only within a file whereas +O4 optimizes across files.
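The guarded direct call described above can be sketched at the source level as follows. This is an illustration of the shape of the transformation, not the code the optimizer emits; the names handler_fn, double_it, negate_it, apply_indirect, and apply_promoted are hypothetical, and double_it stands in for the target that the profile data identifies as most likely.

```c
typedef int (*handler_fn)(int);

int double_it(int x) { return 2 * x; }
int negate_it(int x) { return -x; }

/* Original form: every call goes through the function pointer. */
int apply_indirect(handler_fn fn, int x)
{
    return fn(x);
}

/* Promoted form: test for the most likely target and call it directly;
   otherwise fall back to the indirect call. */
int apply_promoted(handler_fn fn, int x)
{
    if (fn == double_it)
        return double_it(x);    /* direct call, now a candidate for inlining */
    return fn(x);               /* any other target: indirect call */
}
```

Both forms return the same result for every target. The promoted form is faster only when the guessed target is the common case, which is why incomplete profile data can reduce performance.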

+O[no]ptrs_ansi

Optimization levels: 2, 3, 4

Default: +Onoptrs_ansi

Use +Optrs_ansi to make the following two assumptions, which the more aggressive +Optrs_strongly_typed does not make:

When both are specified, +Optrs_ansi takes precedence over +Optrs_strongly_typed.

For more information about type aliasing, see Aliasing Options.

    +O[no]ptrs_strongly_typed

    Optimization levels: 2, 3, 4

    Default: +Onoptrs_strongly_typed

    Use +Optrs_strongly_typed when pointers are type-safe. The optimizer can use this information to generate more efficient code.

    Type-safe (that is, strongly typed) pointers are pointers to a specific type that only point to objects of that type, and not to objects of any other type. For example, a pointer declared as a pointer to an int is considered type-safe if that pointer points to an object only of type int, but not to objects of any other type.

    Based on the type-safe concept, a set of groups is built based on object types. A given group includes all the objects of the same type.

    The term type-inferred aliasing means that any pointer of a type in a given group (of objects of the same type) can only point to an object from the same group; it cannot point to a typed object from any other group.

    For more information about type aliasing, see Aliasing Options.

    Type casting to a different type violates type-inferred aliasing rules. See Example 2 below.

    Dynamic casting is allowed. See Example 3 below.

    For more details, see Aliasing Options.

    Example 1: How Data Types Interact

    The optimizer generally spills all global data from registers to memory before any modification to global variables or any loads through pointers. However, you can instruct the optimizer on how data types interact so it can generate more efficient code.

    If you have the following:

         1  int *p;
         2  float *q;
         3  int a,b,c,i;
         4  float d,e,f;
         5  foo()
         6  {
         7    for (i=1;i<10;i++) {
         8              d=e;
         9             *p=b;
         10             e=d+f;
         11             f=*q;
         12   }
         13 }
    With +Optrs_strongly_typed turned on, the pointers p and q will be assumed to be disjoint because the types they point to are different types. Without type-inferred aliasing, *p is assumed to invalidate all the definitions. So, the uses of d and f on line 10 have to be loaded from memory. With type-inferred aliasing, the optimizer can propagate the copy of d and f and thus avoid two loads and two stores.

    This option can be used for any application involving the use of pointers, where those pointers are type-safe. To specify when a subset of types are type-safe, use the [NO]PTRS_STRONGLY_TYPED pragma. The compiler issues warnings for any incompatible pointer assignments that may violate the type-inferred aliasing rules discussed in Aliasing Options.

    Example 2: Unsafe Type Cast

    Any type cast to a different type violates type-inferred aliasing rules. Do not use +Optrs_strongly_typed with code that has these unsafe type casts. Use the [NO]PTRS_STRONGLY_TYPED pragma to prevent the application of type-inferred aliasing to the unsafe type casts.

         struct foo {
           int a;
           int b;
         } *P;

         struct bar {
           float a;
           int b;
           float c;
         } *q;

         P = (struct foo *) q;  /* Incompatible pointer assignment
                                   through type cast */

    Example 3: Generally Applying Type Aliasing

    Dynamic cast is allowed with +Optrs_strongly_typed or +Optrs_ansi. A pointer dereference is called a dynamic cast if a cast to a different type is applied to the pointer.

    In the example below, type-inferred aliasing is applied on P generally, not just to the particular dereference. Type aliasing will be applied to any other dereferences of P.

         struct s {
           short int a;
           short int b;
           int c;
         } *P;

         * (int *)P = 0;
    For more information about type aliasing, see Aliasing Options.

    +O[no]ptrs_to_globals[=name1, name2, ...nameN]

    Optimization levels: 2, 3, 4

    Default: +Optrs_to_globals

    By default, global variables are conservatively assumed to be modified anywhere in the program. Use this option to specify which global variables are not modified through pointers, so that the optimizer can make your program run more efficiently by incorporating copy propagation and common subexpression elimination.

    This option can be used to specify all global variables as not modified via pointers, or to specify a comma-separated list of global variables as not modified via pointers.

    Note that the on state for this option disables some optimizations, such as aggressive optimizations on the program's global symbols.

    For example, use the command-line option +Onoptrs_to_globals=a,b,c to specify global variables a, b, and c as not being accessed through pointers. No pointer can access these global variables. The optimizer will perform copy propagation and constant folding because storing to *p will not modify a or b.

         int a, b, c;
         float *p;
         foo()
         {
            a = 10;
            b = 20;
           *p = 1.0;
            c = a + b;
         }
    If all global variables are unique, use the following option without listing the global variables:
         +Onoptrs_to_globals
    In the example below, the address of b is taken. This means b can be accessed indirectly through the pointer. You can still use +Onoptrs_to_globals as: +Onoptrs_to_globals +Optrs_to_globals=b.
         long b,c;
         long *p = &b;   /* the address of b is taken */

         foo()
    For more information about type aliasing, see Aliasing Options.

    +O[no]regionsched

    Optimization levels: 2, 3, 4

    Default: +Onoregionsched

    Applies aggressive scheduling techniques to move instructions across branches. This option is incompatible with the linker -z option. If used with -z, it may cause a SIGSEGV error at run time.

    Use +Oregionsched to improve application run-time speed. Compilation time may increase.

    +Oreusedir=directory

    Optimization levels: 4 or with profile-based optimization

    Default: no reuse of object files

    This option specifies a directory where the linker can save object files created from intermediate object files when using +O4 or profile-based optimization. It reduces link time by not recompiling intermediate object files when they do not need to be recompiled.

    When you compile with +I, +P, or +O4, the compiler generates intermediate code in the object file. Otherwise, the compiler generates regular object code in the object file. When you link, the linker first compiles the intermediate object code to regular object code, then links the object code. With this option you can reduce link time on subsequent links by avoiding recompiling intermediate object files that have already been compiled to regular object code and have not changed.

    Note that when you do change a source file or command-line options and recompile, a new intermediate object file will be created and compiled to regular object code in the specified directory. The previous object file in the directory will not be removed. You should periodically remove this directory since old object files cannot be reused and will not be automatically removed.

    +O[no]regreassoc

    Optimization levels: 2, 3, 4

    Default: +Oregreassoc

    If disabled, this option turns off register reassociation.

    Use +Onoregreassoc to disable register reassociation if this optimization hinders the optimized application performance.

    +O[no]report[=report_type]

    See +O[no]report[=report_type].

    +O[no]sharedgra

    See +O[no]sharedgra.

    +O[no]sideeffects[=name1, name2, ...nameN]

    Optimization levels: 2, 3, 4

    Default: assume all subprograms have side effects

    Assume that subprograms specified in the name list might modify global variables. Therefore, when +Osideeffects is enabled the optimizer limits global variable optimization.

    The default is to assume that all subprograms have side effects unless the optimizer can determine that there are none.

    Use +Onosideeffects if you know that the named functions do not modify global variables and you wish to achieve the best possible performance.

    +O[no]signedpointers

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Onosignedpointers

    Perform [or do not perform] optimizations related to treating pointers as signed quantities. Applications that allocate shared memory and that compare a pointer to shared memory with a pointer to private memory may run incorrectly if this optimization is enabled.

    Use +Osignedpointers to improve application run-time speed.

    +O[no]static_prediction

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Onostatic_prediction

    +Ostatic_prediction turns on static branch prediction for PA-RISC 2.0 targets.

    PA-RISC 2.0 has two means of predicting which way conditional branches will go: dynamic branch prediction and static branch prediction. Dynamic branch prediction uses a hardware history mechanism to predict future executions of a branch from its last three executions. It is transparent and quite effective unless the hardware buffers involved are overwhelmed by a large program with poor locality.

    With static branch prediction on, each branch is predicted based on implicit hints encoded in the branch instruction itself; the dynamic branch prediction is not used.

    Static branch prediction's role is to handle large codes with poor locality for which the small dynamic hardware facility will prove inadequate.

    Use +Ostatic_prediction to better optimize large programs with poor instruction locality, such as operating system and database code.

    Use this option only when using PBO, as an amplifier to +P. It is allowed but silently ignored with +I, so makefiles need not change between the +I and +P phases.

    +O[no]vectorize

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Onovectorize

    +Ovectorize allows the compiler to replace certain loops with calls to vector routines.

    Use +Ovectorize to increase the execution speed of loops.

    When +Onovectorize is specified, loops are not replaced with calls to vector routines.

    Because the +Ovectorize option may change the order of operations in an application, it may also change the results of those operations slightly. See the HP-UX Floating-Point Guide for details.

    The math library contains special prefetching versions of vector routines. If you have a PA2.0 application that contains operations on very large arrays (larger than 1 megabyte in size), using +Ovectorize in conjunction with +Odataprefetch may improve performance substantially.

    You may use +Ovectorize at levels 3 and 4. +Onovectorize is also included as part of +Oaggressive and +Oall.

    This option is only valid for PA-RISC 1.1 and 2.0 systems.

    +O[no]volatile

    Optimization levels: 1, 2, 3, 4

    Default: +Onovolatile

    The +Ovolatile option implies that memory references to global variables cannot be removed during optimization.

    The +Onovolatile option implies that global variables are not of volatile class. This means that references to global variables can be removed during optimization.

    The +Ovolatile option replaces the obsolete +OV option.

    Use this option to control the volatile semantics for all global variables.

    +O[no]whole_program_mode

    Optimization level: 4

    Default: +Onowhole_program_mode

    The +Owhole_program_mode option enables the assertion that only the files that are compiled with this option directly reference any global variables and procedures that are defined in these files. In other words, this option asserts that there are no unseen accesses to the globals.

    When this assertion is in effect, the optimizer can hold global variables in registers longer and delete inlined or cloned global procedures.

    All files compiled with +Owhole_program_mode must also be compiled with +O4. If any of the files were compiled with +O4 but were not compiled with +Owhole_program_mode, the linker disables the assertion for all files in the program.

    The default, +Onowhole_program_mode, disables the assertion.

    Use this option to increase performance speed, but only when you are certain that only the files compiled with +Owhole_program_mode directly access any globals that are defined in these files.

    Using Advanced Optimization Options

    Several advanced optimization options can be specified on the same command line. For example, the following command line specifies aggressive level 3 optimizations with unrestricted compile time, disables software pipelining, and disables moving conditional floating-point instructions out of a loop:
         cc +O3 +Oaggressive +Onolimit +Onomoveflops +Onopipeline \
            sourcefile.c
    Specify the level of optimization first (+O1, +O2, +O3, or +O4), followed by any +O[no]optimization options.

    Level 1 Optimization Modules

    The level 1 optimization modules are:

      • Branch Optimization
      • Dead Code Elimination
      • Faster Register Allocation
      • Instruction Scheduler
      • Peephole Optimizations

    The examples in this section are shown at the source code level wherever possible. Transformations that cannot be shown at the source level are shown in assembly language. See Table 10: Descriptions of Assembly Language Instructions for descriptions of the assembly language instructions used.

    Branch Optimization

    The branch optimization module traverses the procedure and transforms branch instruction sequences into more efficient sequences where possible. Examples of possible transformations are:
         if(a) {
            .
            .
            .
            statement 1
         } else {
            goto L1;
         }
         statement 2
     L1:
    becomes:
         if(!a) {
            goto L1;
         }
         statement 1
         statement 2
     L1:

    Dead Code Elimination

    The dead code elimination module removes unreachable code that is never executed.

    For example, the code:

         if(0) {
            a = 1;
         } else {
            a = 2;
         }
    becomes:
         a = 2;

    Faster Register Allocation

    The faster register allocation module, used with unoptimized code, analyzes register use faster than the coloring register allocator (a level 2 module).

    This module performs the following:

    Instruction Scheduler

    The instruction scheduler module performs the following:

    For example, the code:
         LDW     -52(0,sp),r1
         ADDI    3,r1,r31    ;interlock with load of r1
         LDI     10,r19
    becomes:
         LDW     -52(0,sp),r1
         LDI     10,r19
         ADDI    3,r1,r31    ;use of r1 is now separated from load

    Table 10: Descriptions of Assembly Language Instructions

    Instruction                     Description
    LDW offset(sr, base), target    Loads a word from memory into register target.
    ADDI const, reg, target         Adds the constant const to the contents of register reg and puts the result in register target.
    LDI const, target               Loads the constant const into register target.
    LDO const(reg), target          Adds the constant const to the contents of register reg and puts the result in register target.
    AND reg1, reg2, target          Performs a bitwise AND of the contents of registers reg1 and reg2 and puts the result in register target.
    COMIB cond const, reg, lab      Compares the constant const to the contents of register reg and branches to label lab if the condition cond is true.
    BB cond reg, num, lab           Tests the bit number num in the contents of register reg and branches to label lab if the condition cond is true.
    COPY reg, target                Copies the contents of register reg to register target.
    STW reg, offset(sr, base)       Stores the word in register reg to memory.

    Peephole Optimizations

    The peephole optimization process involves looking at small windows of machine code for optimization opportunities. Wherever possible, the peephole optimizer replaces assembly language instruction sequences with faster (usually shorter) sequences, and removes redundant register loads and stores.

    For example, the code:

         LDI     32,r3
         AND     r1,r3,r2
         COMIB,= 0,r2,L1
    becomes:
         BB,>=   r1, 26, L1
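The equivalence behind this replacement can be checked in C. The sketch below assumes PA-RISC's convention of numbering bits from the most significant bit (bit 0) to the least significant bit (bit 31) of a 32-bit word, under which bit 26 corresponds to the mask value 32. The function names are hypothetical.

```c
#include <stdint.h>

/* Three-instruction form: load the mask 32, AND it with r1, and
   compare the result with zero. Returns 1 when the branch to L1
   would be taken. */
int bit_clear_via_mask(uint32_t r1)
{
    uint32_t r3 = 32;
    uint32_t r2 = r1 & r3;

    return r2 == 0;
}

/* Single bit-test form. bit_from_msb counts from the most significant
   bit (0) to the least significant bit (31); this numbering is an
   assumption stated in the text above. */
int bit_clear_via_test(uint32_t r1, unsigned bit_from_msb)
{
    return ((r1 >> (31u - bit_from_msb)) & 1u) == 0;
}
```

For any value of r1, bit_clear_via_mask(r1) and bit_clear_via_test(r1, 26) agree, which is why one bit-test branch can stand in for the three-instruction sequence.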

    Level 2 Optimization Modules

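    Level 2 performs optimizations within each procedure. At level 2, the optimizer performs all optimizations performed at the prior level, with the following additions:

      • Coloring Register Allocation
      • Induction Variables and Strength Reduction
      • Local and Global Common Subexpression Elimination
      • Constant Folding and Propagation
      • Loop Invariant Code Motion
      • Store/Copy Optimization
      • Unused Definition Elimination
      • Software Pipelining
      • Register Reassociation

    The examples in this section are shown at the source code level wherever possible. Transformations that cannot be shown at the source level are shown in assembly language.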

    Coloring Register Allocation

    The name of this optimization comes from the similarity to map coloring algorithms in graph theory. This optimization determines when and how long commonly used variables and expressions occupy a register. It minimizes the number of references to memory (loads and stores) a code segment makes. This can improve run-time speed.

    You can help the optimizer understand when certain variables are heavily used within a function by declaring these variables with the register qualifier. The first 10 register-qualified variables encountered in the source are honored. To be most effective, pick the ten most important variables.

    The coloring register allocator may override your choices and promote to a register a variable not declared register over one that is, based on estimated speed improvements.

    The following code shows the type of optimization the coloring register allocation module performs. The code:

         LDI     2,r104
         COPY    r104,r103
         LDO     5(r103),r106
         COPY    r106,r105
         LDO     10(r105),r107
    becomes:
         LDI     2,r25
         LDO     5(r25),r26
         LDO     10(r26),r31

    Induction Variables and Strength Reduction

    The induction variables and strength reduction module removes expressions that are linear functions of a loop counter and replaces each of them with a variable that contains the value of the function. Variables of the same linear function are computed only once. This module also simplifies the function by replacing multiplication instructions with addition instructions wherever possible.

    For example, the code:

         for (i=0; i<25; i++) {
              r[i] = i * k;
         }
    becomes:
         t1 = 0;
         for (i=0; i<25; i++) {
              r[i] = t1;
              t1 += k;
         }

    Local and Global Common Subexpression Elimination

    The common subexpression elimination module identifies expressions that appear more than once and have the same result, computes the result, and substitutes the result for each occurrence of the expression. The types of subexpression include instructions that load values from memory, as well as arithmetic evaluation.

    For example, the code:

         a = x + y + z;
         b = x + y + w;
    becomes:
         t1 = x + y;
         a = t1 + z;
         b = t1 + w;

    Constant Folding and Propagation

    Constant folding computes the value of a constant expression at compile time. For example:
         A = 10;
         B = A + 5;
         C = 4 * B;
    can be replaced by:
         A = 10;
         B = 15;
         C = 60;

    Loop Invariant Code Motion

    The loop invariant code motion module recognizes instructions inside a loop whose results do not change and moves them outside the loop. This ensures that the invariant code is only executed once.

    For example, the code:

         x = z;
         for(i=0; i<10; i++)
         {
              a[i] = 4 * x + i;
         }
    becomes:
         x = z;
         t1 = 4 * x;
         for(i=0; i<10; i++)
         {
              a[i] = t1 + i;
         }

    Store/Copy Optimization

    Where possible, the store/copy optimization module substitutes registers for memory locations, by replacing store instructions with copy instructions and deleting load instructions.

    For example, the following HP C code:

         a = x + 23;
         return a;
    where a is a local variable, produces the following code for the unoptimized case:
         LDO     23(r26),r1
         STW     r1,-52(0,sp)
         LDW     -52(0,sp),ret0
    and this code for the optimized case:
         LDO     23(r26),ret0

    Unused Definition Elimination

    The unused definition elimination module removes unused memory location and register definitions. These definitions are often a result of transformations made by other optimization modules.

    For example, the function:

         f(int x)
         {
              int a,b,c;

              a = 1;
              b = 2;
              c = x * b;
              return c;
         }
    becomes:
         f(int x)
         {
              int a,b,c;

              b = 2;
              c = x * b;
              return c;
         }

    Software Pipelining

    Software pipelining is a code transformation that optimizes program loops. It rearranges the order in which instructions are executed in a loop. It generates code that overlaps operations from different loop iterations. Software pipelining is useful for loops that contain arithmetic operations on floats and doubles.

    The goal of this optimization is to avoid CPU stalls due to memory or hardware pipeline latencies. The software pipelining transformation adds code before and after the loop to achieve a high degree of optimization within the loop.

    Example

    The following pseudo-code fragment shows a loop before and after the software pipelining optimization. Four significant things happen:

    The following is a C for loop:
         #define SIZ 10000
         float x[SIZ], y[SIZ];  /* Software pipelining works with */
         int i;                 /* floats and doubles.            */

         init();
         for (i = 0; i < SIZ; i++)
                 {
                 x[i] = x[i] / y[i] + 4.00;
                 }
    When this loop is compiled with software pipelining, the optimization can be expressed in pseudo-code as follows:
    R1 = 0;                Initialize array index.
    R2 = 4.0;              Load constant value.
    R3 = Y[0];             Load first Y value.
    R4 = X[0];             Load first X value.
    R5 = R4 / R3;          Perform division on first element: n = X[0] / Y[0].

    do {                   Begin loop.
          R6 = R1;         Save current array index.
          R1++;            Increment array index.
          R7 = X[R1];      Load current X value.
          R8 = Y[R1];      Load current Y value.
          R9 = R5 + R2;    Perform addition on prior row: X[i] = n + 4.0.
          R10 = R7 / R8;   Perform division on current row: m = X[i+1] / Y[i+1].
          X[R6] = R9;      Save result of operations on prior row.

          R6 = R1;         Save current array index.
          R1++;            Increment array index.
          R4 = X[R1];      Load next X value.
          R3 = Y[R1];      Load next Y value.
          R11 = R10 + R2;  Perform addition on current row: X[i+1] = m + 4.
          R5 = R4 / R3;    Perform division on next row: n = X[i+2] / Y[i+2].
          X[R6] = R11;     Save result of operations on current row.
    } while (R1 <= 100);   End loop.

    R9 = R5 + R2;          Perform addition on last row: X[i+2] = n + 4.
    X[R6] = R9;            Save result of operations on last row.
    This transformation stores intermediate results of the division instructions in unique registers (noted as n and m). These registers are not referenced until several instructions after the division operations. This decreases the possibility that the long latency period of the division instructions will stall the instruction pipeline and cause processing delays.
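The same overlap of iterations can be written by hand in C. The following is a simplified sketch of the idea, not the code the optimizer produces: it overlaps only two iterations and does not unroll the loop as the pseudo-code above does. The function names are hypothetical.

```c
#include <stddef.h>

/* Straightforward loop: each iteration divides and then immediately
   uses the quotient. */
void scale_plain(float *x, const float *y, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        x[i] = x[i] / y[i] + 4.0f;
}

/* Hand-pipelined loop: the division for element i+1 is started before
   the addition and store for element i, so the quotient is not needed
   until a later step. */
void scale_pipelined(float *x, const float *y, size_t n)
{
    size_t i;
    float quotient;

    if (n == 0)
        return;

    quotient = x[0] / y[0];                 /* code before the loop */
    for (i = 0; i + 1 < n; i++) {
        float next = x[i + 1] / y[i + 1];   /* division for next element */
        x[i] = quotient + 4.0f;             /* finish current element */
        quotient = next;
    }
    x[n - 1] = quotient + 4.0f;             /* code after the loop */
}
```

The statements before and after the loop correspond to the extra code that the transformation adds around the loop, which is why pipelined loops occupy slightly more code space.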

    Prerequisites of Pipelining

    Software pipelining is attempted on a loop that meets the following criteria:

    This optimization produces slightly larger program files and increases compile time. It is most beneficial in programs containing loops that are executed a large number of times. This optimization is not recommended for loops that are executed only a small number of times.

    Use the +Onopipeline option with the +O2, +O3, or +O4 option to suppress software pipelining if program size is more important than execution speed. For example, +O2 +Onopipeline performs level 2 optimization but disables software pipelining.

    Register Reassociation

    Array references often require one or more instructions to compute the virtual memory address of the array element specified by the subscript expression. The register reassociation optimization implemented in the PA-RISC compilers tries to reduce the cost of computing the virtual memory address expression for array references found in loops.

    Within loops, the virtual memory address expression can be rearranged and separated into a loop varying term and a loop invariant term. Loop varying terms are those items whose values may change from one iteration of the loop to another. Loop invariant terms are those items whose values are constant throughout all iterations of the loop. The loop varying term corresponds to the difference in the virtual memory address associated with a particular array reference from one iteration of the loop to the next.

    The register reassociation optimization dedicates a register to track the value of the virtual memory address expression for one or more array references in a loop and updates the register appropriately in each iteration of a loop.

    The register is initialized outside the loop to the loop invariant portion of the virtual memory address expression and the register is incremented or decremented within the loop by the loop variant portion of the virtual memory address expression. On PA-RISC, the update of such a dedicated register can often be performed for free using the base-register modification capability of load and store instructions.

    The net result is that array references in loops are converted into equivalent but more efficient pointer dereferences.

    For example:

         int a[10][20][30];

         void example (void)
         {
           int i, j, k;

           for (k = 0; k < 10; k++)
             for (j = 0; j < 10; j++)
               for (i = 0; i < 10; i++)
               {
                   a[i][j][k] = 1;
               }
         }
    after register reassociation is applied to the innermost loop becomes:
         int a[10][20][30];

         void example (void)
         {
           int i, j, k;
           register int (*p)[20][30];

           for (k = 0; k < 10; k++)
             for (j = 0; j < 10; j++)
               for (p = (int (*)[20][30]) &a[0][j][k], i = 0; i < 10; i++)
               {
                   *(p++[0][0]) = 1;
               }
         }
    In the above example, the compiler-generated temporary register variable, p, strides through the array a in the innermost loop. This register pointer variable is initialized outside the innermost loop and auto-incremented within the innermost loop as a side effect of the pointer dereference.

    Register reassociation can often enable another loop optimization. After performing the register reassociation optimization, the loop variable may be needed only to control the iteration count of the loop. If this is the case, the original loop variable can be eliminated altogether by using the PA-RISC ADDIB and ADDB machine instructions to control the loop iteration count.
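The equivalence of the subscripted form and the pointer form can be checked with a small stand-alone sketch. It is written for illustration and uses hypothetical names (D0, D1, D2, by_index, by_pointer, fill_by_index, fill_by_pointer, arrays_match). The pointer version steps a pointer to one [D1][D2] plane of the array, so each increment advances the address by a fixed, loop invariant amount.

```c
#include <string.h>

#define D0 10
#define D1 20
#define D2 30

static int by_index[D0][D1][D2];
static int by_pointer[D0][D1][D2];

/* Subscript form: the address of a[i][j][k] is computed from i, j,
   and k on every iteration. */
void fill_by_index(int j, int k)
{
    int i;

    for (i = 0; i < D0; i++)
        by_index[i][j][k] = 1;
}

/* Pointer form: plane is initialized once outside the loop and then
   advanced by one whole plane (D1 * D2 ints) per iteration. */
void fill_by_pointer(int j, int k)
{
    int (*plane)[D1][D2] = by_pointer;   /* points at by_pointer[0] */
    int i;

    for (i = 0; i < D0; i++) {
        (*plane)[j][k] = 1;
        plane++;
    }
}

/* Returns 1 when both arrays hold identical contents. */
int arrays_match(void)
{
    return memcmp(by_index, by_pointer, sizeof by_index) == 0;
}
```

After calling fill_by_index(3, 7) and fill_by_pointer(3, 7), arrays_match() returns 1: both loops store to the same ten elements.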

    Level 3 Optimizations

    Level 3 optimization includes level 2 optimizations, plus full optimizationacross all subprograms within a single file. Lev el 3 also inlines certainsubprograms within the input file. Use +O3 to get level 3 optimization.

    Level 3 optimization produces faster run-time code than level 2 on code that frequently calls small functions within a file. Level 3 links faster than level 4.

    Inlining within a Single Source File

    Inlining substitutes function calls with copies of the function's object code. Only functions that meet the optimizer's criteria are inlined. This may result in slightly larger executable files. However, this increase in size is offset by the elimination of time-consuming procedure calls and procedure returns.

    Example of Inlining

    The following is an example of inlining at the source code level. Before inlining, the source file looks like this:
    /* Return the greatest common divisor of two positive integers,  */
    /* int1 and int2, computed using Euclid's algorithm.  (Return 0  */
    /* if either is not positive.)                                   */
    int gcd(int1,int2)
      int int1;
      int int2;
    {
      int inttemp;

        if ( ( int1 <= 0 ) || ( int2 <= 0 ) ) {
            return(0);
        }
        do {
            if ( int1 < int2 ) {
                inttemp = int1;
                int1    = int2;
                int2    = inttemp;
            }
            int1 = int1 - int2;
        } while (int1 > 0);
        return(int2);
    }

    main()
    {
      int xval,yval,gcdxy;
        /* statements before call to gcd */
        gcdxy = gcd(xval,yval);
        /* statements after call to gcd */
    }
    After inlining, the source file looks like this:
    main()
    {
      int xval,yval,gcdxy;
        /* statements before inlined version of gcd */
        {
          int int1;
          int int2;

            int1 = xval;
            int2 = yval;
            {
              int inttemp;

                if ( ( int1 <= 0 ) || ( int2 <= 0 ) ) {
                    gcdxy = ( 0 );
                    goto AA003;
                }
                do {
                    if ( int1 < int2 ) {
                        inttemp = int1;
                        int1    = int2;
                        int2    = inttemp;
                    }
                    int1 = int1 - int2;
                } while ( int1 > 0 );
                gcdxy = ( int2 );
            }
        }
    AA003 : ;
        /* statements after inlined version of gcd */
    }

    Level 4 Optimizations

    Level 4 performs optimizations across all files in a program. At level 4, all optimizations of the prior levels are performed. Two additional optimizations are performed:

  • Inlining across multiple files
  • Global and static variable optimization

    Interprocedural global optimizations search across function boundaries, in all files within a program, to produce better and faster code sequences. Normally, global optimizations are performed within individual functions or source code files. Interprocedural optimizations look at function interactions within a program and transform particular code sequences into faster code. Since information about every function within a program is required, this level of optimization must be performed at link time.

    Inlining Across Multiple Files

    Inlining at level 4 is performed across all procedures within the program. Inlining at level 3 is done within one file.

    Inlining substitutes function calls with copies of the function's object code. Only functions that meet the optimizer's criteria are inlined. This may result in slightly larger executable files. However, this increase in size is offset by the elimination of time-consuming procedure calls and procedure returns.

    Global and Static Variable Optimization

    Global and static variable optimizations look for ways to reduce the number of instructions required for accessing global and static variables. The compiler normally generates two machine instructions when referencing global variables. Depending on the locality of the global variables, single machine instructions may sometimes be used to access these variables. The linker rearranges the storage location of global and static data to increase the number of variables that can be referenced by single instructions.

    Global Variable Optimization Coding Standards

    Since this optimization rearranges the location and data alignment of global variables, avoid the following programming practices:

    Guidelines for Using the Optimizer

    The following guidelines help you use the optimizer effectively and write efficient HP C programs.

    Optimizer Assumptions

    During optimization, the compiler gathers information about the use of variables and passes this information to the optimizer. The optimizer uses this information to ensure that every code transformation maintains the correctness of the program, at least to the extent that the original unoptimized program is correct.

    When gathering this information, the HP C compiler makes the following assumption: while inside a function, the only variables that can be accessed indirectly through a pointer or by another function call are:

    Optimizer Pragmas

    Pragmas give you the ability to:

    Pragmas cannot cross line boundaries, and the word pragma must be in lowercase letters. Optimizer pragmas may not appear inside a function.

    Optimizer Control Pragmas

    The OPTIMIZE and OPT_LEVEL pragmas control which functions are optimized and which set of optimizations is performed. You can place these pragmas before any function definition, and they override any previous pragma. These pragmas cannot raise the optimization level above the level specified on the command line.

    OPT_LEVEL 0, 1, and 2 provide more control over optimization than the +O1 and +O2 compiler options. You use these pragmas to raise or lower optimization at the function level inside the source file, whereas the compiler options can only be used for an entire source file. (OPT_LEVEL 3 and 4 can only be used at the beginning of the source file.)

    Table 11: Optimization Level Precedence shows the possible combinations of options and pragmas and the resulting optimization levels. The level at which a function will be optimized is the lower of the two values specified by the command-line optimization level and the optimization pragma in force.
     

    Table 11: Optimization Level Precedence

    Command-line          #pragma      Resulting
    Optimization Level    OPT_LEVEL    OPT_LEVEL
    ------------------    ---------    ---------
    none                  OFF          0
    none                  1            0
    none                  2            0
    +O1                   OFF          0
    +O1                   1            1
    +O1                   2            1
    +O1                   3            1
    +O1                   4            1
    +O2                   OFF          0
    +O2                   1            1
    +O2                   2            2
    +O2                   3            2
    +O2                   4            2
    +O3                   OFF          0
    +O3                   1            1
    +O3                   2            2
    +O3                   3            3
    +O3                   4            3
    +O4                   OFF          0
    +O4                   1            1
    +O4                   2            2
    +O4                   3            3
    +O4                   4            4
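
    Every row of Table 11 follows the "lower of the two values" rule. As a small illustration, the rule can be written as a C function. This is a sketch for checking the table, not part of the compiler: the names OPT_NONE, OPT_OFF and resulting_opt_level are hypothetical, and treating both "none" and OFF as level 0 is an assumption that happens to reproduce the table.

```c
/* Illustrative encoding: no command-line option and OPTIMIZE OFF both
   behave like level 0 in Table 11 (an assumption made for this sketch). */
enum { OPT_NONE = 0, OPT_OFF = 0 };

/* Returns the level at which a function is optimized: the lower of the
   command-line level and the level requested by the pragma in force. */
int resulting_opt_level(int command_line_level, int pragma_level)
{
    return (command_line_level < pragma_level) ? command_line_level
                                               : pragma_level;
}
```

    For example, resulting_opt_level(1, 4) corresponds to the row "+O1, 4" and yields 1, while resulting_opt_level(4, 3) corresponds to the row "+O4, 3" and yields 3.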

    The values of OPTIMIZE and OPT_LEVEL are summarized in Table 12: Optimizer Control Pragmas.
     

    Table 12: Optimizer Control Pragmas

    Pragma                   Description
    ------                   -----------
    #pragma OPTIMIZE ON      Turns optimization on.
    #pragma OPTIMIZE OFF     Turns optimization off.
    #pragma OPT_LEVEL 1      Optimize only within small blocks of code.
    #pragma OPT_LEVEL 2      Optimize within each procedure.
    #pragma OPT_LEVEL 3      Optimize across all procedures within a source file.
    #pragma OPT_LEVEL 4      Optimize across all procedures within a program.
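
    As an illustration of per-function control, the sketch below lowers the level for one function and restores it for the next. It assumes a hypothetical source file compiled with +O2; the function names parse_flag and sum_squares are invented for the example, and the __hpux guard is an assumption added only so that other compilers skip the HP-specific pragmas.

```c
/* Hypothetical file, assumed to be compiled with: cc +O2 levels.c */

#if defined(__hpux)          /* assumption: skip the pragmas elsewhere */
#pragma OPT_LEVEL 1
#endif
/* Placed before the function definition, so this function is the first
   one compiled under the level 1 request. */
int parse_flag(const char *s)
{
    return s != 0 && s[0] == '-' && s[1] != '\0';
}

#if defined(__hpux)
#pragma OPT_LEVEL 2
#endif
/* The later pragma overrides the earlier one from here on. Because the
   command line is assumed to specify +O2, level 2 is also the highest
   level a pragma in this file can request. */
long sum_squares(int n)
{
    long total = 0;
    int i;

    for (i = 1; i <= n; i++)
        total += (long)i * i;
    return total;
}
```

    Both pragmas appear outside of any function body, as required for optimizer pragmas.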

    Inlining Pragmas

    When INLINE is specified without a functionname, any function can be inlined. When specified with functionname(s), these functions are candidates for inlining.

    The NOINLINE pragma disables inlining for all functions or specified functionname(s).

    The syntax for performing inlining is:

    #pragma INLINE [functionname(1), ..., functionname(n)]
    #pragma NOINLINE [functionname(1), ..., functionname(n)]
    For example, to specify inlining of the two subprograms checkstat and getinput, use:
    #pragma INLINE checkstat, getinput
    To specify that an infrequently called routine should not be inlined when compiling at optimization level 3 or 4, use:
    #pragma NOINLINE opendb
    See also the related +O[no]inline optimization option.

    Alias Pragmas

  • NO_SIDE_EFFECTS Pragma
  • ALLOCS_NEW_MEMORY pragma
  • FLOAT_TRAPS_ON pragma
  • [NO]PTRS_STRONGLY_TYPED Pragma

    The compiler gathers information about each function (such as information about function calls, variables, parameters, and return values) and passes this information to the optimizer. The NO_SIDE_EFFECTS and ALLOCS_NEW_MEMORY pragmas tell the optimizer to make assumptions it cannot normally make, resulting in improved compile-time and run-time speed. They change the default information the compiler collects.
    If used, the NO_SIDE_EFFECTS and ALLOCS_NEW_MEMORY pragmas should appear before the first function defined in a file and are in effect for the entire file. When used appropriately, these optional pragmas provide better optimization.

    NO_SIDE_EFFECTS Pragma

    By default, the optimizer assumes that all functions might modify global variables. To some degree, this assumption limits the extent of optimizations it can perform on global variables. The NO_SIDE_EFFECTS pragma provides a way to override this assumption. If you know for certain that some functions do not modify global variables, you can gain further optimization of code containing calls to these functions by specifying the function names in this pragma.

    NO_SIDE_EFFECTS has the following form:

    #pragma NO_SIDE_EFFECTS functionname(1), ..., functionname(n)
    Each functionname is the name of a function that does not modify the values of global variables. Global variable references can be optimized to a greater extent in the presence of calls to the listed functions. Note that you need the NO_SIDE_EFFECTS pragma in the files where the calls are made, not where the function is defined. This pragma takes effect from the line it first occurs on to the end of the file.
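
    The following sketch shows the kind of code this pragma is aimed at. The names limit, clamp_step and sum_steps are hypothetical, the placement of the pragma after the prototype is an assumption, and the __hpux guard is an assumption added only so that other compilers skip the HP-specific pragma.

```c
int limit = 100;             /* global read by both functions below */

int clamp_step(int x);       /* reads limit but never writes a global */

#if defined(__hpux)          /* assumption: skip the pragma elsewhere */
#pragma NO_SIDE_EFFECTS clamp_step
#endif

/* The pragma is in the file that makes the calls and comes before the
   first function definition. Without it, the optimizer has to assume
   that each call to clamp_step() might change limit. */
int sum_steps(int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += clamp_step(i) + limit;
    return sum;
}

int clamp_step(int x)
{
    return (x > limit) ? limit : x;
}
```

    Listing a function that does modify a global variable would make the assumption false, so the pragma should name only functions you know to be free of such side effects.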

    ALLOCS_NEW_MEMORY pragma

    The ALLOCS_NEW_MEMORY pragma states that the function functionname returns a pointer to new memory that is allocated either by the function itself or by a routine that it calls. ALLOCS_NEW_MEMORY has the following form:
    #pragma ALLOCS_NEW_MEMORY functionname(1), ..., functionname(n)
    The new memory must be memory that was either newly allocated or was previously freed and is now reallocated. For example, the standard routines malloc() and calloc() satisfy this requirement.

    Large applications might have routines that are layered above malloc() and calloc(). These interface routines make the calls to malloc() and calloc(), initialize the memory, and return the pointer that malloc() or calloc() returns. For example, in the program below:

    struct_type *get_new_record(void)
    {
        struct_type *p;

        if ((p=malloc(sizeof(*p))) == NULL) {
            printf("get_new_record():out of memory\n");
            abort();
        }
        else {
            /* initialize the struct */
            .
            .
            .
            return p;
        }
    }
    the routine get_new_record falls under this category, and can be included in the ALLOCS_NEW_MEMORY pragma.
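
    A complete, compilable variant of such a wrapper might look like the sketch below. The struct record type and its fields are hypothetical stand-ins for struct_type, the placement of the pragma after the prototype is an assumption, and the __hpux guard is an assumption added only so that other compilers skip the HP-specific pragma.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type standing in for struct_type. */
struct record {
    int    id;
    double balance;
};

struct record *get_new_record(void);

#if defined(__hpux)          /* assumption: skip the pragma elsewhere */
#pragma ALLOCS_NEW_MEMORY get_new_record
#endif

/* Returns freshly allocated, initialized storage. The pointer comes
   straight from malloc(), so it refers to new memory. */
struct record *get_new_record(void)
{
    struct record *p = malloc(sizeof *p);

    if (p == NULL) {
        fprintf(stderr, "get_new_record(): out of memory\n");
        abort();
    }
    p->id = 0;
    p->balance = 0.0;
    return p;
}
```

    A routine that hands back a pointer into an existing pool or cache would not satisfy the "new memory" requirement and should not be listed in the pragma.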

    FLOAT_TRAPS_ON pragma

    The FLOAT_TRAPS_ON pragma informs the compiler that the function(s) may enable floating-point trap handling. When the compiler is so informed, it will not perform loop invariant code motion (LICM) on floating-point operations in the function(s) named in the pragma. This pragma is required for proper code generation when floating-point traps are enabled.

    #pragma FLOAT_TRAPS_ON {functionname,...functionname}
    #pragma FLOAT_TRAPS_ON { _ALL }

    For example:

    #pragma FLOAT_TRAPS_ON xyz,abc
    informs the compiler and optimizer that xyz and abc have floating-point traps turned on and therefore LICM optimization should not be performed.

    [NO]PTRS_STRONGLY_TYPED Pragma

    The PTRS_STRONGLY_TYPED pragma allows you to specify when a subset of types is type-safe. This provides a finer level of control than +O[no]ptrs_strongly_typed.
    #pragma PTRS_STRONGLY_TYPED BEGIN
    #pragma PTRS_STRONGLY_TYPED END
    #pragma NOPTRS_STRONGLY_TYPED BEGIN
    #pragma NOPTRS_STRONGLY_TYPED END
    Type-safe assumptions are applied to any types that are defined between the BEGIN-END pair. These pragmas are not allowed to nest. For each BEGIN, an associated END must be defined in the compilation unit.

    The pragma takes precedence over the command-line option, although sometimes both are required (see Example 2).

    Example 1

    double *d;
    #pragma PTRS_STRONGLY_TYPED BEGIN
    int *i;
    float *f;
    #pragma PTRS_STRONGLY_TYPED END
    main()
    {
      .
      .
      .
    }
    In this example only two types, pointer-to-int and pointer-to-float, will be assumed to be type-safe.

    Example 2

    cc +Optrs_strongly_typed foo.c

    /* source for Ex.2 */
    double *d;
      ...
    #pragma NOPTRS_STRONGLY_TYPED BEGIN
    int *i;
    float *f;
    #pragma NOPTRS_STRONGLY_TYPED END
      ...
    main()
    {
      ...
    }
    In this example all types are assumed to be type-safe except the types bracketed by pragma NOPTRS_STRONGLY_TYPED. The command-line option is required because the default option is +Onoptrs_strongly_typed.
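
    To show why a type-safe assumption matters, the sketch below places two pointer declarations inside a BEGIN-END pair and then uses both pointers in one loop. The names count_store, weight_store, counts, weights and bump_and_total are hypothetical, the comments describe the intent of the pragma rather than verified compiler behavior, and the __hpux guard is an assumption added only so that other compilers skip the HP-specific pragmas.

```c
static int   count_store[4]  = { 1, 2, 3, 4 };
static float weight_store[4] = { 0.5f, 0.5f, 0.5f, 0.5f };

#if defined(__hpux)          /* assumption: skip the pragmas elsewhere */
#pragma PTRS_STRONGLY_TYPED BEGIN
#endif
int   *counts  = count_store;    /* pointer-to-int declared type-safe   */
float *weights = weight_store;   /* pointer-to-float declared type-safe */
#if defined(__hpux)
#pragma PTRS_STRONGLY_TYPED END
#endif

/* Increments each count and accumulates weight * count. If the two
   pointer types are taken to be type-safe, a store through counts need
   not be treated as a possible change to the floats that weights
   points to. */
float bump_and_total(int n)
{
    float total = 0.0f;
    int i;

    for (i = 0; i < n; i++) {
        counts[i] += 1;
        total += weights[i] * (float)counts[i];
    }
    return total;
}
```

    The sketch is only safe to mark this way because counts and weights really do point to objects of their declared types; code that reaches the same storage through differently typed pointers should be left outside the BEGIN-END pair.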

    Aliasing Options

    To be conservative, the optimizer assumes that a pointer can point to any object in the entire application. If the optimizer is told how the application uses pointers, it can generate more efficient code because some false assumptions are eliminated. Such behavior can be communicated to the optimizer by using the following options:

    where list is a comma-separated list of global variable names.

    Here are the type-inferred aliasing rules:

    Improving Shared Library Performance

    This section describes the following pragmas to be used for improving shared library performance:

  • HP_NO_RELOCATION Pragma
  • HP_LONG_RETURN Pragma
  • HP_DEFINED_EXTERNAL Pragma

    The pragmas described here can improve performance of shared libraries by reducing the overhead of calling shared library routines. You must be very careful using these pragmas because incorrect use can result in incorrect and unpredictable behavior. See also the HP-UX Linker and Libraries User's Guide for more information on improving shared library performance.

    HP_NO_RELOCATION Pragma

    This pragma improves performance of shared library calls by omitting floating-point parameter relocation stubs in calls to shared library functions. Put this pragma in header files of functions that take floating-point parameters or return floating-point data and that will be placed in shared libraries. By putting it in the header file and ensuring all calls reference the header file, you ensure that it is specified at the function definition and at all calls.

    WARNING This pragma must be at the function definition and at all call sites. If the pragma is omitted from the function definition or from any call, the linker will generate parameter relocation code and the application will behave incorrectly since floating-point parameters will not be in expected registers.

    Syntax

    #pragma HP_NO_RELOCATION name1[,name2[, ...]]

    where name1, name2, and so forth are names of functions in shared libraries.

    Background

    Parameter relocation stubs are instructions that move (relocate) floating-point parameters and function return values between floating-point registers and general registers. They are generated for calls to routines in shared libraries that take floating-point parameters or return a floating-point value. This pragma prevents this unnecessary relocation from being done.

    NOTE Do not use this pragma with functions that use the varargs macros. See the HP C/HP-UX Reference Manual or the varargs(5) man page for information on the varargs macros.

    HP_LONG_RETURN Pragma

    This pragma improves performance of shared library calls by omitting export stubs and using a long return instruction sequence instead. An export stub is a short code segment generated by the linker for a global definition in a shared library. External calls to shared library functions go through the export stub.

    Put this pragma in header files of functions that will go in shared libraries so it is specified at the function definition and at all calls. For functions with floating-point parameters or returns, use the HP_NO_RELOCATION pragma along with this pragma.


    WARNING This pragma must be at the function definition and at all call sites. If the pragma is omitted from the function definition or from any call, the compiler will generate incompatible return code and the application will behave incorrectly.

    Syntax

    #pragma HP_LONG_RETURN name1[,name2[, ...]]

    where name1, name2, and so forth are names of functions in shared libraries.
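
    The header-file placement recommended for HP_LONG_RETURN and HP_NO_RELOCATION could be arranged as in the sketch below. It is shown as a single listing for brevity: the header name geom.h, the function circle_area and the include guard are hypothetical, the placement of the pragmas after the prototype is an assumption, and the __hpux guard is an assumption added only so that other compilers skip the HP-specific pragmas.

```c
/* ---- geom.h (hypothetical header for a shared library) ---- */
#ifndef GEOM_H
#define GEOM_H

double circle_area(double radius);

#if defined(__hpux)          /* assumption: skip the pragmas elsewhere */
/* circle_area takes and returns floating-point data, so both pragmas
   are given. Because they live in the header, the definition and every
   caller that includes the header see the same pragmas. */
#pragma HP_NO_RELOCATION circle_area
#pragma HP_LONG_RETURN circle_area
#endif

#endif /* GEOM_H */

/* ---- geom.c (would start with #include "geom.h") ---- */
double circle_area(double radius)
{
    return 3.14159265358979 * radius * radius;
}
```

    Keeping the pragmas in one header is what guards against the mismatch described in the warnings: a caller cannot pick up the prototype without also picking up the pragmas.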

    Background

    An export stub is generated by default for each function in a shared library. Each call to the function goes through the export stub. The export stub serves two purposes: to relocate parameters and to perform an interspace return.

    The HP_LONG_RETURN pragma generates a long return sequence in the export stub instead of an interspace branch. If you also use the HP_NO_RELOCATION pragma (for functions taking floating-point parameters) with the HP_LONG_RETURN pragma, all the code in the export stub is omitted, eliminating the export stub entirely. The HP_LONG_RETURN pragma by itself eliminates the need for export stubs for functions taking non-floating-point parameters.


    NOTE Using HP_LONG_RETURN without HP_NO_RELOCATION for functions with floating-point parameters could actually degrade performance by creating export stubs and relocation stubs.

    These pragmas improve performance of calls to shared library functions from outside the shared library. Therefore, do not use this pragma for hidden functions (see the -h and +e linker options) or for functions called only from within the same shared library linked with the -B symbolic linker option; otherwise this pragma may degrade performance. (See the HP-UX Linker and Libraries User's Guide for information on these options.)

    Do not use this pragma if you compile on PA-RISC 2.0 or later or with the +DA2.0 option since the effect is the default. That is, if no relocations are generated, export stubs are not generated on PA-RISC 2.0 and later, and a long return instruction sequence is generated by default, so this pragma has no effect.


    HP_DEFINED_EXTERNAL Pragma

    This pragma improves performance of shared library calls by inlining import stubs. Place this pragma at calls to shared library routines along with the HP_NO_RELOCATION pragma (if using floating-point parameters or return values) and the HP_LONG_RETURN pragma.

    WARNING Do not use this pragma at function definitions, only at function calls. Specifying it at function definitions will result in incorrect behavior.

    On PA-RISC 1.1, use this pragma only when calling a shared library from an executable file. Using it on calls within an executable file will cause the program to abort.


    Syntax

    #pragma HP_DEFINED_EXTERNAL name1[,name2[, ...]]

    where name1, name2, and so forth are names of functions in shared libraries.

    Background

    Import stubs are code sequences generated at calls to shared library routines. The import stub queries the PLT (Procedure Linkage Table) to determine the address of the shared library function and calls it. The HP_DEFINED_EXTERNAL pragma inlines this import stub.

    NOTE If your function takes floating-point parameters, you should also use the HP_NO_RELOCATION pragma. You should also use the HP_LONG_RETURN pragma with this pragma. If you don't, the import stub may be too large to inline.

    Use this pragma only on calls to functions in shared libraries. On PA-RISC 2.0, it will degrade performance of calls to any other functions.


    Improving Compile and Link Times

    In general, optimization increases the amount of time it takes to compile your program, link your program, or both. However, the following options can help to decrease this time: