HP C/HP-UX Online Help






Optimizing HP C Programs

NOTE: See the Compiling & Running HP C Programs section of the HP C Online Help for a quick reference of all HP C general use compiler options and pragmas.

Summary of Major Optimization Levels
Supporting Optimization Options
Enabling Basic Optimization
Enabling Different Levels of Optimization
Changing the Aggressiveness of Optimizations
Enabling Only Conservative Optimizations
Enabling Aggressive Optimizations
Removing Compilation Time Limits When Optimizing
Limiting the Size of Optimized Code
Specifying Maximum Optimization
Combining Optimization Parameters
Summary of Optimization Parameters
Profile-Based Optimization
Controlling Specific Optimizer Features
Using Advanced Optimization Options
Level 1 Optimizations
Level 2 Optimizations
Level 3 Optimizations
Level 4 Optimizations
Guidelines for Using the Optimizer
Optimizer Assumptions
Optimizer Pragmas
Aliasing Options
Improving Shared Library Performance
Improving Compile and Link Times

The HP C optimizer transforms programs so machine resources are used more efficiently. The optimizer can dramatically improve application run-time speed. HP C performs only minimal optimizations unless you specify otherwise. You activate additional optimizations using HP C command-line options and pragmas.

There are four major levels of optimization: levels 1, 2, 3, and 4. Level 4 optimization can produce the fastest executable code. Level 4 is a superset of the other levels.

Optimization levels can be expressed as either:

+Olevel
or
-Olevel

Additional parameters enable you to control the size of the executable program, compile time, and aggressiveness of the optimizations performed.

Compile time memory and CPU usage increase with each higher level of optimization due to the increasingly complex analysis that must be performed. You can control the trade-offs between compile-time penalties and code performance by choosing the level of optimization you desire.

Generally, the optimizer is not used during code development. It is used when compiling production-level code for benchmarking and general use.

Summary of Major Optimization Levels

Table 7: HP C Major Optimization Options summarizes the major optimization options of HP C:
 

Table 7: HP C Major Optimization Options 
+O0 or -O0
  Description: Constant folding and simple register assignment.
  Benefits: Compiles fastest.

+O1 (default) or -O1 (default)
  Description: Level 0 optimizations plus instruction scheduling and optimizations that can be performed on small sections of code.
  Benefits: Produces faster programs than level 0. Compiles faster than level 2.

+O2 or -O
  Description: Level 1 optimizations, plus optimizations performed over entire functions in a single file. Optimizes loops in order to reduce pipeline stalls. Performs scalar replacement, and analysis of data-flow, memory usage, loops and expressions.
  Benefits: Can produce faster run-time code than level 1 if programs use loops extensively. Compiles faster than level 3. Loop-oriented floating point intensive applications may see run times reduced by 50%. Operating system and interactive applications that use the already optimized system libraries can achieve 30% to 50% additional improvement.

+O3 or -O3
  Description: Level 2 optimizations, plus full optimization across all subprograms within a single file. Includes subprogram inlining.
  Benefits: Can produce faster run-time code than level 2 on code that frequently calls small functions. Links faster than level 4.

+O4 or -O4
  Description: Level 3 optimizations, plus full optimizations across the entire application program. Includes global and static variable optimization and inlining across the entire program. Optimizations are performed at link-time.
  Benefits: Produces faster run-time code than level 3 if programs use many global variables or if there are many opportunities for inlining procedure calls.

Supporting Optimization Options

Table 8: Other Supporting Optimizations shows optimization options that support the core optimization levels. These optimizations are performed only when specifically invoked. They are available at all optimization levels.
 

Table 8: Other Supporting Optimizations 
+ES[const|no]lit
  Description:
  • +ESconstlit stores constant-qualified (const) objects and literals in read-only memory.
  • +ESlit places string literals and constants into read-only data storage.
  • +ESnolit disables +ESconstlit, causing HP C to no longer store literals in read-only memory.
  NOTE: This option is provided for compatibility purposes, and will not be supported in future releases. Use +Olit=all in place of +ESlit, and use +Olit=none in place of +ESnolit.

+I, +P
  Description: Enables all profile-based optimizations. Uses execution profile data to identify the most frequently executed code paths. Repositions functions and basic blocks, and aids other optimizations, according to these frequently executed paths.
  Benefits: Improves code locality and cache hit rates. Improves efficiency of other optimizations. Benefits most applications, especially large applications with multiple compilation units. May be used at any optimization level.

Enabling Basic Optimization

To enable basic optimizations, use the -O option (equivalent to +O2/-O2), as follows:
cc -O sourcefile.c
Basic optimizations do not change the behavior of ANSI C standard-conforming code. They improve run-time speed while increasing compile time and link time only moderately.

Enabling Different Levels of Optimization

There may be times when you want more or less optimization than what is provided with the basic -O option.

Level 1 Optimization

To enable level 1 optimization, use the +O1 option, as follows:
cc +O1 sourcefile.c
Level 1 optimization compiles quickly, but still provides some run-time speedup.

Level 2 Optimization

To enable level 2 optimization, use the +O2 option, as follows:
cc +O2 sourcefile.c
Level 2 (equivalent to -O) takes more time to compile, but produces greatly improved run-time speed.

Level 3 Optimization

To enable level 3 optimization, use the +O3 option, as follows:
cc +O3 sourcefile.c
Level 3 does full optimization of all subprograms within a single file.

Level 4 Optimization

To enable level 4 optimization, use the +O4 option, as follows:
cc +O4 sourcefile.c
Level 4 can potentially produce the greatest improvements in speed by performing optimizations across multiple object files. Level 4 does optimizations at link time, so compiles will be faster, but links will be longer.

Depending on the size and number of the modules, compiling at level 4 can consume a large amount of virtual memory. Level 4 may consume roughly 1.25 megabytes per 1000 lines of non-commented source. When you use level 4 on a large application, it is a good idea to increase the system swap space. For information on increasing system swap space, see the book Managing Systems and Workgroups.

Object Files Generated at Optimization Level 4

Object files generated by the compiler at optimization level 4, called intermediate object files (IELF), are intended to be temporary files. These object files contain an intermediate representation of the user code in a format that is designed for advanced optimizations. Hewlett-Packard reserves the right to change the format of these files without prior notice. There is no guarantee that intermediate object files will be compatible from one revision of the compiler to the next. HP C will issue an error message and terminate when it detects an incompatible intermediate file. For this reason, you should always recompile if you want to optimize at +O4, to ensure compatibility and integrity of your optimized applications.

Changing the Aggressiveness of Optimizations

At each level of optimization, you can control the aggressiveness of the optimizations performed.

Use the +Olit=none +Ofltacc=strict options at optimization level 2, 3, or 4 if you are not sure whether your code conforms to standards. These options provide more safety.

Use the +Ofast option at optimization level 2, 3, or 4 for best performance when you are willing to risk changes to the behavior of your programs.

Enabling Only Conservative Optimizations

You can enable conservative optimizations at the second, third, or fourth optimization levels by using the +Olit=none +Ofltacc=strict options, as follows:
cc +O2 +Olit=none +Ofltacc=strict sourcefile.c
or:
cc +O3 +Olit=none +Ofltacc=strict sourcefile.c
or:
cc +O4 +Olit=none +Ofltacc=strict sourcefile.c
Conservative optimizations are optimizations that do not change the behavior of code, in most cases, even if the code does not conform to standards.

Use the conservative optimizations provided with level 2, 3, and 4 when your code is non-ANSI.

Enabling Aggressive Optimizations

To enable aggressive optimizations at the second, third, or fourth optimization levels, use the +Ofast option as follows:
cc +O2 +Ofast sourcefile.c
or:
cc +O3 +Ofast sourcefile.c
or:
cc +O4 +Ofast sourcefile.c
Aggressive optimizations are new optimizations or optimizations that can change the behavior of programs.

Use aggressive optimizations with stable, well-structured, ANSI-conforming code. These types of optimizations give you faster code, but are riskier than the default optimizations.

Removing Compilation Time Limits When Optimizing

You can remove optimization time restrictions at the second, third, or fourth optimization levels by using the +Onolimit option as follows:
cc +O2 +Onolimit sourcefile.c
or:
cc +O3 +Onolimit sourcefile.c
or:
cc +O4 +Onolimit sourcefile.c
By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4. Use this option if longer compile times and greater virtual memory use are acceptable because you want additional optimizations to be performed.

Limiting the Size of Optimized Code

You can disable optimizations that expand code size at the second, third, and fourth optimization levels by using the +Osize option, as follows:
cc +O2 +Osize sourcefile.c
or:
cc +O3 +Osize sourcefile.c
or:
cc +O4 +Osize sourcefile.c
Most optimizations improve execution speed and decrease executable code size. A few optimizations significantly increase code size to gain execution speed. The +Osize option disables these code-expanding optimizations.

Use this option if you have limited main memory, swap space, or disk space.

Specifying Maximum Optimization

To get maximum optimization, use:
cc +Oall
The +Oall option performs the maximum optimization.

Use +Oall with stable, well-structured, ANSI-conforming code. These types of optimizations give you the fastest code, but are riskier than the default optimizations.

You can use +Oall at optimization levels 2, 3, and 4. The default is +Onoall.

The +Oall option by itself (specified without the +O2, +O3, or +O4 options) implements +O4 +Ofast. This performs aggressive optimizations with unrestricted compile time at the highest level of optimization.

Combining Optimization Parameters

You can combine optimization parameters that affect code size, compile-time, and the aggressiveness of the optimizations with a level of optimization.

For example, to specify conservative optimizations at level 2 and disable code-expanding optimizations, use:

cc +O2 +Olit=none +Ofltacc=strict +Osize sourcefile.c
+Olimit and +Osize can be used with either +Ofast or +Olit=none +Ofltacc=strict.

You cannot use +Ofast +Ofltacc=relaxed with +Olit=none +Ofltacc=strict.

Summary of Optimization Parameters

Table 9: HP C Optimization Parameters summarizes the HP C optimization parameters:
 

Table 9: HP C Optimization Parameters 
+O[no]aggressive
  What It Does: NOTE: This option will not be supported in future releases of the HP C compiler. To obtain improved results, please use the +Ofast option.
  Level of Opt: 2, 3, 4

+O[no]all
  What It Does: NOTE: This option will not be supported in future releases of the HP C compiler. To obtain improved results, please use +Ofast +O4 option in place of +O[no]all.
  Level of Opt: 4

+O[no]conservative
  What It Does: NOTE: This option will not be supported in future releases of the HP C compiler. To obtain improved results, please use the +Olit=none +Ofltacc=strict +Oconservative options in place of +O[no]conservative.
  Level of Opt: 2, 3, 4

+O[no]info
  What It Does: +Oinfo displays informational messages about the optimization process. This option supports the core optimization levels, and therefore, can be used at levels 0-4. The default is +Onoinfo.
  Level of Opt: 0, 1, 2, 3, 4

+O[no]limit
  What It Does: The +Olimit option suppresses optimizations that significantly increase compile-time or that can consume a lot of memory. The +Onolimit option allows optimizations to be performed regardless of their effect on compile-time or memory usage. The default is +Olimit.
  Level of Opt: 2, 3, 4

+O[no]size
  What It Does: The +Osize option disables optimizations that greatly expand code size at +O2 and above; +Onosize allows them. Most optimizations improve code speed and simultaneously reduce code size. However, some optimizations may greatly increase code size. Loop unrolling is one such optimization, and is disabled when using +Osize. This option also disables inlining, and may help reduce instruction cache misses.
  Level of Opt: 2, 3, 4

Profile-Based Optimization

Profile-based optimization (PBO) is a set of performance-improving code transformations based on the run-time characteristics of your application.

There are three steps involved in performing this optimization:

  1. Instrumentation - Insert data collection code into the object program.
  2. Data Collection - Run the program with representative data to collect execution profile statistics.
  3. Optimization - Generate optimized code based on the profile data.

Invoke profile-based optimization through HP C by using any level of optimization and the +I and +P options on the cc command line.

When you use PBO, compile times are faster and link times are slower because code generation happens at link time.

Instrumenting the Code

To instrument your program, use the +I option as follows:
cc -Aa +I -O -c sample.c           Compile for instrumentation.
cc -o sample.exe +I -O sample.o    Link to make instrumented executable.
The first command line uses the -O option to perform level 2 optimization and instruments the code. The -c option in the first command line suppresses linking and creates an intermediate object file called sample.o. The .o file can be used later in the optimization phase, avoiding a second compile.

The second command line uses the -o option to link sample.o into sample.exe. The +I option instruments sample.exe with data collection code. Note that instrumented programs run slower than non-instrumented programs. Only use instrumented code to collect statistics for profile-based optimization.

Collecting Data for Profiling

To collect execution profile statistics, run your instrumented program with representative data as follows:
sample.exe < input.file1    Collect execution profile data.
sample.exe < input.file2
This step creates and logs the profile statistics to a file, by default called flow.data. You can use this data collection file to store the statistics from multiple test runs of different programs that you may have instrumented.

Performing Profile-Based Optimization

To optimize the program based on the previously collected run-time profile statistics, relink the program as follows:
cc -o sample.exe +P -O sample.o
An alternative to this procedure is to recompile the source file in the optimization step:
cc -o sample.exe +I -O sample.c     instrumentation
sample.exe < input.file1            data collection
cc -o sample.exe +P -O sample.c     optimization

Maintaining Profile Data Files

Profile-based optimization stores execution profile data in a disk file. By default, this file is called flow.data and is located in your current working directory.

You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.

You can use the FLOW_DATA environment variable to specify the name of the profile data file with either the +I or +P options. You can use the +df command-line option to specify the name of the profile data file with the +P option.

The +df option takes precedence over the FLOW_DATA environment variable.

In the following example, the FLOW_DATA environment variable is set to override the flow.data file name. The profile data is stored instead in /users/profiles/prog.data.

% setenv FLOW_DATA /users/profiles/prog.data
% cc -Aa -c +I +O3 sample.c
% cc -o sample.exe +I +O3 sample.o
% sample.exe < input.file1
% cc -o sample.exe +P +O3 sample.o
In the next example, the +df option uses /users/profiles/prog.data to override the flow.data file name.
% cc -Aa -c +I +O3 sample.c
% cc -o sample.exe +I +O3 sample.o
% sample.exe < input.file1
% mv flow.data /users/profiles/prog.data
% cc -o sample.exe +df /users/profiles/prog.data +P +O3 sample.o

Maintaining Instrumented and Optimized Program Files

You can maintain both instrumented and optimized versions of a program. You might keep an instrumented version of the program on hand for development use, and several optimized versions on hand for performance testing and program distribution.

Care must be taken when maintaining different versions of the executable file because the instrumented program file name is used as the key identifier when storing execution profile data in the data file.

The optimizer must know what this key identifier name is in order to find the execution profile data. By default, the key identifier name used to retrieve the profile data is the instrumented program file name used to run the program for data collection.

Profile-Based Optimization Notes

For more information on profile-based optimization, see the HP-UX Linker and Libraries Online User Guide.

Controlling Specific Optimizer Features

Most of the time, specifying optimization level 1, 2, 3, or 4 should provide you with the control over the optimizer that you need. Additional parameters are provided when you require a finer level of control.

At each level, you can turn on and off specific optimizations using the +O[no]optimization option. The optimization parameter is the name of a specific optimization technique. The optional prefix [no] disables the specified optimization.

Below is a list of advanced optimizer options, followed by detailed information on each option:

  • +O[no]cross_region_addressing
  • +O[no]cxlimitedrange
  • +O[no]dataprefetch
  • +Odataprefetch=[none|direct|indirect]
  • +O[no]extern[=name1,name2,...nameN]
  • +Ofast
  • +Ofaster
  • +O[no]fenvaccess
  • +O[no]fltacc
  • +Ofltacc=[strict|default|limited|relaxed]
  • +O[no]frequently_called=function1[,function2]*
  • +O[no]frequently_called:filename
  • +O[no]info
  • +O[no]initcheck
  • +O[no]inline: filename
  • +O[no]inline=symbol[,symbol]*
  • +Oinline_budget=n
  • +O[no]libcalls
  • +Olibcalls=[all|default|none]
  • +O[no]libmerrno
  • +O[no]limit
  • +Olimit=[default|min|max]
  • +Olit=[all|const|none]
  • +O[no]parminit
  • +O[no]parmsoverlap
  • +O[no]procelim
  • +Oprofile=use:filename
  • +Oprofile=collect
  • +O[no]promote_indirect_calls
  • +O[no]ptrs_ansi
  • +O[no]ptrs_strongly_typed
  • +Orarely_called:filename
  • +Orarely_called=function1[,function2...]*
  • +O[no]recovery
  • +O[no]shortdata[=size]
  • +O[no]signedpointers
  • +O[no]store_ordering
  • +O[no]type_safety=[off|limited|ansi|strong]
  • +O[no]volatile
  • +Ovolatile=qualifier1[,qualifier2...]
  • +O[no]whole_program_mode

    +Olevel=name1[,name2,...nameN]

    Optimization levels: 1, 2, 3, 4

    Default: All functions are optimized at the level specified by the ordinary +Olevel option.

    This option lowers optimization to the specified level for one or more named functions. level can be 0, 1, 2, 3, or 4. The name parameters are names of functions in the module being compiled. Use this option when one or more functions do not optimize well or properly. It must be used with an ordinary +Olevel option.

    This option works the same as the OPT_LEVEL pragma described under Optimizer Control Pragmas. This option overrides the OPT_LEVEL pragma for the specified functions. As with the pragma, you can only lower the level of optimization; you cannot raise it above the level specified in the ordinary +Olevel option. To avoid confusion, it is best to use either this option or the OPT_LEVEL pragma rather than both.

    Examples

    The following command optimizes all functions at level 3, except for the functions myfunc1 and myfunc2, which it optimizes at level 1.

    $ cc +O3 +O1=myfunc1,myfunc2 funcs.c main.c

    The following command optimizes all functions at level 2, except for the functions myfunc1 and myfunc2, which it optimizes at level 0.

    $ cc -O +O0=myfunc1,myfunc2 funcs.c main.c

    +O[no]cxlimitedrange

    Optimization level:

    Default: +Onocxlimitedrange

    +O[no]cxlimitedrange enables [disables] the use of the usual mathematical formulas for complex floating-point arithmetic in the compilation unit. This is equivalent to the CX_LIMITED_RANGE pragma except that it applies to a compilation unit as opposed to a declaration or statement.

    +O[no]cross_region_addressing

    Optimization level:

    Default: +Onocross_region_addressing

    +O[no]cross_region_addressing enables [disables] the use of cross-region addressing. Cross-region addressing is required if a pointer (such as an array base) points to a different region than the data being addressed. This is usually due to an offset which results in a cross-over into another region. Standard-conforming applications do not require using cross-region addressing.

    +Odataprefetch=[none|direct|indirect]

    Default: +Odataprefetch=indirect

    When +Odataprefetch is enabled, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions will be inserted only for data structures referenced within innermost loops using simple loop varying addresses (that is, in a simple arithmetic progression).

    +Odataprefetch=direct inserts prefetches for loads and stores that have inductive addresses. The prefetches are inserted such that the latency to main memory is covered (i.e., assuming the data is not in any cache), and are given the appropriate cache hint for the data type being accessed. Prefetches are inserted for integer and floating-point loads, and floating-point stores, which are on heavily-executed paths through the loop.

    The compiler attempts to minimize the overhead of prefetching using a number of techniques, which may involve unrolling the loop.

    +Odataprefetch and +Odataprefetch=indirect direct the compiler to insert prefetches for indirectly-accessed data within loops.

    +O[no]dataprefetch

    Default: +Odataprefetch

    When +Odataprefetch is enabled, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions will be inserted only for data structures referenced within innermost loops using simple loop varying addresses (that is, in a simple arithmetic progression).

    The default, +Odataprefetch is the same as +Odataprefetch=indirect. It directs the compiler to insert prefetches for indirectly-accessed data within loops.

    +O[no]extern[=name1,name2,...nameN]

    NOTE: This option will not be supported in future releases of the HP C compiler. To obtain improved results, please use -Bextern in place of the +Oextern option, and -Bprotected in place of the +Onoextern option. These options are described in the HP C Compiler Options section.

    +Ofast

    Optimization level:

    Default: +O2 +Onolimit +Ofltacc=relaxed +FPD +DSnative +_Oshortdata alias

    +Ofast initiates a combination of compilation options for optimum execution speed at build times. The compiler options expanded when using +Ofast are: +O2 +Olibcalls +Onolimit +Ofltacc=relaxed +FPD +DSnative +Oshortdata. It is equivalent to the -fast option.

    +Ofast is safe for most applications, but its use may result in higher compile time or incorrect output for code that requires strict floating point accuracy.

    In addition to the optimizations performed at +O2, +Ofast:


    +Ofaster

    Optimization level: 2, 3, 4

    Default: +O2 +Onolimit +Ofltacc=relaxed +FPD +DSnative +_Oshortdata alias

    +Ofaster selects the +Ofast option at optimization level +O2. Must be used with +P or else the optimization level will drop to +O3.

    +O[no]fenvaccess

    Optimization level:

    Default: +Onofenvaccess

    +O[no]fenvaccess informs the compiler that a program accesses [does not access] the floating point environment to test flags or run under non-default modes. If it knows that a program does not access the floating point environment, the compiler is allowed to perform certain optimizations that it otherwise may not perform. These include global common subexpression elimination, code motion, and constant folding.

    Using +Ofenvaccess is equivalent to adding #pragma STDC FENV_ACCESS ON at the beginning of each source file to be compiled.

    +O[no]fltacc

    Optimization levels: 2, 3, 4

    The +Onofltacc option allows the compiler to perform floating-point optimizations that are algebraically correct but that may result in numerical differences. For example, this option may change the order of expression evaluation: if a, b, and c are floating-point variables, the expressions (a + b) + c and a + (b + c) may give slightly different results due to rounding. In general, these differences will be insignificant.

    The +Onofltacc option also enables the optimizer to generate fused multiply-add (FMA) instructions, the FMPYFADD and FMPYNFADD. These instructions improve performance but occasionally produce results that may differ from results produced by code without FMA instructions. In general, the differences are slight.

    Specifying +Ofltacc disables the generation of FMA instructions as well as some other floating-point optimizations. Use +Ofltacc if it is important that the compiler evaluate floating-point expressions as it does in unoptimized code. The +Ofltacc option does not allow any optimizations that change the order of expression evaluation and that may therefore affect the result.

    If you are optimizing code at level 2 or higher and do not specify +Onofltacc or +Ofltacc, the optimizer will use FMA instructions, but will not perform floating-point optimizations that involve expression reordering or other optimizations that potentially impact numerical stability.

    The list below identifies the different actions taken by the optimizer according to whether you specify +Ofltacc, +Onofltacc, or neither option.

    Optimization Options    Expression Reordering?    FMA?

    +O2                     No                        Yes
    +O2 +Ofltacc            No                        No
    +O2 +Onofltacc          Yes                       Yes

    +Ofltacc=[strict|default|limited|relaxed]

    Optimization level:

    Default: +Ofltacc=default

    +Ofltacc controls the level of floating point optimizations that the compiler may perform so that the expected accuracy of floating-point computation is not violated. The following are defined values for +Ofltacc:


    +Ofrequently_called:filename

    Optimization levels: 2, 3, 4

    Default: In the absence of dynamically obtained profile information, the frequency with which a function is called is unknown.

    +Ofrequently_called specifies that the functions listed in the given file are to be assumed to be frequently called within the application. This is independent of +P: +Ofrequently_called overrides any dynamically obtained profile information.

    The file indicated by filename contains a list of function names, separated by spaces or newlines.

    +Ofrequently_called=function1[,function2]*

    Optimization levels: 2, 3, 4

    Default: In the absence of dynamically obtained profile information, the frequency with which a function is called is unknown.

    +Ofrequently_called specifies that the listed functions are to be assumed to be frequently called within the application. This is independent of +P: +Ofrequently_called overrides any dynamically obtained profile information.

    +O[no]info

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Onoinfo

    Provide [do not provide] feedback information about the optimization process. This option is most useful at optimization levels 3 and 4.

    +O[no]initcheck

    Optimization levels: 2, 3, 4

    Default: unspecified

    The initialization checking feature of the optimizer has three possible states: on, off, or unspecified. When on (+Oinitcheck), the optimizer initializes to zero any local, scalar, non-static variables that are uninitialized with respect to at least one path leading to a use of the variable.

    When off (+Onoinitcheck), the optimizer issues warning messages when it discovers definitely uninitialized variables, but does not initialize them.

    When unspecified, the optimizer initializes to zero any local, scalar, non-static variables that are definitely uninitialized with respect to all paths leading to a use of the variable.

    Use +Oinitcheck to look for variables in a program that may not be initialized.

    +O[no]inline: filename

    Optimization levels: 3, 4

    Default: +Oinline

    When +Oinline is specified without a filename, any function can be inlined. For inlining to be successful, follow prototype definitions for function calls in the appropriate header file.

    When specified with a filename, the named functions are important candidates for inlining. For example, saying

    +Oinline=foo,bar +Onoinline
    indicates that inlining be strongly considered for foo and bar; all other routines will not be considered for inlining, since +Onoinline is given.

    When this option is disabled with a filename, the compiler will not consider the specified routines as candidates for inlining. For example, saying

    +Onoinline=baz,x
    indicates that inlining should not be considered for baz and x; all other routines will be considered for inlining, since +Oinline is the default.

    The +Onoinline option disables inlining for all functions or a specific list of functions.

    Use this option when you need to precisely control which subprograms are inlined.

    +O[no]inline=symbol[,symbol]*

    Optimization levels: 3, 4

    Default: +Oinline

    When +Oinline is specified without a symbol list, any function can be inlined. For inlining to be successful, follow prototype definitions for function calls in the appropriate header file.

    When specified with a symbol list, the named functions are important candidates for inlining.

    When this option is disabled with a symbol list, the compiler will not consider the specified routines as candidates for inlining.

    The +Onoinline option disables inlining for all functions or a specific list of functions.

    Use this option when you need to precisely control which subprograms are inlined.

    +Oinline_budget=n

    Optimization levels: 3, 4

    Default: +Oinline_budget=100

    where n is an integer in the range 1 - 1000000 that specifies the level of aggressiveness, as follows:

    The +Onolimit and +Osize options also affect inlining. Specifying the +Onolimit option has the same effect as specifying +Oinline_budget=200. The +Osize option has the same effect as +Oinline_budget=1.

    Note, however, that the +Oinline_budget=n option takes precedence over both of these options. This means that you can override the effect of +Onolimit or +Osize option on inlining by specifying the +Oinline_budget=n option on the same compile line.

    +O[no]libcalls

    Optimization levels: 0, 1, 2, 3, 4

    Default: +O[no]libcalls

    Use the +Olibcalls option to increase the runtime performance of code which calls standard library routines in simple contexts.

    The +Olibcalls option expands the following library calls inline:

    Inlining will take place only if the function call follows the prototype definition in the appropriate header file. Fast subprogram linkage is also emitted to tuned millicode versions of the math library functions sin, cos, tan, atan2, log, pow, asin, acos, atan, exp, and log10. (See the HP-UX Floating-Point Guide for the most up-to-date listing of the math library functions.) The calling code must not expect to access errno after the function's return.

    A single call to printf() may be replaced by a series of calls to putchar(). Calls to sprintf() and strlen() may be optimized more effectively, including elimination of some calls producing unused results. Calls to setjmp() and longjmp() may be replaced by their equivalents _setjmp() and _longjmp(), which do not manipulate the process's signal mask.

    Use +Olibcalls to improve the performance of selected library routines only when you are not performing error checking for these routines.

    Using +Olibcalls with +Ofltacc will give different floating point calculation results than those given using +Ofltacc without +Olibcalls.

    The +Olibcalls option replaces the obsolete -J option.

    +Olibcalls=[all|default|none]

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Olibcalls=default

    Control the use of low-call-overhead versions of select library routines. No error checking is done; that is, errno(2) is not set. This optimization can occur at optimization levels 0, 1, 2, 3, and 4. The defined values for level are:


    +O[no]libmerrno

    Optimization levels:

    Default: +Onolibmerrno

    Enable [disable] support for errno in libm functions.

    +O[no]limit

    Optimization levels: 2, 3, 4

    Default: +Olimit

    The +O[no]limit option controls the amount of compile time spent performing optimization. By default, the compiler limits the time spent optimizing large programs at +O2 and above; this is done to avoid non-linear compile times.

    You can remove optimization time restrictions at +O2 and above by using the +Onolimit option. This lets you perform full optimization of large procedures, but may incur significant compile time increases for very large procedures.

    If longer compile times are acceptable, +Onolimit can result in significant performance improvements.

    To completely avoid non-linear compile times, you can limit the amount of time spent optimizing code by using +Olimit.

    +Olimit=[default|min|max]

    Optimization levels: 2, 3, 4

    Default: +Olimit=default

    The +Olimit=[default|min|max] option controls the amount of compile-time spent performing optimization. The defined values are:

    The +O[no]limit option controls the amount of compile time spent performing optimization. By default, the compiler limits the time spent optimizing large programs at +O2 and above; this is done to avoid non-linear compile times.

    You can remove optimization time restrictions at +O2 and above by using the +Olimit=none option. This lets you perform full optimization of large procedures, but may incur significant compile time increases for very large procedures.

    To completely avoid non-linear compile times, you can limit the amount of time spent optimizing code by using +Olimit=min.

    +Olit=[all|const|none]

    Optimization levels:

    Default: +Olit=all

    Controls which data items are placed in the read-only data section. The defined values for level are:


    +O[no]parminit

    Optimization levels: 1, 2, 3, 4

    Default: +Oparminit

    Enable [disable] automatic initialization of unspecified function parameters at call sites to zero. This is useful for preventing NaT values in parameter registers.

    +O[no]parmsoverlap

    Optimization levels: 1, 2, 3, 4

    Default: +Oparmsoverlap

    Optimize with the assumption that the actual arguments of function calls overlap in memory. Use +Onoparmsoverlap if C programs have been literally translated from HP Fortran programs.
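
    The overlap assumption matters whenever two pointer arguments can refer to the same storage. The following is a minimal sketch, not taken from the HP documentation; the function name add_first and its parameters are hypothetical. When dst and src overlap, the store to dst[0] changes the value of src[0] read by later iterations, so src[0] has to be reloaded unless the optimizer is told that arguments do not overlap.

```c
/* Hypothetical example: adds the first source element to every element.
   src[0] can stay in a register across the loop only if the compiler
   may assume that dst and src do not overlap. */
void add_first(int *dst, const int *src, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        dst[i] = src[i] + src[0];   /* a store to dst[0] may change src[0] */
    }
}
```

    Called as add_first(v, v, 3) with v holding 1, 2, 3, the overlapping call yields 2, 4, 5, whereas distinct source and destination arrays yield 2, 3, 4. Code that makes calls of the first kind should not be compiled with +Onoparmsoverlap.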

    +O[no]procelim

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Onoprocelim at levels 0-3, +Oprocelim at level 4

    When +Oprocelim is specified, procedures that are not referenced by the application are eliminated from the output executable file. The +Oprocelim option reduces the size of the executable file, especially when optimizing at levels 3 and 4, at which inlining may have removed all of the calls to some routines.

    When you specify +Onoprocelim, procedures that are not referenced by the application are not eliminated from the output executable file.

    The default is +Onoprocelim at levels 0-3, and +Oprocelim at level 4.

    If the +Oall option is enabled, the +Oprocelim option is enabled.

    +Oprofile=use:filename

    Optimization levels: 0, 1, 2, 3, 4

    Default: Name of PBO file

    Specify the filename as the name of the previously collected profile database (PBO) file.

    +Oprofile=collect

    Optimization levels: 0, 1, 2, 3, 4

    Default:

    Instrument the application for profile-based optimization. This is the same as the +I option.

    +O[no]promote_indirect_calls

    Optimization levels: 3, 4 and profile-based optimization

    Default: +Onopromote_indirect_calls

    This option uses profile data from profile-based optimization and other information to determine the most likely target of indirect calls and promotes them to direct calls. In all cases the optimized code tests to make sure the direct call is being taken and, if not, executes the indirect call. If +Oinline is in effect, the optimizer may also inline the promoted calls. This option can only be used with profile-based optimization, described in Profile-Based Optimization.

    The optimizer tries to determine the most likely target of indirect calls. If the profile data is incomplete or ambiguous, the optimizer may not select the best target. If this happens, your code's performance may decrease.

    At +O3, this option is only effective if indirect calls from functions within a file are mostly to target functions within the same file. This is because +O3 optimizes only within a file whereas +O4 optimizes across files.
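
    At the source level, a promoted indirect call can be pictured roughly as follows. This is an illustrative sketch only: the functions handle_common and handle_rare and both dispatch routines are hypothetical, and the actual transformation is performed by the optimizer on its internal representation, not on the source.

```c
typedef int (*handler_t)(int);

int handle_common(int x) { return x + 1; }   /* assumed most likely target */
int handle_rare(int x)   { return x * 2; }

/* Original form: every call goes through the function pointer. */
int dispatch(handler_t h, int x)
{
    return h(x);
}

/* Promoted form: test for the likely target, call it directly if the
   test succeeds, and otherwise fall back to the indirect call. */
int dispatch_promoted(handler_t h, int x)
{
    if (h == handle_common) {
        return handle_common(x);   /* direct call; a candidate for inlining */
    }
    return h(x);                   /* the indirect call is still available */
}
```

    Both routines return the same result for any handler; only the cost of the common case differs.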

    +O[no]ptrs_ansi

    Optimization levels: 2, 3, 4

    Default: +Onoptrs_ansi

    Use +Optrs_ansi to make the following two assumptions, which the more aggressive +Optrs_strongly_typed does not make:

    When both are specified, +Optrs_ansi takes precedence over +Optrs_strongly_typed.

    For more information about type aliasing, see Aliasing Options.

    +O[no]ptrs_to_globals[=name1,name2,...,nameN]

    Optimization levels: 2, 3, 4

    Default: +Optrs_to_globals

    Tell the optimizer whether global variables are modified [are not modified] through pointers.
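
    As a rough illustration of code that should not be compiled with +Onoptrs_to_globals, the sketch below may modify a global variable through a pointer. The names total and store_then_read are hypothetical. If the optimizer were told that globals are never modified through pointers, it could assume that the store through p leaves total unchanged and return 10 without reloading it.

```c
int total = 0;   /* global variable */

/* Hypothetical example: p may or may not point to the global total. */
int store_then_read(int *p, int n)
{
    total = 10;
    *p += n;        /* modifies total when p == &total */
    return total;   /* must be reloaded if a pointer can reach total */
}
```

    Here store_then_read(&total, 5) is expected to return 15, while a call that passes the address of a local variable returns 10.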

    +O[no]ptrs_strongly_typed

    Optimization levels: 2, 3, 4

    Default: +Onoptrs_strongly_typed

    Use +Optrs_strongly_typed when pointers are type-safe. The optimizer can use this information to generate more efficient code.

    Type-safe (that is, strongly-typed) pointers are pointers to a specific type that only point to objects of that type, and not to objects of any other type. For example, a pointer declared as a pointer to an int is considered type-safe if that pointer points to an object only of type int, but not to objects of any other type.

    Based on the type-safe concept, a set of groups are built based on object types. A given group includes all the objects of the same type.

    Type-inferred aliasing means that any pointer of a type in a given group (of objects of the same type) can point only to an object from the same group; it cannot point to a typed object from any other group.

    For more information about type aliasing, see Aliasing Options.

    Type casting to a different type violates type-inferred aliasing rules. See Example 2 below.

    Dynamic casting is allowed. See Example 3 below.

    For more details, see Aliasing Options .

    Example 1: How Data Types Interact

    The optimizer generally spills all global data from registers to memory before any modification to global variables or any loads through pointers. However, you can instruct the optimizer on how data types interact so it can generate more efficient code.

    If you have the following:

    1  int *p;
    2  float *q;
    3  int a,b,c,i;
    4  float d,e,f;
    5  foo()
    6  {
    7    for (i=1;i<10;i++) {
    8      d=e;
    9      *p=b;
    10     e=d+f;
    11     f=*q;
    12   }
    13 }
    With +Optrs_strongly_typed turned on, the pointers p and q are assumed to be disjoint because the types they point to are different. Without type-inferred aliasing, the store through *p is assumed to invalidate all the definitions, so d and f, as used on line 10, have to be loaded from memory. With type-inferred aliasing, the optimizer can propagate the copies of d and f and thus avoid two loads and two stores.

    This option can be used for any application involving the use of pointers, where those pointers are type safe. To specify when a subset of types are type-safe, use the [NO]PTRS_STRONGLY_TYPED pragma. The compiler issues warnings for any incompatible pointer assignments that may violate the type-inferred aliasing rules discussed in Aliasing Options .

    Example 2: Unsafe Type Cast

    Any type cast to a different type violates type-inferred aliasing rules. Do not use +Optrs_strongly_typed with code that has these unsafe type casts. Use the [NO]PTRS_STRONGLY_TYPED pragma to prevent the application of type-inferred aliasing to the unsafe type casts.

    struct foo{
      int a;
      int b;
    } *P;
     
    struct bar {
      float a;
      int b;
      float c;
    } *q;
     
    P = (struct foo *) q;
      /* Incompatible pointer assignment through type cast */

    Example 3: Generally Applying Type Aliasing

    Dynamic cast is allowed with +Optrs_strongly_typed or +Optrs_ansi. A pointer dereference is called dynamic cast if a cast is applied on the pointer to a different type.

    In the example below, type-inferred aliasing is applied on P generally, not just to the particular dereference. Type-aliasing will be applied to any other dereferences of P.

    struct s {
      short int a;
      short int b;
      int c;
    } *P;
    * (int *)P = 0;
    For more information about type aliasing, see Aliasing Options .

    +O[no]rarely_called:filename

    Optimization levels: 2, 3, 4

    Default: In the absence of dynamically obtained profile information, the frequency with which a function is called is unknown.

    +Orarely_called specifies that the functions listed in the given file are to be assumed to be rarely called within the application. This is independent of +P: +Orarely_called overrides any dynamically obtained profile information.

    The file indicated by filename contains a list of function names, separated by spaces or newlines.

    +O[no]rarely_called=function1[,function2]*

    Optimization levels: 2, 3, 4

    Default: In the absence of dynamically obtained profile information, the frequency with which a function is called is unknown.

    +Orarely_called specifies that the listed functions are to be assumed to be rarely called within the application. This is independent of +P: +Orarely_called overrides any dynamically obtained profile information.

    +O[no]recovery

    Optimization levels: 2, 3, 4

    Default: +Onorecovery

    +O[no]recovery generates [does not generate] recovery code for control speculation. When this option is enabled, each control speculative load has a matching control speculative check instruction, inserted by the compiler at the original position of the load. A block of recovery code is then inserted at the label specified by the check instruction.

    If +Onorecovery is specified, no control speculative checks or recovery blocks are inserted. Instead, the operating system handles the recovery. The advantage of using +Onorecovery is that code size is smaller, both along the critical path, due to the lack of check instructions, and overall, due to the lack of recovery blocks.

    NOTE: For code that writes to uncacheable memory that may not be properly identified as volatile, the +Orecovery option reduces the risk of incorrect behavior. However, such code may produce unpredictable results.

    +Oreusedir=directory

    Optimization levels: 4 or with profile-based optimization

    Default: no reuse of object files

    This option specifies a directory where the linker can save object files created from intermediate object files when using +O4 or profile-based optimization. It reduces link time by not recompiling intermediate object files when they don't need to be.

    When you compile with +I, +P, or +O4, the compiler generates intermediate code in the object file. Otherwise, the compiler generates regular object code in the object file. When you link, the linker first compiles the intermediate object code to regular object code, then links the object code. With this option you can reduce link time on subsequent links by avoiding recompiling intermediate object files that have already been compiled to regular object code and have not changed.

    Note that when you do change a source file or command line options and recompile, a new intermediate object file will be created and compiled to regular object code in the specified directory. The previous object file in the directory will not be removed. You should periodically remove this directory since old object files cannot be reused and will not be automatically removed.

    +Oshortdata

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Oshortdata=8 (specifies a size of eight bytes in the short data area)

    Controls the size of objects placed in the short data area. All objects of n bytes or smaller, where n is the size given as +Oshortdata=n, are placed in the short data area. Valid values of n are 0, or a decimal number between 8 and 4,194,304 bytes (4MB).


    +O[no]signedpointers

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Onosignedpointers

    Perform [or do not perform] optimizations related to treating pointers as signed quantities. Applications that allocate shared memory and that compare a pointer to shared memory with a pointer to private memory may run incorrectly if this optimization is enabled.

    Use +Osignedpointers to improve application run-time speed.

    +O[no]store_ordering

    Optimization levels: 1, 2, 3, 4

    Default: +Onostore_ordering

    +O[no]store_ordering preserves [does not preserve] the original program order for stores to memory that may be visible to multiple threads. This does not, however, imply strong ordering.

    +O[no]type_safety=[off|limited|ansi|strong]

    Optimization levels: 1, 2, 3, 4

    Default: +Otype_safety=off, +Onotype_safety

    Enable [disable] aliasing across types. The following are +O[no]type_safety values:


    +O[no]volatile=qualifier1[,qualifier2...]

    Optimization levels: 0, 1, 2, 3, 4

    Default: +Onovolatile

    The +Ovolatile=qualifier1[,qualifier2...] option has the effect of applying the specified qualifiers to all uses of volatile in the source. If used in conjunction with +Ovolatile, the qualifiers also apply to the implicit volatile declarations of global variables. The defined values for qualifier include the following:


    +O[no]whole_program_mode

    Optimization level: 4

    Default: +Onowhole_program_mode

    The +Owhole_program_mode option enables the assertion that only the files that are compiled with this option directly reference any global variables and procedures that are defined in these files. In other words, this option asserts that there are no unseen accesses to the globals.

    When this assertion is in effect, the optimizer can hold global variables in registers longer and delete inlined or cloned global procedures.

    All files compiled with +Owhole_program_mode must also be compiled with +O4. If any of the files were compiled with +O4 but were not compiled with +Owhole_program_mode, the linker disables the assertion for all files in the program.

    The default, +Onowhole_program_mode, disables the assertion.

    Use this option to increase performance speed, but only when you are certain that only the files compiled with +Owhole_program_mode directly access any globals that are defined in these files.

    Level 1 Optimization Modules

    The level 1 optimization modules are branch optimization, dead code elimination, faster register allocation, the instruction scheduler, and peephole optimizations.

    The examples in this section are shown at the source code level wherever possible. Transformations that cannot be shown at the source level are shown in assembly language. See Table 10: Descriptions of Assembly Language Instructions for descriptions of the assembly language instructions used.

    Branch Optimization

    The branch optimization module traverses the procedure and transforms branch instruction sequences into more efficient sequences where possible. Examples of possible transformations are:
              if(a) {
                .
                .
                .
                 statement 1
              } else {
                 goto L1;
              }
              statement 2
          L1:
    becomes:
              if(!a) {
                  goto L1;
              }
              statement 1
              statement 2
          L1:

    Dead Code Elimination

    The dead code elimination module removes unreachable code that is never executed.

    For example, the code:

         if(0) {
            a = 1;
         } else {
            a = 2;
         }
    becomes:
         a = 2;

    Faster Register Allocation

    The faster register allocation module, used with unoptimized code, analyzes register use faster than the coloring register allocator (a level 2 module).

    This module performs the following:

    Instruction Scheduler

    The instruction scheduler module performs the following:

    For example, the code:
         LDW     -52(0,sp),r1
         ADDI    3,r1,r31    ;interlock with load of r1
         LDI     10,r19
    becomes:
         LDW     -52(0,sp),r1
         LDI     10,r19
         ADDI    3,r1,r31    ;use of r1 is now separated from load

    Table 10: Descriptions of Assembly Language Instructions
    Instruction                     Description
    LDW offset(sr, base), target    Loads a word from memory into register target.
    ADDI const, reg, target         Adds the constant const to the contents of register reg and puts the result in register target.
    LDI const, target               Loads the constant const into register target.
    LDO const(reg), target          Adds the constant const to the contents of register reg and puts the result in register target.
    AND reg1, reg2, target          Performs a bitwise AND of the contents of registers reg1 and reg2 and puts the result in register target.
    COMIB,cond const, reg, lab      Compares the constant const to the contents of register reg and branches to label lab if the condition cond is true.
    BB,cond reg, num, lab           Tests the bit number num in the contents of register reg and branches to label lab if the condition cond is true.
    COPY reg, target                Copies the contents of register reg to register target.
    STW reg, offset(sr, base)       Stores the word in register reg to memory.

    Peephole Optimizations

    The peephole optimization process involves looking at small windows of machine code for optimization opportunities. Wherever possible, the peephole optimizer replaces assembly language instruction sequences with faster (usually shorter) sequences, and removes redundant register loads and stores.

    For example, the code:

         LDI     32,r3
         AND     r1,r3,r2
         COMIB,= 0,r2,L1
    becomes:
         BB,>=   r1, 26, L1
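
    The equivalence of the two sequences rests on PA-RISC bit numbering, in which bit 0 is the most significant bit of a 32-bit register, so the mask 32 selects bit 26. The C sketch below expresses both tests so that the equivalence can be checked. The function names are hypothetical, and the sketch assumes a 32-bit unsigned int and that BB,>= branches when the tested bit is 0.

```c
/* Mask-and-compare form, as in the LDI/AND/COMIB sequence:
   true when (x AND 32) equals 0. */
int branch_by_mask(unsigned int x)
{
    return (x & 32u) == 0;
}

/* Bit-test form, as in the BB sequence: examine bit 26, counting from
   the most significant bit (bit 0) of a 32-bit word; true when it is 0. */
int branch_by_bit(unsigned int x)
{
    unsigned int bit26 = (x >> (31 - 26)) & 1u;

    return bit26 == 0;
}
```

    The two functions agree for every input, which is why the three-instruction sequence can be replaced by the single bit-test branch.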

    Level 2 Optimization Modules

    Level 2 performs optimizations within each procedure. At level 2, the optimizer performs all optimizations performed at the prior level, plus the additional optimizations described in this section.

    The examples in this section are shown at the source code level wherever possible. Transformations that cannot be shown at the source level are shown in assembly language.

    Coloring Register Allocation

    The name of this optimization comes from the similarity to map coloring algorithms in graph theory. This optimization determines when and how long commonly used variables and expressions occupy a register. It minimizes the number of references to memory (loads and stores) a code segment makes. This can improve run-time speed.

    You can help the optimizer understand when certain variables are heavily used within a function by declaring these variables with the register qualifier. The first 10 register qualified variables encountered in the source are honored. You should pick the ten most important variables to be most effective.

    The coloring register allocator may override your choices and promote to a register a variable not declared register over one that is, based on estimated speed improvements.
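
    For instance, a heavily used loop counter and accumulator might be declared as in the sketch below. The function sum_array is hypothetical, and the allocator remains free to make different choices from the ones requested.

```c
/* Hypothetical example: i and total are used on every iteration, so
   they are marked as register candidates. */
long sum_array(const int *values, int count)
{
    register int i;
    register long total = 0;

    for (i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```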

    The following code shows the type of optimization the coloring register allocation module performs. The code:

         LDI     2,r104
         COPY    r104,r103
         LDO     5(r103),r106
         COPY    r106,r105
         LDO     10(r105),r107
    becomes:
         LDI     2,r25
         LDO     5(r25),r26
         LDO     10(r26),r31

    Induction Variables and Strength Reduction

    The induction variables and strength reduction module removes expressions that are linear functions of a loop counter and replaces each of them with a variable that contains the value of the function. Variables of the same linear function are computed only once. This module also simplifies the function by replacing multiplication instructions with addition instructions wherever possible.

    For example, the code:

         for (i=0; i<25; i++) {
              r[i] = i * k;
         }
    becomes:
         t1 = 0;
         for (i=0; i<25; i++) {
              r[i] = t1;
              t1 += k;
         }

    Local and Global Common Subexpression Elimination

    The common subexpression elimination module identifies expressions that appear more than once and have the same result, computes the result, and substitutes the result for each occurrence of the expression. The types of subexpression include instructions that load values from memory, as well as arithmetic evaluation.

    For example, the code:

         a = x + y + z;
         b = x + y + w;
    becomes:
         t1 = x + y;
         a = t1 + z;
         b = t1 + w;

    Constant Folding and Propagation

    Constant folding computes the value of a constant expression at compile time. For example:
    A = 10;
    B = A + 5;
    C = 4 * B;
    can be replaced by:
    A = 10;
    B = 15;
    C = 60;

    Loop Invariant Code Motion

    The loop invariant code motion module recognizes instructions inside a loop whose results do not change and moves them outside the loop. This ensures that the invariant code is only executed once.

    For example, the code:

         x = z;
         for(i=0; i<10; i++)
         {
              a[i] = 4 * x + i;
         }
    becomes:
         x = z;
         t1 = 4 * x;
         for(i=0; i<10; i++)
         {
              a[i] = t1 + i;
         }

    Store/Copy Optimization

    Where possible, the store/copy optimization module substitutes registers for memory locations, by replacing store instructions with copy instructions and deleting load instructions.

    For example, the following HP C code, where a is a local variable:

         a = x + 23;
         return a;
    produces the following code for the unoptimized case:
         LDO     23(r26),r1
         STW     r1,-52(0,sp)
         LDW     -52(0,sp),ret0
    and this code for the optimized case:
         LDO     23(r26),ret0

    Unused Definition Elimination

    The unused definition elimination module removes unused memory location and register definitions. These definitions are often a result of transformations made by other optimization modules.

    For example, the function:

         f(int x)
         {
              int a,b,c;
     
              a = 1;
              b = 2;
              c = x * b;
              return c;
         }
    becomes:
         f(int x)
         {
              int a,b,c;
     
              b = 2;
              c = x * b;
              return c;
         }

    Register Reassociation

    Array references often require one or more instructions to compute the virtual memory address of the array element specified by the subscript expression. The register reassociation optimization implemented in the PA-RISC compilers tries to reduce the cost of computing the virtual memory address expression for array references found in loops.

    Within loops, the virtual memory address expression can be rearranged and separated into a loop varying term and a loop invariant term. Loop varying terms are those items whose values may change from one iteration of the loop to another. Loop invariant terms are those items whose values are constant throughout all iterations of the loop. The loop varying term corresponds to the difference in the virtual memory address associated with a particular array reference from one iteration of the loop to the next.

    The register reassociation optimization dedicates a register to track the value of the virtual memory address expression for one or more array references in a loop and updates the register appropriately in each iteration of a loop.

    The register is initialized outside the loop to the loop invariant portion of the virtual memory address expression and the register is incremented or decremented within the loop by the loop varying portion of the virtual memory address expression. On PA-RISC, the update of such a dedicated register can often be performed for free using the base-register modification capability of load and store instructions.

    The net result is that array references in loops are converted into equivalent but more efficient pointer dereferences.

    For example:

    int a[10][20][30];
     
    void example (void)
    {
      int i, j, k;
     
      for (k = 0; k < 10; k++)
        for (j = 0; j < 10; j++)
          for (i = 0; i < 10; i++)
          {
              a[i][j][k] = 1;
          }
    }
    after register reassociation is applied to the innermost loop becomes:
    int a[10][20][30];
     
    void example (void)
    {
      int i, j, k;
      register int (*p)[20][30];
     
      for (k = 0; k < 10; k++)
        for (j = 0; j < 10; j++)
          for (p = (int (*)[20][30]) &a[0][j][k], i = 0; i < 10; i++)
          {
              *(p++[0][0]) = 1;
          }
    }
    In the above example, the compiler-generated temporary register variable, p, strides through the array a in the innermost loop. This register pointer variable is initialized outside the innermost loop and auto-incremented within the innermost loop as a side-effect of the pointer dereference.

    Register reassociation can often enable another loop optimization. After performing the register reassociation optimization, the loop variable may be needed only to control the iteration count of the loop. If this is the case, the original loop variable can be eliminated altogether by using the PA-RISC ADDIB and ADDB machine instructions to control the loop iteration count.
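
    A source-level picture of this follow-on optimization, using a simpler one-dimensional array, might look like the sketch below. The names v, fill_indexed, and fill_counted are hypothetical, and the count-down loop only approximates what an add-and-branch instruction does.

```c
#define N 100

int v[N];

/* Original form: i is both the subscript and the loop control. */
void fill_indexed(void)
{
    int i;

    for (i = 0; i < N; i++) {
        v[i] = 1;
    }
}

/* After register reassociation, p addresses the elements, so the
   original index is no longer needed; a counter that runs down to
   zero controls the iteration count instead. */
void fill_counted(void)
{
    register int *p = v;
    register int n;

    for (n = N; n > 0; n--) {
        *p++ = 1;
    }
}
```

    Both functions store 1 into every element of v; the second form needs no subscript arithmetic inside the loop.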

    Level 3 Optimizations

    Level 3 optimization includes level 2 optimizations, plus full optimization across all subprograms within a single file. Level 3 also inlines certain subprograms within the input file. Use +O3 to get level 3 optimization.

    Level 3 optimization produces faster run-time code than level 2 on code that frequently calls small functions within a file. Level 3 links faster than level 4.

    Inlining within a Single Source File

    Inlining substitutes function calls with copies of the function's object code. Only functions that meet the optimizer's criteria are inlined. This may result in slightly larger executable files. However, this increase in size is offset by the elimination of time-consuming procedure calls and procedure returns.

    Example of Inlining

    The following is an example of inlining at the source code level. Before inlining, the source file looks like this:
    /* Return the greatest common divisor of two positive integers,  */
    /* int1 and int2, computed using Euclid's algorithm.  (Return 0  */
    /* if either is not positive.)                                   */
    int gcd(int1,int2)
      int int1;
      int int2;
    {
      int inttemp;
     
        if ( ( int1 <= 0 ) || ( int2 <= 0 ) ) {
            return(0);
        }
        do {
            if ( int1 < int2 ) {
                inttemp = int1;
                int1    = int2;
                int2    = inttemp;
            }
            int1 = int1 - int2;
        } while (int1 > 0);
        return(int2);
    }
     
    main()
    {
      int xval,yval,gcdxy;
        /* statements before call to gcd */
        gcdxy = gcd(xval,yval);
        /* statements after call to gcd */
    }
    After inlining, the source file looks like this:
    main()
    {
      int xval,yval,gcdxy;
        /* statements before inlined version of gcd */
        {
          int int1;
          int int2;
     
            int1 = xval;
            int2 = yval;
            {
              int inttemp;
     
                if ( ( int1 <= 0 ) || ( int2 <= 0 ) ) {
                    gcdxy = ( 0 );
                    goto AA003;
                }
                do {
                    if ( int1 < int2 ) {
                        inttemp = int1;
                        int1    = int2;
                        int2    = inttemp;
                    }
                    int1 = int1 - int2;
                } while ( int1 > 0 );
                gcdxy = ( int2 );
            }
        }
    AA003 : ;
        /* statements after inlined version of gcd */
    }

    Level 4 Optimizations

    Level 4 performs optimizations across all files in a program. At level 4, all optimizations of the prior levels are performed. Two additional optimizations are performed: inlining across multiple files, and global and static variable optimization.

    Interprocedural global optimization across all files within a program searches across function boundaries to produce better and faster code sequences. Normally, global optimizations are performed within individual functions or source code files. Interprocedural optimizations look at function interactions within a program and transform particular code sequences into faster code. Since information about every function within a program is required, this level of optimization must be performed at link time.

    Inlining Across Multiple Files

    Inlining at Level 4 is performed across all procedures within the program. Inlining at level 3 is done within one file.

    Inlining substitutes function calls with copies of the function's object code. Only functions that meet the optimizer's criteria are inlined. This may result in slightly larger executable files. However, this increase in size is offset by the elimination of time-consuming procedure calls and procedure returns.

    Global and Static Variable Optimization

    Global and static variable optimizations look for ways to reduce the number of instructions required for accessing global and static variables. The compiler normally generates two machine instructions when referencing global variables. Depending on the locality of the global variables, single machine instructions may sometimes be used to access these variables. The linker rearranges the storage location of global and static data to increase the number of variables that can be referenced by single instructions.

    Global Variable Optimization Coding Standards

    Since this optimization rearranges the location and data alignment of global variables, avoid the following programming practices:

    Guidelines for Using the Optimizer

    The following guidelines assist in effectively using the optimizer and writing efficient HP C programs.

    Optimizer Assumptions

    During optimization, the compiler gathers information about the use of variables and passes this information to the optimizer. The optimizer uses this information to ensure that every code transformation maintains the correctness of the program, at least to the extent that the original unoptimized program is correct.

    When gathering this information, the HP C compiler makes the following assumption: while inside a function, the only variables that can be accessed indirectly through a pointer or by another function call are:

    Optimizer Pragmas

    Pragmas give you the ability to:

    Pragmas cannot cross line boundaries, and the word pragma must be in lowercase letters. Optimizer pragmas may not appear inside a function.

    Optimizer Control Pragmas

    The OPTIMIZE and OPT_LEVEL pragmas control which functions are optimized and which set of optimizations is performed. You can place these pragmas before any function definition, and each one overrides any previous pragma. These pragmas cannot raise the optimization level above the level specified on the command line.

    OPT_LEVEL 0, 1, and 2 provide more control over optimization than the +O1 and +O2 compiler options. You use these pragmas to raise or lower optimization at the function level inside a source file, whereas the compiler options apply only to an entire source file. (OPT_LEVEL 3 and 4 can only be used at the beginning of the source file.)

    Table 11: Optimization Level Precedence shows the possible combinations of options and pragmas and the resulting optimization levels. The level at which a function will be optimized is the lower of the two values specified by the command line optimization level and the optimization pragma in force.
     

    Table 11: Optimization Level Precedence 
    Command-line Optimization Level   #Pragma OPT_LEVEL   Resulting OPT_LEVEL
    none                              OFF                 0
    none                              1                   0
    none                              2                   0
    +O1                               OFF                 0
    +O1                               1                   1
    +O1                               2                   1
    +O1                               3                   1
    +O1                               4                   1
    +O2                               OFF                 0
    +O2                               1                   1
    +O2                               2                   2
    +O2                               3                   2
    +O2                               4                   2
    +O3                               OFF                 0
    +O3                               1                   1
    +O3                               2                   2
    +O3                               3                   3
    +O3                               4                   3
    +O4                               OFF                 0
    +O4                               1                   1
    +O4                               2                   2
    +O4                               3                   3
    +O4                               4                   4
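
    The precedence rule behind Table 11 amounts to taking the lower of two numbers. The following function is an illustration only and is not part of HP C. The name resulting_opt_level, and the convention of encoding both "none" and OFF as 0, are assumptions made for this example.

```c
/* Encode a missing command-line option ("none") and OFF as level 0. */
enum { OPT_NONE = 0, OPT_OFF = 0 };

/* Return the level at which a function is optimized: the lower of the
   command-line level and the level requested by the pragma in force. */
int resulting_opt_level(int command_line_level, int pragma_level)
{
    return (command_line_level < pragma_level) ? command_line_level
                                               : pragma_level;
}
```

    For instance, resulting_opt_level(2, 4) yields 2, which matches the table row for +O2 combined with OPT_LEVEL 4.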

    The values of OPTIMIZE and OPT_LEVEL are summarized in Table 12: Optimizer Control Pragmas.
     

    Table 12: Optimizer Control Pragmas 
    Pragma                 Description
    #pragma OPTIMIZE ON    Turns optimization on.
    #pragma OPTIMIZE OFF   Turns optimization off.
    #pragma OPT_LEVEL 1    Optimize only within small blocks of code.
    #pragma OPT_LEVEL 2    Optimize within each procedure.
    #pragma OPT_LEVEL 3    Optimize across all procedures within a source file.
    #pragma OPT_LEVEL 4    Optimize across all procedures within a program.
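
    The following is a minimal sketch of how these pragmas might appear in a source file, assuming the file is compiled with +O2. The function names checksum, scale, and debug_probe are hypothetical. Compilers other than HP C will typically ignore these pragmas or warn about them.

```c
/* Assumed command line for this sketch: cc +O2 -c example.c */

#pragma OPT_LEVEL 1
/* Optimized at level 1: the lower of +O2 and OPT_LEVEL 1. */
int checksum(const unsigned char *data, int length)
{
    int sum = 0;
    int i;
    for (i = 0; i < length; i++)
        sum += data[i];
    return sum;
}

#pragma OPT_LEVEL 2
/* Optimized at level 2: this pragma overrides the previous one. */
int scale(int value, int factor)
{
    return value * factor;
}

#pragma OPTIMIZE OFF
/* Not optimized (level 0). */
int debug_probe(int value)
{
    return value;
}
```

    Each pragma sits outside any function body and applies to the definitions that follow it until another pragma overrides it.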

    Inlining Pragmas

    When INLINE is specified without a functionname, any function can be inlined. When specified with functionname(s), these functions are candidates for inlining.

    The NOINLINE pragma disables inlining for all functions or specified functionname(s).

    The syntax for performing inlining is:

    #pragma INLINE [functionname(1), ..., functionname(n)]

    #pragma NOINLINE [functionname(1), ..., functionname(n)]

    For example, to specify inlining of the two subprograms checkstat and getinput, use:

    #pragma INLINE checkstat, getinput

    To specify that an infrequently called routine should not be inlined when compiling at optimization level 3 or 4, use:

    #pragma NOINLINE opendb

    See also the related +O[no]inline optimization option.
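
    The following sketch places both pragmas in the context of a complete source file. The function bodies and the caller process are hypothetical, and placing the pragmas at file scope ahead of the definitions is an assumption based on the rule that optimizer pragmas may not appear inside a function.

```c
#include <stddef.h>

/* Mark two small, frequently called routines as inlining candidates. */
#pragma INLINE checkstat, getinput
/* Keep an infrequently called routine out of line. */
#pragma NOINLINE opendb

int checkstat(int status)
{
    return status == 0;
}

int getinput(const int *buffer, int index)
{
    return buffer[index];
}

int opendb(const char *name)
{
    return name != NULL && name[0] != '\0';
}

/* Sums the buffer if the database name is usable; returns -1 otherwise. */
int process(const int *buffer, int count, const char *dbname)
{
    int total = 0;
    int i;
    if (!opendb(dbname))          /* called once: little gain from inlining */
        return -1;
    for (i = 0; i < count; i++)
        if (checkstat(0))         /* called in a loop: inlining candidates */
            total += getinput(buffer, i);
    return total;
}
```

    The calls inside the loop are where removing call and return overhead is most likely to pay off, which is why those routines are named in the INLINE pragma.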

    Alias Pragmas

  • FLOAT_TRAPS_ON pragma
  • [NO]PTRS_STRONGLY_TYPED Pragma
  • The compiler gathers information about each function (such as information about function calls, variables, parameters, and return values) and passes this information to the optimizer.

    FLOAT_TRAPS_ON pragma

    Informs the compiler that the function(s) may enable floating-point trap handling. When the compiler is so informed, it will not perform loop invariant code motion (LICM) on floating-point operations in the function(s) named in the pragma. This pragma is required for proper code generation when floating-point traps are enabled.

    #pragma FLOAT_TRAPS_ON { functionname,...functionname }
    #pragma FLOAT_TRAPS_ON { _ALL }

    For example:

    #pragma FLOAT_TRAPS_ON xyz,abc

    This informs the compiler and optimizer that xyz and abc have floating-point traps turned on, and that LICM optimization should therefore not be performed on floating-point operations in those functions.
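
    The following sketch suggests why moving floating-point operations can matter when traps are enabled. The function scale_all is hypothetical. The division x / d is loop invariant, so code motion could, in principle, evaluate it before the loop and therefore before the test that guards it. With traps enabled, that could raise a divide-by-zero trap that the code as written never triggers.

```c
#pragma FLOAT_TRAPS_ON scale_all

/* Writes x / d into each element, but only when d is nonzero. */
void scale_all(double *out, int count, double x, double d)
{
    int i;
    for (i = 0; i < count; i++) {
        if (d != 0.0)
            out[i] = x / d;   /* loop invariant, but guarded by the test */
        else
            out[i] = 0.0;
    }
}
```

    With the pragma in force, the division stays inside the guarded branch, so it executes only when d is nonzero.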

    Improving Shared Library Performance

    This section describes the HP_DEFINED_EXTERNAL pragma, which is used to improve shared library performance.

    HP_DEFINED_EXTERNAL can improve performance of shared libraries by reducing the overhead of calling shared library routines. You must be very careful using this pragma because incorrect use can result in incorrect and unpredictable behavior. See also the HP-UX Linker and Libraries User's Guide for more information on improving shared library performance.


    HP_DEFINED_EXTERNAL Pragma

    This pragma improves performance of shared library calls by inlining import stubs. Place this pragma at calls to shared library routines.

    WARNING: Do not use this pragma at function definitions, only at function calls. Specifying it at function definitions will result in incorrect behavior.

    Syntax

    #pragma HP_DEFINED_EXTERNAL name1[, name2[, ...]]

    where name1, name2, and so forth are names of functions in shared libraries.

    Background

    Import stubs are code sequences generated at calls to shared library routines. The import stub queries the PLT (Procedure Linkage Table) to determine the address of the shared library function and then calls it. The HP_DEFINED_EXTERNAL pragma inlines this import stub.

    NOTE: Use this pragma only on calls to functions in shared libraries.


    Improving Compile and Link Times

    In general, optimization increases the amount of time it takes to compile your program, link your program, or both. The +objdebug option shortens link time by not copying debugging information from the object files into the executable file.