Optimizing HP aC++ Programs

HP aC++ provides options to the aCC command and pragmas to control optimization. The following sections introduce the basic concepts of optimizing your HP aC++ code for improved efficiency.

Requesting Optimization

By default, the compiler performs constant folding and simple register assignment. The sections that follow describe the ways to increase and control the level of optimization performed on your program.


Pragmas That Control Optimization

Compiler options provide a high-level, global approach to optimization. For finer control, HP aC++ provides two pragmas: OPTIMIZE and OPT_LEVEL.

These pragmas must appear outside any function and they apply for the remainder of the file or until superseded by another pragma. For these pragmas to work, the source program must be compiled with one of the optimization options. Otherwise the pragmas are ignored.

Pragma OPTIMIZE

The OPTIMIZE pragma turns optimization on or off. It is useful for turning off optimization in sections of a source program.

Syntax of Pragma OPTIMIZE

To turn off optimization for a particular function, put #pragma OPTIMIZE OFF immediately before the function and #pragma OPTIMIZE ON immediately after the function. Then compile the function with one of the aCC command line options that enables optimization.

Example:

#pragma OPTIMIZE OFF
void g()     // Turn optimization off.
{
    ...
}
#pragma OPTIMIZE ON
void f()     // Restore optimization level.
{
    ...
}

This example, when compiled with -O, turns off optimization for function g() and restores it to level 2 for f().
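A fuller, compilable version of this pattern might look like the following sketch. The function names are hypothetical. The `__HP_aCC` guard is an addition that assumes HP aC++ predefines that macro; it keeps the pragmas away from compilers that do not recognize them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function names; only the pragmas come from the text above.
// Assumption: HP aC++ predefines __HP_aCC, so the guard hides the pragmas
// from compilers that do not recognize them.

#if defined(__HP_aCC)
#pragma OPTIMIZE OFF
#endif
// Not optimized, even when the file is compiled with -O.
long sum_squares_unoptimized(const int* values, std::size_t count)
{
    assert(values != 0 || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += static_cast<long>(values[i]) * values[i];
    }
    return total;
}

#if defined(__HP_aCC)
#pragma OPTIMIZE ON
#endif
// Optimized again at the level requested on the aCC command line.
long sum_squares(const int* values, std::size_t count)
{
    assert(values != 0 || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += static_cast<long>(values[i]) * values[i];
    }
    return total;
}
```

Both functions compute the same result. Only the code the compiler generates for them is expected to differ.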

Pragma OPT_LEVEL

The OPT_LEVEL pragma directs the compiler to change the current optimization level to level 1, 2, 3, or 4. It is useful for switching from one level to another within a source program.

You cannot use this pragma to raise the optimization level beyond the original level set by the option you used on the aCC command line. The compiler issues a warning if you attempt to raise the original optimization level. OPT_LEVEL 3 and 4 are only allowed at the beginning of a file.

Syntax of Pragma OPT_LEVEL

To change optimization levels for a particular function, put #pragma OPT_LEVEL n immediately before the function, where n is the level of optimization you want for the function.

Example:

#pragma OPT_LEVEL 1
void m()
{
   ...
}
#pragma OPT_LEVEL 2
void n()
{
   ...
}

This example, when compiled with -O, lowers the optimization level to level 1 for function m() and restores it to level 2 for n().
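As a sketch of when this might be useful, the file below lowers the level for a function that runs once and restores level 2 for a function that is called repeatedly. The function names are hypothetical, and the `__HP_aCC` guard assumes HP aC++ predefines that macro.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function names; the guard assumes HP aC++ predefines __HP_aCC.

#if defined(__HP_aCC)
#pragma OPT_LEVEL 1
#endif
// Runs once at startup, so level 1 is assumed to be sufficient here.
void fill_table(double* table, std::size_t count)
{
    assert(table != 0 || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        table[i] = static_cast<double>(i) * 0.5;
    }
}

#if defined(__HP_aCC)
#pragma OPT_LEVEL 2
#endif
// Called often, so it goes back to the level 2 requested with -O.
double table_sum(const double* table, std::size_t count)
{
    assert(table != 0 || count == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += table[i];
    }
    return sum;
}
```

Because the pragma cannot raise the level above the one given on the command line, the second pragma only has an effect when the file is compiled at level 2 or higher.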


Setting Basic Optimization Levels

HP aC++ provides four basic levels of optimization. The higher the level, the more optimization is performed and the longer the optimization takes.

You can specify an option on the aCC command line or in the CXXOPTS environment variable.

Example:

aCC -O prog.C

Compiles prog.C and optimizes the program at level 2, the level that -O selects.

Level 1 Optimization

Level 1 optimization includes branch optimization, dead code elimination, faster register allocation, instruction scheduling, and peephole (statement-by-statement) optimization. Use +O1 to get level 1 optimization. Level 1 is the default.

Level 1 optimization produces faster programs than without optimization and compiles faster than level 2 optimization. Programs compiled at level 1 can be used with the HP Distributed Debugging Environment (DDE) debugger. Use the debugger option -g0 or -g1.

Level 2 Optimization

Level 2 optimization includes level 1 optimizations, plus optimizations performed over entire functions in a single file. Level 2 optimizes loops in order to reduce pipeline stalls and analyzes data flow, memory usage, loops, and expressions. Use -O or +O2 to get level 2 optimization.

Specifically, level 2 provides:

Level 2 can produce faster run-time code than level 1 if programs use loops extensively. Loop-oriented floating-point intensive applications may see run times reduced by 50%. Operating system and interactive applications that use the already optimized system libraries can achieve 30% to 50% additional improvement. Level 2 optimization produces faster programs than level 1 and compiles faster than level 3 optimization.
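Loop-oriented floating-point code of the kind described here looks roughly like the sketch below. The function name is hypothetical, and the example makes no claim about which transformations a particular compiler release applies to it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example of a loop-oriented floating-point kernel:
// computes y[i] = a * x[i] + y[i] for every element.
void scaled_add(double a, const double* x, double* y, std::size_t n)
{
    assert((x != 0 && y != 0) || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Almost all of the run time of such a function is spent inside the loop, which is the part of a function that level 2 analyzes.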

Level 3 Optimization

Level 3 optimization includes level 2 optimizations, plus full optimization across all subprograms within a single file. Level 3 also inlines certain subprograms within the input file. Use +O3 to get level 3 optimization.

Level 3 optimization produces faster run-time code than level 2 for code that makes many calls to small functions. Level 3 links faster than level 4. However, level 3 does not work with the debugger options -g0 and -g1.
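The sketch below shows the shape of code this refers to: small helper functions defined in the same file and called from inside a loop. All names are hypothetical, and whether a given call is actually inlined is up to the compiler.

```cpp
#include <cassert>
#include <cstddef>

struct Point { double x; double y; };

// Small helpers defined in the same file as their caller.
static double square(double v) { return v * v; }

static double distance_squared(const Point& a, const Point& b)
{
    return square(a.x - b.x) + square(a.y - b.y);
}

// Counts the points that lie within `radius` of `center`.
// Each loop iteration makes three calls to the small helpers above.
std::size_t count_within(const Point* points, std::size_t count,
                         const Point& center, double radius)
{
    assert(points != 0 || count == 0);
    const double limit = square(radius);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (distance_squared(points[i], center) <= limit) {
            ++hits;
        }
    }
    return hits;
}
```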

Level 4 Optimization

Level 4 optimization includes level 3 optimizations, plus full optimizations across the entire application program. Level 4 includes global and static variable optimization and inlining across the entire program. Optimizations are performed at link time rather than at compile time. Use +O4 to get level 4 optimization.

Level 4 optimization produces faster run-time code than level 3 if programs use many global variables or if there are many opportunities for inlining procedure calls. However, level 4 does not work with the debugger options -g0 and -g1.


Additional Options for Finer Control

In addition to the basic optimization levels, HP aC++ provides options for more precise control. Some introductory examples follow:

Enabling Aggressive Optimizations

To enable aggressive optimizations at the second, third, or fourth optimization levels, use the +Oaggressive option as follows:

aCC +O2 +Oaggressive sourcefile.C

or:

aCC +O3 +Oaggressive sourcefile.C

or:

aCC +O4 +Oaggressive sourcefile.C

This option enables additional optimizations at each level.

CAUTION: Use aggressive optimizations with stable, well-structured code. These types of optimizations give you faster code, but may change the behavior of programs.

These optimizations may do any of the following:

Enabling Only Conservative Optimizations

You can enable only conservative optimizations at the second, third, or fourth optimization levels by using the +Oconservative option, as follows:

aCC +O2 +Oconservative sourcefile.C

or:

aCC +O3 +Oconservative sourcefile.C

or:

aCC +O4 +Oconservative sourcefile.C

This option disables all but the most conservative optimizations at each level. Conservative optimizations do not change the behavior of code, in most cases, even if the code does not conform to standards.

When your code is unstructured, use only the conservative optimizations provided at levels 2, 3, and 4.

Removing Compilation Time Limits When Optimizing

You can remove optimization time restrictions at the second, third, or fourth optimization levels by using the +Onolimit option as follows:

aCC +O2 +Onolimit sourcefile.C

or:

aCC +O3 +Onolimit sourcefile.C

or:

aCC +O4 +Onolimit sourcefile.C

By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4. Use this option if longer compile times are acceptable because you want additional optimizations to be performed.

Limiting the Size of Optimized Code

You can disable optimizations that expand code size at the second, third, or fourth optimization levels by using the +Osize option, as follows:

aCC +O2 +Osize sourcefile.C

or:

aCC +O3 +Osize sourcefile.C

or:

aCC +O4 +Osize sourcefile.C

Most optimizations improve execution speed and decrease executable code size. A few optimizations significantly increase code size to gain execution speed. The +Osize option disables these code-expanding optimizations.

Use this option if you have limited main memory, swap space, or disk space.

Specifying Maximum Optimization

For maximum optimization, use the +Oall option as follows:

aCC +Oall sourcefile.C

This combination performs aggressive optimizations with unrestricted compile time at the highest level of optimization.

CAUTION: Use +Oall with stable, well-structured code. These types of optimizations give you the fastest code, but are riskier than the default optimizations.

The +Oall option combines the +O4, +Oaggressive, and +Onolimit options.

Combining Optimization Options

Optimization options that affect code size (+Osize), compile time (+Onolimit), and the aggressiveness of the optimizations performed (+Oaggressive or +Oconservative) can be combined at any of the optimization levels 2 through 4.

You can use +Onolimit or +Osize with either +Oaggressive or +Oconservative, but you cannot use +Oaggressive with +Oconservative.

Example:

To specify conservative optimizations at level 2 and disable code-expanding optimizations, use:

aCC +O2 +Oconservative +Osize sourcefile.C


Profile-Based Optimization

Profile-based optimization (PBO) is a set of performance-improving code transformations based on the run-time characteristics of your application.

There are three steps involved in performing this optimization:

  1. Instrumentation - Use +I with any level of optimization to insert data collection code into the object program:
    aCC +I -O -c sample.C
    aCC +I -O -o sample.exe sample.o
    

  2. Data Collection - Run the program with representative data to collect execution profile statistics:
    sample.exe < input.file1
    

  3. Optimization - Use +P to generate optimized code based on the profile data:
    aCC +P -o sample.exe sample.o
    

    With PBO, compile times are short and link times are long because code generation happens at link time.
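The kind of code where run-time characteristics matter can be sketched as follows. The names are hypothetical. The assumption is that, with representative data, the rejection branch is taken rarely, which is something the compiler cannot see in the source alone but a collected profile can show.

```cpp
#include <cassert>
#include <cstddef>

struct Summary {
    long total;            // sum of the accepted values
    std::size_t rejected;  // number of values that were skipped
};

// Sums the non-negative values and counts the negative ones.
Summary summarize(const int* values, std::size_t count)
{
    assert(values != 0 || count == 0);
    Summary s = {0, 0};
    for (std::size_t i = 0; i < count; ++i) {
        if (values[i] < 0) {
            // Assumed to be the rare path when the input is representative.
            ++s.rejected;
            continue;
        }
        s.total += values[i];
    }
    return s;
}
```

Running the instrumented program on input that resembles production data is what makes the collected statistics for a branch like this one meaningful.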

    Notes on Using Profile-Based Optimization

    When using profile-based optimization, please note the following:

    For More Information:

    For more information on profile-based optimization, you can refer to the HP-UX Online Linker and Libraries User's Guide.

    Instrumenting the Code

    To instrument your program, use the +I option as follows:

    aCC +I -O -c sample.C              Compile for instrumentation.
    aCC +I -O -o sample.exe sample.o   Link to make instrumented executable.
    

    The first command line uses the -O option to perform level 2 optimization and the +I option to prepare the code for instrumentation. (+I generates intermediate code.) The -c option in the first command line suppresses linking and creates an intermediate object file called sample.o. The .o file can be used later in the optimization phase, avoiding a second compile.

    The second command line links sample.o and uses the -o option to name the executable sample.exe. The +I option instruments sample.exe with data collection code.

    Note: Instrumented programs run slower than non-instrumented programs. Only use instrumented code to collect statistics for profile-based optimization.

    Instrumenting Code at Level 4 Optimization

    When optimizing at level 4 (where code generation is delayed until link time), use the +I option as follows:

    aCC +I +O4 -c x.C  y.C     Create intermediate file for instrumentation.
    aCC +I +O4 x.o y.o         Create optimized code with instrumentation.
    

    Collecting Data for Profiling

    To collect execution profile statistics, run your instrumented program with representative data as follows:

    sample.exe < input.file1   Collect execution profile data.
    sample.exe < input.file2   Collect execution profile data.
    

    This step creates and logs the profile statistics to a file, by default called flow.data. The data collection file is a structured file that may be used to store the statistics from multiple test runs of different programs that you may have instrumented.

    Maintaining Profile Data Files

    Profile-based optimization stores execution profile data in a disk file. By default, this file is called flow.data and is located in your current working directory.

    You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.

    The FLOW_DATA environment variable can be used to specify the name of the profile data file with either the +I or +P options. The +df command line option can be used to specify the name of the profile data file when used with the +P option.

    The +df option takes precedence over the FLOW_DATA environment variable.
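The precedence rules for the optimization (+P) step can be modeled with a small function. This is only an illustration of the rules stated above, not the compiler's own logic. The function name is hypothetical, and treating an empty FLOW_DATA value as unset is an assumption.

```cpp
#include <cassert>
#include <string>

// Models which profile data file is used at the +P step.
//   df_option     : argument given to +df, or "" if +df was not used
//   flow_data_env : value of FLOW_DATA, or a null pointer if it is not set
std::string profile_data_name(const std::string& df_option,
                              const char* flow_data_env)
{
    std::string name = "flow.data";  // default, in the current directory

    // Assumption: an empty FLOW_DATA value is treated the same as unset.
    if (flow_data_env != 0 && flow_data_env[0] != '\0') {
        name = flow_data_env;
    }

    // +df takes precedence over the FLOW_DATA environment variable.
    if (!df_option.empty()) {
        name = df_option;
    }

    assert(!name.empty());
    return name;
}
```

A caller could pass the result of std::getenv("FLOW_DATA") (declared in <cstdlib>) as the second argument.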

    Examples:

    In the following example, the FLOW_DATA environment variable is used to override the flow.data file name. The profile data is stored instead in /users/profiles/prog.data.

    export FLOW_DATA=/users/profiles/prog.data
    aCC -c +I +O3 sample.C
    aCC -o sample.exe +I sample.o
    sample.exe < input.file1
    aCC -o sample.exe +P  sample.o
    

    In the next example, the +df option is used to override the flow.data file name with the name /users/profiles/prog.data.

    aCC -c +I +O3 sample.C
    aCC -o sample.exe +I sample.o
    sample.exe < input.file1
    mv flow.data /users/profiles/prog.data
    aCC -o sample.exe +df /users/profiles/prog.data +P sample.o
    

    Performing Profile-Based Optimization

    To optimize the program based on the previously collected run-time profile statistics, relink the program as follows:

    aCC -o sample.exe +P sample.o
    

    When optimizing at level 4 (where code generation is delayed until link time), use the +P option as follows:

    aCC +P +O4  x.o  y.o
    

    When +P is used, no recompilation is necessary. The .o file saved from the instrumentation phase can be used as input.