The optimization pragmas are OPTIMIZE and OPT_LEVEL.
These pragmas must appear outside any function, and they apply to the remainder of the file or until superseded by another pragma. They take effect only when the source program is compiled with one of the optimization options; otherwise the pragmas are ignored.
The OPTIMIZE pragma turns optimization on or off.
It is useful for turning off optimization in sections of a source program.
To turn off optimization for a particular function, put #pragma OPTIMIZE OFF immediately before the function and #pragma OPTIMIZE ON immediately after the function. Then compile the source file with one of the aCC command line options that enables optimization.
#pragma OPTIMIZE OFF
void g() // Turn optimization off.
{
...
}
#pragma OPTIMIZE ON
void f() // Restore optimization level.
{
...
}
This example, when compiled with -O, turns off
optimization for function g() and restores it to level 2 for f().
The OPT_LEVEL pragma directs the compiler to change the current
optimization level to
level 1, 2, 3, or 4.
It is useful for switching from one level to another within a source program.
You cannot use this pragma to raise the optimization level beyond the
original level set by the option you used on the aCC command line.
The compiler issues a warning if you attempt to raise the original
optimization level.
OPT_LEVEL 3 and 4 are only allowed at the beginning of a file.
To change optimization levels for a particular function, put
#pragma OPT_LEVEL n immediately before the function, where n is
the level of optimization you want for the function.
#pragma OPT_LEVEL 1
void m()
{
...
}
#pragma OPT_LEVEL 2
void n()
{
...
}
This example, when compiled with -O, lowers the
optimization level to level 1 for function m() and restores it to level 2
for n().
You can specify an option on the aCC command line or in the
CXXOPTS environment variable.
aCC -O prog.C
Compiles prog.C and optimizes the program at level 2, the level selected by -O.
Use +O1 to get level 1 optimization.
Level 1 is the default.
Level 1 optimization produces faster programs than compiling without optimization and compiles faster than level 2 optimization.
Programs compiled at level 1 can be used with the HP Distributed Debugging Environment (DDE) debugger. Use the debugger option -g0 or -g1.
Use -O or +O2 to get level 2 optimization.
Specifically, level 2 provides:
Level 2 can produce faster run-time code than level 1 if programs use loops extensively. Loop-oriented floating-point intensive applications may see run times reduced by 50%. Operating system and interactive applications that use the already optimized system libraries can achieve 30% to 50% additional improvement. Level 2 optimization produces faster programs than level 1 and compiles faster than level 3 optimization.
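As a rough illustration of the kind of code the paragraph above has in mind, the following sketch shows a loop-oriented, floating-point routine. The function name dotProduct is hypothetical and does not come from this guide, and the sketch makes no claim about how much level 2 improves it; the actual gain depends on the program and the machine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: a loop-intensive floating-point kernel.
// Nearly all of the run time is spent in the loop body.
double dotProduct(const std::vector<double>& a, const std::vector<double>& b)
{
    // Use the shorter length so the loop never reads past either vector.
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

A file containing such a routine could be compiled with aCC -O or aCC +O2, as described above.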
Use +O3 to get level 3 optimization.
Level 3 optimization produces faster run-time code than level 2 on code that makes many procedure calls to small functions.
Level 3 links faster than level 4.
However, level 3 does not work with the debugger options -g0 and -g1.
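The following sketch shows what "many procedure calls to small functions" can look like in practice. All names in it (Point, xOf, yOf, absValue, manhattanLength, totalLength) are hypothetical and chosen only for illustration; the sketch does not describe what the optimizer actually does with this code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: code dominated by calls to very small functions.
struct Point {
    int x;
    int y;
};

// Each of these functions does very little work per call.
int xOf(const Point& p) { return p.x; }
int yOf(const Point& p) { return p.y; }
int absValue(int v) { return v < 0 ? -v : v; }

// Four small calls for every point processed.
int manhattanLength(const Point& p)
{
    return absValue(xOf(p)) + absValue(yOf(p));
}

// The loop multiplies the number of small calls by the number of points.
long totalLength(const std::vector<Point>& points)
{
    long total = 0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        total += manhattanLength(points[i]);
    }
    return total;
}
```

A file like this could be compiled with aCC +O3; note that, as stated above, the debugger options -g0 and -g1 cannot be used at that level.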
Use +O4 to get level 4 optimization.
Level 4 optimization produces faster run-time code than level 3 if programs use many global variables or if there are many opportunities for inlining procedure calls. However, level 4 does not work with the debugger options -g0 and -g1.
Use the +Oaggressive option as follows:
aCC +O2 +Oaggressive sourcefile.C
or:
aCC +O3 +Oaggressive sourcefile.C
or:
aCC +O4 +Oaggressive sourcefile.C
This option enables additional optimizations at each level.
CAUTION: Use aggressive optimizations with stable, well-structured code. These types of optimizations give you faster code, but may change the behavior of programs.
These optimizations may do any of the following:
Use the +Oconservative option as follows:
aCC +O2 +Oconservative sourcefile.C
or:
aCC +O3 +Oconservative sourcefile.C
or:
aCC +O4 +Oconservative sourcefile.C
This option disables all but the most conservative optimizations at each level. In most cases, conservative optimizations do not change the behavior of code, even if the code does not conform to standards.
When your code is unstructured, use only the conservative optimizations provided at levels 2, 3, and 4.
Use the +Onolimit option as follows:
aCC +O2 +Onolimit sourcefile.C
or:
aCC +O3 +Onolimit sourcefile.C
or:
aCC +O4 +Onolimit sourcefile.C
By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4. Use this option if longer compile times are acceptable because you want additional optimizations to be performed.
Use the +Osize option as follows:
aCC +O2 +Osize sourcefile.C
or:
aCC +O3 +Osize sourcefile.C
or:
aCC +O4 +Osize sourcefile.C
Most optimizations improve execution speed and decrease executable code size.
A few optimizations significantly increase code size to gain
execution speed. The +Osize option disables these code-expanding
optimizations.
Use this option if you have limited main memory, swap space, or disk space.
aCC +Oall sourcefile.C
This combination performs aggressive optimizations with unrestricted compile time at the highest level of optimization.
CAUTION: Use +Oall with stable, well-structured code. These types of optimizations give you the fastest code, but are riskier than the default optimizations.
The +Oall option combines the +O4,
+Oaggressive, and
+Onolimit options.
Options that control code size (+Osize), compile time (+Olimit), and the aggressiveness of the optimizations performed (+Oaggressive or +Oconservative) can be combined at any of the optimization levels 2 through 4.
You can use +Olimit or +Osize with either +Oaggressive or +Oconservative, but you cannot use +Oaggressive with +Oconservative.
aCC +O2 +Oconservative +Osize sourcefile.C
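The combination rules above can be summarized as a small check. This is only a sketch of the rules as stated in the text, not a description of how aCC parses its command line: the OptimizationOptions structure and the isAllowedCombination function are hypothetical, and the sketch assumes that these options are meant to be used only at levels 2 through 4.

```cpp
// Hypothetical model of an optimization request; not part of aCC.
struct OptimizationOptions {
    int  level;         // optimization level, 1 through 4
    bool size;          // +Osize given
    bool limit;         // +Olimit given
    bool aggressive;    // +Oaggressive given
    bool conservative;  // +Oconservative given
};

// Returns true if the request follows the rules stated in the text.
bool isAllowedCombination(const OptimizationOptions& o)
{
    const bool anyOption = o.size || o.limit || o.aggressive || o.conservative;

    // Assumption: these options apply only at levels 2 through 4.
    if (anyOption && (o.level < 2 || o.level > 4)) {
        return false;
    }
    // +Oaggressive and +Oconservative cannot be used together.
    if (o.aggressive && o.conservative) {
        return false;
    }
    return true;
}
```

Under this model, the command line shown above (+O2 +Oconservative +Osize) is allowed, while a request that sets both +Oaggressive and +Oconservative is rejected.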
There are three steps involved in performing profile-based optimization (PBO):
1. Use +I with any level of optimization to insert data collection code into the object program:
aCC +I -O -c sample.C
aCC +I -O -o sample.exe sample.o
2. Run the instrumented program to collect execution profile data:
sample.exe < input.file1
3. Use +P to generate optimized code based on the profile data:
aCC +P -o sample.exe sample.o
Compile times will be fast and link times will be slow when using PBO because code generation happens at link time.
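To make the data collection step more concrete, the following sketch shows what part of a program such as sample.C might contain. The contents are entirely hypothetical (this guide does not show sample.C), and the names LineCounts, countLines, and countLinesInText are invented for illustration. The point is only that which branch runs more often depends on the input, so different input files produce different execution profiles.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical counters for the two kinds of input line.
struct LineCounts {
    long blank;
    long nonBlank;
};

// Reads lines until end of input. How often each branch is taken
// depends entirely on the data that is fed to the program.
LineCounts countLines(std::istream& in)
{
    LineCounts counts = {0, 0};
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) {
            ++counts.blank;
        } else {
            ++counts.nonBlank;
        }
    }
    return counts;
}

// Convenience wrapper for trying the function on an in-memory string.
LineCounts countLinesInText(const std::string& text)
{
    std::istringstream in(text);
    return countLines(in);
}
```

A main function that calls countLines(std::cin) would allow such a program to be run as sample.exe < input.file1 during the data collection step.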
When using profile-based optimization, please note the following:
Linking with +I and +P takes more time than linking ordinary object files. However, compile times will be relatively fast, because the compiler generates only the intermediate code.
aCC +I -O sample.C -o sample.exe   # Compile to instrumented executable.
sample.exe < input.file1           # Collect execution profile data.
aCC +P -O sample.C -o sample.exe   # Recompile with optimization.
To instrument your program, use the +I option as follows:
aCC +I -O -c sample.C              # Compile for instrumentation.
aCC +I -O -o sample.exe sample.o   # Link to make instrumented executable.
The first command line uses the -O option to perform level 2 optimization
and the +I option to prepare the code for instrumentation. (+I
generates intermediate code.) The -c option in the first command line
suppresses linking and creates an intermediate object file called
sample.o. The .o file can be used later in the optimization phase,
avoiding a second compile.
The second command line uses the -o option to link sample.o
into sample.exe. The +I option instruments sample.exe
with data collection code.
Note: Instrumented programs run slower than non-instrumented programs. Only use instrumented code to collect statistics for profile-based optimization.
To instrument a program at level 4, use the +I option as follows:
aCC +I +O4 -c x.C y.C   # Create intermediate file for instrumentation.
aCC +I +O4 x.o y.o      # Create optimized code with instrumentation.
sample.exe < input.file1   # Collect execution profile data.
sample.exe < input.file2   # Collect execution profile data.
This step creates and logs the profile statistics to a file, by default called
flow.data. The data collection file is a structured file that may be used
to store the statistics from multiple test runs of different programs that you
may have instrumented.
Profile-based optimization stores execution profile data in a disk file.
By default, this file is called flow.data and is located in
your current working directory.
You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.
The FLOW_DATA environment variable can be used to specify the name of the
profile data file with either the +I or
+P options. The +df
command line option can be used to specify the name of the profile data file
when used with the +P option.
The +df option takes precedence over the FLOW_DATA environment
variable.
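The precedence described above can be written out as a short function. This is a sketch of the stated rule only, for the +P phase where all three sources can apply; it is not how aCC is implemented, and the function name profileDataFileName is hypothetical. A null pointer stands for "not specified".

```cpp
#include <string>

// Hypothetical helper, not part of aCC: models only the precedence
// described in the text. A null pointer means "not specified".
std::string profileDataFileName(const char* dfOption, const char* flowDataEnv)
{
    if (dfOption != 0) {
        return dfOption;      // +df takes precedence over FLOW_DATA
    }
    if (flowDataEnv != 0) {
        return flowDataEnv;   // FLOW_DATA overrides the default name
    }
    return "flow.data";       // default name, in the current working directory
}
```

In a test program, the second argument could be obtained with std::getenv("FLOW_DATA") from <cstdlib>, which returns a null pointer when the variable is not set.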
In the following example, the FLOW_DATA environment variable is used to override the flow.data file name.
The profile data is stored instead in /users/profiles/prog.data.
export FLOW_DATA=/users/profiles/prog.data
aCC -c +I +O3 sample.C
aCC -o sample.exe +I sample.o
sample.exe < input.file1
aCC -o sample.exe +P sample.o
In the next example, the +df option is used to override the flow.data
file name with the name /users/profiles/prog.data.
aCC -c +I +O3 sample.C
aCC -o sample.exe +I sample.o
sample.exe < input.file1
mv flow.data /users/profiles/prog.data
aCC -o sample.exe +df /users/profiles/prog.data +P sample.o
aCC -o sample.exe +P sample.o
When optimizing at level 4 (where code generation is delayed until link time), use the +P option as follows:
aCC +P +O4 x.o y.o
When +P is used, no recompilation is necessary. The .o file
saved from the instrumentation phase can be used as input.