HP-UX Linker and Libraries User's Guide: HP 9000 Computers > Chapter 8 Ways to Improve Performance

Profile-Based Optimization


In profile-based optimization (PBO), the compiler and linker work together to optimize an application based on profile data obtained from running the application on a typical input data set. For instance, if certain procedures call each other frequently, the linker can place them close together in the a.out file, resulting in fewer instruction cache misses, TLB misses, and memory page faults when the program runs. Similar optimizations can be done at the basic-block level within a procedure. The compiler also uses profile data for other general tasks, such as code scheduling and register allocation.

When to Use PBO

PBO should be the last level of optimization you use when building an application. As with other optimizations, it should be performed after an application has been completely debugged.

Most applications will benefit from PBO. The two types of applications that will benefit the most from PBO are:

  • Applications that exhibit poor instruction memory locality. These are usually large applications in which the most common paths of execution are spread across multiple compilation units. The loops in these applications typically contain large numbers of statements, procedure calls, or both.

  • Applications that are branch-intensive. The operations performed in such applications are highly dependent on the input data. User interface managers, database managers, editors, and compilers are examples of such applications.

Of course, the best way to determine whether PBO will improve an application's performance is to try it.

NOTE: Under some conditions, PBO is incompatible with programs that explicitly load shared libraries. Specifically, PBO will not function properly if the shl_load routine has either the BIND_FIRST or the BIND_NOSTART flags set. For more information about explicit loading of shared libraries, see “The shl_load and cxxshl_load Routines ”.

How to Use PBO

Profile-based optimization involves these steps:

  1. Instrument the application — prepare the application so that it will generate profile data.

  2. Profile the application — create profile data that can be used to optimize the application.

  3. Optimize the application — generate optimized code based on the profile data.

A Simple Example

Suppose you want to apply PBO to an application called sample. The application is built from a C source file sample.c. Discussed below are the steps involved in optimizing the application.

Step 1 Instrumentation

First, compile the application for instrumentation and level 2 optimization:

$ cc -v -c +I -O sample.c
/opt/langtools/lbin/cpp sample.c /var/tmp/ctm123
/opt/ansic/lbin/ccom /var/tmp/ctm123 sample.o -O2 -I
$ cc -v -o sample.inst +I -O sample.o
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main \
-o sample.inst sample.o -I -lc

At this point, you have an instrumented program called sample.inst.

Step 2 Profile

Assume you have two representative input files to use for profiling, input.file1 and input.file2. Now execute the following three commands:

$ sample.inst < input.file1
$ sample.inst < input.file2
$ mv flow.data sample.data

The first invocation of sample.inst creates the flow.data file and places an entry for that executable file in the data file. The second invocation increments the counters for sample.inst in the flow.data file. The third command moves the flow.data file to a file named sample.data.

Step 3 Optimize

To perform profile based optimizations on this application, relink the program as follows:

$ cc -v -o sample.opt +P +pgm sample.inst \
+df sample.data sample.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o sample.opt \
+pgm sample.inst +df sample.data sample.o -P -lc

Note that it was not necessary to recompile the source file. The +pgm option was used because the executable name used during instrumentation, sample.inst, does not match the current output file name, sample.opt. The +df option is necessary because the profile database file for the program has been moved from flow.data to sample.data.
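The three steps above can be collected into a small script. The following is a minimal sketch, not an HP-supplied tool: the helper names run and pbo_build and the DRY_RUN variable are hypothetical, and only compiler options that appear in this example are used. With DRY_RUN=1 (the default in this sketch) the commands are printed rather than executed, so the sequence can be reviewed on any system.

```shell
#!/bin/sh
# Sketch of the instrument / profile / optimize sequence for one C source file.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

# run: print or execute a command, depending on DRY_RUN (hypothetical helper).
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pbo_build NAME INPUT...  (hypothetical helper)
# Builds NAME.inst from NAME.c, profiles it with each INPUT file,
# then relinks NAME.opt using the collected profile data.
pbo_build() {
    name=$1
    shift

    # Step 1: compile and link for instrumentation with level 2 optimization.
    run cc -c +I -O "$name.c"
    run cc -o "$name.inst" +I -O "$name.o"

    # Step 2: one profiling run per representative input file.
    for input in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "./$name.inst < $input"
        else
            "./$name.inst" < "$input"
        fi
    done
    run mv flow.data "$name.data"

    # Step 3: relink with +P; +pgm and +df are needed because the program
    # name and the profile file name both differ from the defaults.
    run cc -o "$name.opt" +P +pgm "$name.inst" +df "$name.data" "$name.o"
}
```

Calling pbo_build sample input.file1 input.file2 in dry-run mode prints the same command sequence as the example above (without -v, and with ./ added to the program name so that the current directory need not be in PATH).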

Instrumenting (+I/-I)

Although you can use the linker alone to perform PBO, the best optimizations result if you use the compiler as well; this section describes this approach.

To instrument an application (with C, C++, and FORTRAN), compile the source with the +I compiler command line option. This causes the compiler to generate a .o file containing intermediate code, rather than the usual object code. (Intermediate code is a representation of your code that is lower-level than the source code, but higher level than the object code.) A file containing such intermediate code is referred to as an I-SOM file.

After creating an I-SOM file for each source file, the compiler invokes the linker as follows:

  1. In 32-bit mode, instead of using the startup file /usr/ccs/lib/crt0.o, the compiler specifies a special startup file named /opt/langtools/lib/icrt0.o. When building a shared library, the compiler uses /usr/ccs/lib/scrt0.o. In 64-bit mode, the linker automatically adds /usr/ccs/lib/pa20_64/fdp_init.o or /usr/ccs/lib/pa20_64/fdp_init_sl.o to the link when it detects the -I option; crt0.o is not changed.

  2. The compiler passes the -I option to the linker, causing it to place instrumentation code in the resulting executable.

You can see how the compiler invokes the linker by specifying the -v option. For example, to instrument the file sample.c, to name the executable sample.inst, to perform level 2 optimizations (the compiler option -O is equivalent to +O2), and to see verbose output (-v):

$ cc -v -o sample.inst +I -O sample.c
/opt/langtools/lbin/cpp sample.c /var/tmp/ctm123
/opt/ansic/lbin/ccom /var/tmp/ctm123 sample.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o \
sample.inst sample.o -I -lc

Notice in the linker command line (starting with /usr/ccs/bin/ld), the application is linked with /opt/langtools/lib/icrt0.o and the -I option is given.

To save the profile data to a file other than flow.data in the current working directory, use the FLOW_DATA environment variable as described in “Specifying a Different flow.data with FLOW_DATA ”.

The Startup File icrt0.o

The icrt0.o startup file uses the atexit system call to register the function that writes out profile data. (For 64-bit mode, the initialization code is in /usr/ccs/lib/pa20_64/fdp_init.o.) That function is called when the application exits.

atexit allows a fixed number of functions to be registered from a user application. Instrumented applications (those linked with -I) will have one less atexit call available. One or more instrumented shared libraries will use a single additional atexit call. Therefore, an instrumented application that contains any number of instrumented shared libraries will use two of the available atexit calls.

For details on atexit, see atexit(2).

The -I Linker Option

When invoked with the -I option, the linker instruments all the specified object files. Note that the linker instruments regular object files as well as I-SOM files; however, with regular object files, only procedure call instrumentation is added. With I-SOM files, additional instrumentation is done within procedures.

For instance, suppose you have a regular object file named foo.o created by compiling without the +I option, and you compile a source file bar.c with the +I option and specify foo.o on the compile line:

$ cc -c foo.c
$ cc -v -o foobar -O +I bar.c foo.o
/opt/langtools/lbin/cpp bar.c /var/tmp/ctm456
/opt/ansic/lbin/ccom /var/tmp/ctm456 bar.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
bar.o foo.o -I -lc

In this case, the linker instruments both bar.o and foo.o. However, since foo.o is not an I-SOM file, only its procedure calls are instrumented; basic blocks within procedures are not instrumented. To instrument foo.c to the same extent, you must compile it with the +I option — for example:

$ cc -v -c +I -O foo.c
/opt/langtools/lbin/cpp foo.c /var/tmp/ctm432
/opt/ansic/lbin/ccom /var/tmp/ctm432 foo.o -O2 -I
$ cc -v -o foobar -O +I bar.c foo.o
/opt/langtools/lbin/cpp bar.c /var/tmp/ctm456
/opt/ansic/lbin/ccom /var/tmp/ctm456 bar.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
bar.o foo.o -I -lc

A simpler approach would be to compile foo.c and bar.c with a single cc command:

$ cc -v +I -O -o foobar bar.c foo.c
/opt/langtools/lbin/cpp bar.c /var/tmp/ctm352
/opt/ansic/lbin/ccom /var/tmp/ctm352 bar.o -O2 -I
/opt/langtools/lbin/cpp foo.c /var/tmp/ctm456
/opt/ansic/lbin/ccom /var/tmp/ctm456 foo.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
bar.o foo.o -I -lc

Code Generation from I-SOMs

As discussed in “Looking "inside" a Compiler ”, a compiler driver invokes several phases. The last phase before linking is code generation. When using PBO, the compilation process stops at an intermediate code level. The PA-RISC code generation and optimization phase is invoked by the linker. The code generator is /opt/langtools/lbin/ucomp.

NOTE: Since the code generation phase is delayed until link time with PBO, linking can take much longer than usual when using PBO. Compile times are faster than usual, since code generation is not performed.

Profiling

After instrumenting a program, you can run it one or more times to generate profile data, which is ultimately used to perform the optimizations in the final step of PBO.

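This section provides information on the following profiling topics:

  • Choosing Input Data

  • The flow.data File

  • Storing Profile Information for Multiple Programs

  • Sharing the flow.data File Among Multiple Processes

  • Forking an Instrumented Application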

Choosing Input Data

For best results from PBO, use representative input data when running an instrumented program. Input data that represents rare cases or error conditions is usually not effective for profiling. Run the instrumented program with input data that closely resembles the data in a typical user's environment. Then, the optimizer will focus its efforts on the parts of the program that are critical to performance in the user's environment. You should not have to do a large number of profiling runs before the optimization phase. Usually it is adequate to select a small number of representative input data sets.

The flow.data File

When an instrumented program terminates with the exit(2) system call, special code in the 32-bit icrt0.o startup file or the 64-bit /usr/ccs/lib/pa20_64/fdp_init.o file writes profile data to a file called flow.data in the current working directory. This file contains binary data, which cannot be viewed or updated with a text editor. The flow.data file is not updated when a process terminates without calling exit. That happens, for example, when a process aborts due to an unexpected signal, or when a program calls exec(2) to replace itself with another program.

There are also certain non-terminating processes (such as servers, daemons, and operating systems) which never call exit. For these processes, you must programmatically write the profile data to the flow.data file. In order to do so, a process must call a routine called _write_counters(). This routine is defined in the icrt0.o file. A stub routine with the same name is present in the crt0.o file so that the source does not have to change when instrumentation is not being done.

If flow.data does not exist, the program creates it; if flow.data exists, the program updates the profile data.

As an example, suppose you have an instrumented program named prog.inst, and two representative input data files named input_file1 and input_file2. Then the following lines would create a flow.data file:

$ prog.inst < input_file1
$ ls flow.data
flow.data
$ prog.inst < input_file2

The flow.data file includes profile data from both input files.

To save the profile data to a file other than flow.data in the current working directory, use the FLOW_DATA environment variable as described in “Specifying a Different flow.data with FLOW_DATA ”.

Storing Profile Information for Multiple Programs

A single flow.data file can store information for multiple programs. This allows an instrumented program to spawn other instrumented programs, all of which share the same flow.data file.

To allow multiple programs to save their data in the same flow.data file, a program's profile data is uniquely identified by the executable's basename (see basename(1)), the executable's file size, and the time the executable was last modified.

Instead of using the executable's basename, you can specify a basename by setting the environment variable PBO_PGM_PATH. This is useful when a number of programs are actually linked to the same instrumented executables.

For example, consider profiling the ls, lsf and lsx commands. (lsx is ls with the -x option and lsf is ls with the -F option.) Since the three commands could be linked to the same instrumented executables, the developer might want to collect profile data under a single basename by setting PBO_PGM_PATH=ls. If PBO_PGM_PATH=ls were not set, profile data would be saved under the ls, the lsf, and the lsx basenames.

When an instrumented program begins execution, it checks whether the basename, size, and time-stamp match those in the existing flow.data file. If the basename matches but the size or time-stamp does not match, that probably means that the program has been relinked since it last created profile data. In this case, the following error message will be issued:

program: Can't update counters.  Profile data exists
but does not correspond to this executable. Exit.

You can fix this problem in any one of these ways:

  • Remove or rename the existing flow.data file.

  • Run the instrumented program in a different working directory.

  • Set the FLOW_DATA environment variable so that profile data is written to a file other than flow.data.

  • Rename the instrumented program.
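The first remedy in the list above (renaming the existing flow.data file) can be scripted so that it runs before each fresh round of profiling. This is a minimal sketch; set_aside_profile is a hypothetical helper name, and the backup naming scheme (flow.data.old. followed by the shell's process ID) is only an illustration.

```shell
# set_aside_profile [DIR]  (hypothetical helper)
# If DIR (default: current directory) contains a flow.data file, rename it
# and print the backup name. The next run of the instrumented program then
# creates a new flow.data instead of failing on the stale entry.
set_aside_profile() {
    dir=${1:-.}
    if [ -f "$dir/flow.data" ]; then
        backup="$dir/flow.data.old.$$"
        mv "$dir/flow.data" "$backup" && echo "$backup"
    fi
}
```

Renaming rather than deleting keeps the old profile data available, for example for use with the +df option.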

Sharing the flow.data File Among Multiple Processes

A flow.data file can potentially be accessed by several processes at the same time. For example, this could happen when you run more than one instrumented program at the same time in the same directory, or when profiling one program while linking another with -P.

Such asynchronous access to the file could potentially corrupt the data. To prevent simultaneous access to the flow.data file in a particular directory, a lock file called flow.lock is used. Instrumented programs that need to update the flow.data file and linker processes that need to read it must first obtain access to the lock file. Only one process can hold the lock at any time. As long as the flow.data file is being actively read and written, a process will wait for the lock to become available.

A program that terminates abnormally may leave the flow.data file inactive but locked. A process that tries to access an inactive but locked flow.data file gives up after a short period of time. In such cases, you may need to remove the flow.lock file.

If an instrumented program fails to obtain the database lock, it writes the profile data to a temporary file and displays a warning message containing the name of the file. You could then use the +df option along with the +P option while optimizing, to specify the name of the temporary file instead of the flow.data file.

If the linker fails to obtain the lock, it displays an error message and terminates. In such cases, wait until all active processes that are reading or writing a profile database file in that directory have completed. If no such processes exist, remove the flow.lock file.
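Before relinking with -P, a build script can wait for a busy profile database instead of failing immediately. The sketch below only polls for the flow.lock file described above; wait_for_flow_lock is a hypothetical helper name and the 30-second default timeout is an arbitrary choice. It deliberately does not remove the lock file, because only you can tell whether another profiling or linking process is still active.

```shell
# wait_for_flow_lock [DIR] [TIMEOUT_SECONDS]  (hypothetical helper)
# Returns 0 as soon as DIR/flow.lock is absent.
# Returns 1 if the lock file is still present after the timeout.
wait_for_flow_lock() {
    dir=${1:-.}
    remaining=${2:-30}
    while [ -e "$dir/flow.lock" ]; do
        if [ "$remaining" -le 0 ]; then
            echo "flow.lock still present in $dir;" \
                 "remove it only if no PBO process is active" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A script could call wait_for_flow_lock . 60 and run the cc +P link step only when it returns 0.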

Forking an Instrumented Application

When instrumenting an application that creates a copy of itself with the fork system call, you must ensure that the child process calls a special function named _clear_counters(), which clears all internal profile data. If you don't do this, the child process inherits the parent's profile data, updating the data as it executes, resulting in inaccurate (exaggerated) profile data when the child terminates. The following code segment shows a valid way to call _clear_counters:

if ((pid = fork()) == 0) /* this is the child process */
{
    _clear_counters(); /* reset profile data for child */
    . . . /* other code for the child */
}

The function _clear_counters is defined in icrt0.o. It is also defined as a stub (an empty function that does nothing) in crt0.o. This allows you to use the same source code without modification in the instrumented and un-instrumented versions of the program.

Optimizing Based on Profile Data (+P/-P)

The final step in PBO is optimizing a program using profile data created in the profiling phase. To do this, rebuild the program with the +P compiler option. As with the +I option, the +P option causes the compiler to generate an I-SOM .o file, rather than the usual object code, for each source file.

Note that it is not really necessary to recompile the source files; you could, instead, specify the I-SOM .o files that were created during the instrumentation phase. For instance, suppose you have already created an I-SOM file named foo.o from foo.c using the +I compiler option; then the following commands are equivalent in effect:

cc +P foo.c
cc +P foo.o

Both commands invoke the linker, but the second command doesn't compile before invoking the linker.

The -P Linker Option

After creating an I-SOM file for each source file, the compiler driver invokes the linker with the -P option, causing the linker to optimize all the .o files. As with the +I option, the driver uses /opt/langtools/lbin/ucomp to generate code and perform various optimizations.

To see how the compiler invokes the linker, specify the -v option when compiling. For instance, suppose you have instrumented prog.c and gathered profile data into flow.data. The following example shows how the compiler driver invokes the linker when +P is specified:

$ cc -o prog -v +P prog.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o prog \
prog.o -P -lc

Notice how the program is now linked with /usr/ccs/lib/crt0.o instead of /opt/langtools/lib/icrt0.o because the profiling code is no longer needed.

Using The flow.data File

By default, the code generator and linker look for the flow.data file in the current working directory. In other words, the flow.data file created during the profiling phase should be located in the directory where you relink the program.

Specifying a Different flow.data File with +df

What if you want to use a flow.data file from a different directory than where you are linking? Or what if you have renamed the flow.data file — for example, if you have multiple flow.data files created for different input sets? The +df option allows you to override the default +P behavior of using the file flow.data in the current directory. The compiler passes this option directly to the linker.

For example, suppose after collecting profile data, you decide to rename flow.data to prog.prf. You could then use the +df option as follows:

$ cc -v -o prog +P +df prog.prf prog.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o prog \
+df prog.prf prog.o -P -lc

The +df option overrides the effects of the FLOW_DATA environment variable.

Specifying a Different flow.data with FLOW_DATA

The FLOW_DATA environment variable provides another way to override the default flow.data file name and location. If set, this variable defines an alternate file name for the profile data file.

For example, to use the file /home/adam/projX/prog.data instead of flow.data, set FLOW_DATA:

$ FLOW_DATA=/home/adam/projX/prog.data
$ export FLOW_DATA                               Bourne and Korn shell
$ setenv FLOW_DATA /home/adam/projX/prog.data    C shell

Interaction between FLOW_DATA and +df

If an application is linked with +df and -P, the FLOW_DATA environment variable is ignored. In other words, +df overrides the effects of FLOW_DATA.
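The precedence among the default file, FLOW_DATA, and +df can be summarized as a small shell function. This is only an illustration of the rules stated above, not something the linker provides: effective_profile_file is a hypothetical name, and treating an empty FLOW_DATA the same as an unset one is an assumption of the sketch.

```shell
# effective_profile_file [DF_FILE]  (hypothetical helper)
# Prints the profile database that a -P link would read, given the
# optional argument of +df and the current FLOW_DATA setting.
effective_profile_file() {
    if [ -n "${1:-}" ]; then
        echo "$1"              # +df given: FLOW_DATA is ignored
    elif [ -n "${FLOW_DATA:-}" ]; then
        echo "$FLOW_DATA"      # no +df, FLOW_DATA set
    else
        echo "flow.data"       # default: flow.data in the current directory
    fi
}
```

For example, with FLOW_DATA set to /home/adam/projX/prog.data, effective_profile_file prog.prf still prints prog.prf.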

Specifying a Different Program Name (+pgm)

When retrieving a program's profile data from the flow.data file, the linker uses the program's basename as a lookup key. For instance, if a program were compiled as follows, the linker would look for the profile data under the name foobar:

$ cc -v -o foobar +P foo.o bar.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o foobar \
foo.o bar.o -P -lc

This works fine as long as the name of the program is the same during the instrumentation and optimization phases. But what if the name of the instrumented program is not the same as the name of the final optimized program? For example, what if you want the name of the instrumented application to be different from the optimized application, so you use the following compiler commands?

$ cc -O +I -o prog.inst prog.c      Instrument prog.inst.
$ prog.inst < input_file1           Profile it, storing the data
                                    under the name prog.inst.
$ prog.inst < input_file2
$ cc +P -o prog.opt prog.c          Optimize it, but name it prog.opt.

The linker would be unable to find the program name prog.opt in the flow.data file and would issue the error message:

No profile data found for the program prog.opt in flow.data

To get around this problem, the compilers and linker provide the +pgm name option, which allows you to specify a program name to look for in the flow.data file. For instance, to make the above example work properly, you would include +pgm prog.inst on the final compile line:

$ cc +P -o prog.opt +pgm prog.inst prog.c

Like the +df option, the +pgm option is passed directly to the linker.
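In a build script, the decision whether +pgm is needed can be derived from the two program names. The following sketch assumes, as described above, that the lookup key for an executable is its basename; pgm_option is a hypothetical helper name.

```shell
# pgm_option INSTRUMENTED_NAME OUTPUT_NAME  (hypothetical helper)
# Prints "+pgm NAME" when the basename used during instrumentation differs
# from the basename of the optimized output, and prints nothing otherwise.
pgm_option() {
    inst=$(basename "$1")
    out=$(basename "$2")
    if [ "$inst" != "$out" ]; then
        echo "+pgm $inst"
    fi
}
```

A link step could then be written as cc +P -o prog.opt $(pgm_option prog.inst prog.opt) prog.c, which expands to the command shown above.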

Selecting an Optimization Level with PBO

When -P is specified, the code generator and linker perform profile-based optimizations on any I-SOM or regular object files found on the linker command line. In addition, optimizations will be performed according to the optimization level you specified with a compiler option when you instrumented the application. Briefly, the compiler optimization options are:

+O0

Minimal optimization. This is the default.

+O1

Basic block level optimization.

+O2

Full optimization within each procedure in a file. (Can also be invoked as -O.)

+O3

Full optimization across all procedures in an object file. Includes subprogram inlining.

+O4

Full optimization across entire application, performed at link time. (Invokes ld +Ofastaccess +Oprocelim.) Includes inlining across multiple files.

NOTE: Be aware that +O3 and +O4 are incompatible with symbolic debugging. The only compiler optimization levels that allow for symbolic debugging are +O2 and lower.

For more detailed information on compiler optimization levels, see your compiler documentation.

PBO has the greatest impact when it is combined with level 2 or greater optimizations. For instance, this compile command combines level 2 optimization with PBO (note that the compiler options +O2 and -O are equivalent):

$ cc -v -O +I -c prog.c
/opt/langtools/lbin/cpp prog.c /var/tmp/ctm123
/opt/ansic/lbin/ccom /var/tmp/ctm123 prog.o -O2 -I
$ cc -v -O +I -o prog prog.o
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o prog \
prog.o -I -lc

The optimizations are performed along with instrumentation. However, profile-based optimizations are not performed until you compile later with +P:

$ cc -v +P -o prog prog.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main \
-o prog prog.o -P -lc

Using PBO to Optimize Shared Libraries

Beginning with the HP-UX 10.0 release, the -I linker option can be used with -b to build a shared library with instrumented code. Also, the -P, +df, and +pgm command-line options are compatible with the -b option.

To profile shared libraries, you must set the environment variable SHLIB_FLOW_DATA to the file that receives profile data. Unlike FLOW_DATA, SHLIB_FLOW_DATA has no default output file. If SHLIB_FLOW_DATA is not set, profile data is not collected. This allows you to activate or suspend the profiling of instrumented shared libraries.

Note that you could set SHLIB_FLOW_DATA to flow.data, which is the same file as the default setting for FLOW_DATA. But, again, profile data will not be collected from shared libraries unless you explicitly set SHLIB_FLOW_DATA to some output file.

The following is a simple example for instrumenting, profiling, and optimizing a shared library:

$ cc +z +I -c -O libcode.c             Create I-SOM files.
$ ld -b -I libcode.o -o mylib.inst.sl  Create instrumented sl.
$ cc main.c mylib.inst.sl              Create executable a.out file.
$ export SHLIB_FLOW_DATA=./flow.data   Specify output file for
                                       profile data.
$ a.out < input_file                   Run instrumented executable
                                       with representative input data.
$ ld -b -P +pgm mylib.inst.sl \
  libcode.o -o mylib.sl                Perform PBO.

Note that the name used in the database will be the output pathname specified when the instrumented library is linked (mylib.inst.sl in the example above), regardless of how the library might be moved or renamed after it is created.
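Because shared-library profiling is active only while SHLIB_FLOW_DATA is set, it can be convenient to set the variable for a single run instead of exporting it for the whole session. The sketch below uses the standard env utility for that; with_shlib_profile is a hypothetical helper name.

```shell
# with_shlib_profile FILE COMMAND [ARG...]  (hypothetical helper)
# Runs COMMAND with SHLIB_FLOW_DATA set to FILE for that one process only,
# leaving the calling shell's environment unchanged.
with_shlib_profile() {
    file=$1
    shift
    env SHLIB_FLOW_DATA="$file" "$@"
}
```

For example, with_shlib_profile ./flow.data ./a.out collects shared-library profile data for that run, while a later plain ./a.out does not, provided SHLIB_FLOW_DATA is not otherwise set.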

Using PBO with ld -r

Beginning with the HP-UX 10.0 release, you can take greater advantage of PBO on merged object files created with the -r linker option.

Briefly, ld -r combines multiple .o files into a single .o file. It is often used in large product builds to combine objects into more manageable units. It is also often used in combination with the linker -h option to hide symbols that may conflict with other subsystems in a large application. (See “Hiding Symbols with -h” for more information on ld -h.)

In HP-UX 10.0, the subspaces in the merged .o file produced by ld -r are relocatable, which allows for greater optimization.

The following is a simple example of using PBO with ld -r:

$ cc +I -c file1.c file2.c             Create individual I-SOM files.
$ ld -r -I -o reloc.o file1.o file2.o  Build relocatable, merged file.
$ cc +I -o a.out reloc.o               Create instrumented executable file.
$ a.out < input_file                   Run instrumented executable with
                                       representative input data.
$ ld -r -P +pgm a.out -o reloc.o \
  file1.o file2.o                      Rebuild relocatable file for PBO.
$ cc +P -o a.out reloc.o               Perform PBO on the final executable file.

Notice in the example above that the +pgm option was necessary because the output file name (reloc.o) differs from the instrumented program file name (a.out).

NOTE: If you are using -r and C++ templates, check "Known Limitations" in the HP C++ Release Notes for possible limitations.

Restrictions and Limitations of PBO

This section describes restrictions and limitations you should be aware of when using Profile-Based Optimization.

NOTE:

PBO calls malloc() during the instrumentation (+I) phase. If you replace libc malloc(3C) calls with your own version of malloc(), use the same parameter list (data types, order, number, and meaning of parameters) as the HP version. (For information on malloc(), see malloc(3C).)

Temporary Files

The linker does not modify I-SOM files. Rather, it compiles, instruments, and optimizes the code, placing the resulting temporary object file in a directory specified by the TMPDIR environment variable. If PBO fails due to inadequate disk space, try freeing up space on the disk that contains the $TMPDIR directory. Or, set TMPDIR to a directory on a disk with more free space.
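A quick check of the free space available to the linker's temporary files can be made before starting a long PBO link. This is a sketch only: tmpdir_free_kb is a hypothetical helper name, the fallback to /tmp when TMPDIR is unset is an assumption of the sketch rather than a documented linker default, and the df -kP output layout is assumed to follow the POSIX format (verify it on your system).

```shell
# tmpdir_free_kb [DIR]  (hypothetical helper)
# Prints the free space, in kilobytes, of the file system that holds DIR.
# DIR defaults to $TMPDIR, or to /tmp if TMPDIR is unset (an assumption).
tmpdir_free_kb() {
    dir=${1:-${TMPDIR:-/tmp}}
    # -P requests one output line per file system in the POSIX layout;
    # the fourth column of the second line is the available space.
    df -kP "$dir" | awk 'NR == 2 { print $4 }'
}
```

If the reported value is small, set TMPDIR to a directory on a disk with more free space before linking.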

Source Code Changes and PBO

To avoid the potential problems described below, PBO should only be used during the final stages of application development and performance tuning, when source code changes are the least likely to be made. Whenever possible, an application should be re-profiled after source code changes have been made.

What happens if you attempt to optimize a program using profile data that is older than the source files? For example, this could occur if you change source code and recompile with +P, but don't gather new profile data by re-instrumenting the code.

In that sequence of events, optimizations will still be performed. However, full profile-based optimizations will be performed only on those procedures whose internal structure has not changed since the profile data was gathered. For procedures whose structure has changed, the following warning message is generated:

ucomp warning: Code for name changed since profile
database file flow.data built. Profile data for name
ignored. Consider rebuilding flow.data.
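When a large application is relinked with +P, it can help to collect the names of all procedures for which this warning was issued. The sketch below filters saved linker output for the warning text shown above; changed_procedures is a hypothetical helper name, and the sketch assumes that the words from "ucomp warning: Code for" through "changed since profile" appear on one line, as in the message above.

```shell
# changed_procedures  (hypothetical helper)
# Reads linker output on standard input and prints, once each and in sorted
# order, the procedure names reported as changed since the profile was built.
changed_procedures() {
    sed -n 's/.*ucomp warning: Code for \(.*\) changed since profile.*/\1/p' |
        sort -u
}
```

For example, save the link output with cc +P -o prog prog.o 2> link.log and then run changed_procedures < link.log. A non-empty result suggests that the application should be re-instrumented and re-profiled.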

Note that it is possible to make a source code change that does not affect the control flow structure of a procedure, but which does significantly affect the profiling data generated for the program. In other words, a very small source code change can dramatically affect the paths through the program that are most likely to be taken. For example, changing the value of a program constant that is used as a parameter or loop limit value might have this effect. If the user does not re-profile the application after making source code changes, the profile data in the database will not reflect the effects of those changes. Consequently, the transformations made by the optimizer could degrade the performance of the application.

Profile-Based Optimization (PBO) and High-Level Optimization (HLO)

High-level optimization, or HLO, consists of a number of optimizations, including inlining, that are automatically invoked with the +O3 and +O4 compiler options. (Inlining is an optimization that replaces each call to a routine with a copy of the routine's actual code.) +O3 performs HLO on each module while +O4 performs HLO over the entire program and removes unnecessary ADDIL instructions. Since HLO distorts profile data, it is suppressed during the instrumentation phases of PBO.

When +I is specified along with +O3 or +O4, an I-SOM file is generated. However, HLO is not performed during I-SOM generation. When the I-SOM file is linked, using the +P option to do PBO, HLO is performed, taking advantage of the profile data.

Example

The following example illustrates high-level optimization with PBO:

$ cc +I +O3 -c file.c  Create I-SOM for instrumentation.
$ cc +I +O3 file.o     Link with instrumentation.
$ a.out < input_file  Run instrumented executable with representative input data.
$ cc +P +O3 file.o     Perform PBO and HLO.

Replace +O3 with +O4 in the above example to get HLO over the entire program and ADDIL elimination. (You may see a warning when using +O4 at instrumentation indicating that the +O4 option is being ignored. You can ignore this warning.)

I-SOM File Restrictions

For the most part, there are not many noticeable differences between I-SOM files and ordinary object files. Exceptions are noted below.

ld

Linking object files compiled with the +I or +P option takes much longer than linking ordinary object files. This is because in addition to the work that the linker already does, the code generator must be run on the intermediate code in the I-SOM files. On the other hand, the time to compile a file with +I or +P is relatively fast since code generation is delayed until link time.

All options to ld should work normally with I-SOM files with the following exceptions:

-r

The -r option works with both -I and -P. However, it produces an object file, not an I-SOM file. In 64-bit mode, use -I, -P, or the +nosectionmerge option on a -r linker command to allow procedures to be positioned independently. Without these options, a -r link merges procedures into a single section.

-s

Do not use this option with -I. However, there is no problem using this option with -P.

-G

Do not use this option with -I. There is no problem using this option with -P.

-A

Do not use this option with -I or -P.

-N

Do not use this option with -I or -P.

nm

The nm command works on I-SOM files. However, since code generation has not yet been performed, some of the imported symbols that might appear in an ordinary relocatable object file will not appear in an I-SOM file.

ar

I-SOM files can be manipulated with ar in exactly the same way that ordinary relocatable files can be.

strip

Do not run strip on files compiled with +I or +P. Doing so results in an object file that is essentially empty.

Compiler Options

Except as noted below, all cc, CC, and f77 compiler options work as expected when specified with +I or +P:

-g

This option is incompatible with +I and +P.

-G

This option is incompatible with +I, but compatible with +P (as long as the insertion of the gprof library calls does not affect the control flow graph structure of the procedures).

-p

This option is incompatible with the +I option, but is compatible with +P (as long as the insertion of the prof code does not affect the control flow graph structure of the procedures).

-s

You should not use this option together with +I. Doing so will result in an object file that is essentially empty.

-S

This option is incompatible with +I and +P options because assembly code is not generated from the compiler in these situations. Currently, it is not possible to get assembly code listings of code generated by +I and +P.

-y/+y

The same restrictions apply to these options that were mentioned for -g above.

+o

This option is incompatible with +I and +P. Currently, you cannot get code offset listings for code generated by +I and +P.

Compatibility with 9.0 PBO

PBO is largely compatible between the 9.0 and 10.0 releases.

I-SOM files created under 9.0 are completely acceptable in the 10.0 environment.

However, it is advantageous to re-profile programs under 10.0 in order to achieve improved optimization. Although you can use profile data in flow.data files created under 9.0, the resulting optimization will not take advantage of 10.0 enhancements. In addition, a warning is generated stating that the profile data is from a previous release. See the section called “Profiling” in this chapter for more information about this warning.

© 1997 Hewlett-Packard Development Company, L.P.