HP-UX Linker and Libraries User's Guide: HP 9000 Computers
Chapter 8: Ways to Improve Performance

Profile-Based Optimization
In profile-based optimization (PBO), the compiler and linker work together to optimize an application based on profile data obtained from running the application on a typical input data set. For instance, if certain procedures call each other frequently, the linker can place them close together in the a.out file, resulting in fewer instruction cache misses, TLB misses, and memory page faults when the program runs. Similar optimizations can be done at the basic block levels of a procedure. Profile data is also used by the compiler for other general tasks, such as code scheduling and register allocation.
PBO should be the last level of optimization you use when building an application. As with other optimizations, it should be performed after an application has been completely debugged. Most applications will benefit from PBO. The two types of applications that will benefit the most from PBO are:
Of course, the best way to determine whether PBO will improve an application's performance is to try it.
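Profile-based optimization involves three steps: instrumenting the application, running the instrumented program to collect profile data, and optimizing the application using that profile data.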
Suppose you want to apply PBO to an application called sample. The application is built from a C source file sample.c. The steps involved in optimizing the application are discussed below.

Step 1: Instrumentation

First, compile the application for instrumentation and level 2 optimization:
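A minimal sketch of this step, assuming the HP-UX cc driver and that sample.c is in the current directory. The function name instrument_sample is hypothetical. Compiling and linking as two separate commands is also an assumption, made so that the I-SOM file sample.o is still on disk when the program is relinked in Step 3.

```shell
# Step 1 sketch: build an instrumented executable from sample.c.
# Assumes the HP-UX cc driver; +I and -O are the options named in the text.
instrument_sample() {
    # Produce the I-SOM file sample.o (intermediate code, not object code).
    cc +I -O -c sample.c &&
    # Link the I-SOM file into the instrumented program sample.inst.
    cc +I -O -o sample.inst sample.o
}
```

Call instrument_sample from the directory that contains sample.c.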
At this point, you have an instrumented program called sample.inst.

Step 2: Profile

Assume you have two representative input files to use for profiling, input.file1 and input.file2. Now execute the following three commands:
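The three commands might look like the sketch below. It assumes that sample.inst reads its input from standard input, which the text does not state; if the program takes file names as arguments, pass them that way instead. The function name profile_sample is hypothetical.

```shell
# Step 2 sketch: two profiling runs, then rename the profile database.
# Assumption: sample.inst reads its input from standard input.
profile_sample() {
    # First run creates flow.data in the current working directory.
    ./sample.inst < input.file1 &&
    # Second run increments the counters already stored in flow.data.
    ./sample.inst < input.file2 &&
    # Keep the accumulated profile under a program-specific name.
    mv flow.data sample.data
}
```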
The first invocation of sample.inst creates the flow.data file and places an entry for that executable file in the data file. The second invocation increments the counters for sample.inst in the flow.data file. The third command moves the flow.data file to a file named sample.data.

Step 3: Optimize

To perform profile-based optimizations on this application, relink the program as follows:
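A sketch of the relink, assuming the I-SOM file sample.o from Step 1 and the renamed profile sample.data from Step 2 are both in the current directory. The function name optimize_sample is hypothetical; the options are the ones this section discusses.

```shell
# Step 3 sketch: relink the existing I-SOM file using the collected profile.
optimize_sample() {
    # +P            perform profile-based optimization at link time
    # +pgm NAME     look up profile data recorded under the instrumented name
    # +df FILE      read the profile from FILE instead of ./flow.data
    cc +P +pgm sample.inst +df sample.data -o sample.opt sample.o
}
```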
Note that it was not necessary to recompile the source file. The +pgm option was used because the executable name used during instrumentation, sample.inst, does not match the current output file name, sample.opt. The +df option is necessary because the profile database file for the program has been moved from flow.data to sample.data.

Although you can use the linker alone to perform PBO, the best optimizations result if you use the compiler as well; this section describes this approach.

To instrument an application (with C, C++, and FORTRAN), compile the source with the +I compiler command line option. This causes the compiler to generate a .o file containing intermediate code, rather than the usual object code. (Intermediate code is a representation of your code that is lower-level than the source code, but higher-level than the object code.) A file containing such intermediate code is referred to as an I-SOM file.

After creating an I-SOM file for each source file, the compiler invokes the linker as follows:
You can see how the compiler invokes the linker by specifying the -v option. For example, to instrument the file sample.c, to name the executable sample.inst, to perform level 2 optimizations (the compiler option -O is equivalent to +O2), and to see verbose output (-v):
Notice that in the linker command line (starting with /usr/ccs/bin/ld), the application is linked with /opt/langtools/lib/icrt0.o and the -I option is given.

To save the profile data to a file other than flow.data in the current working directory, use the FLOW_DATA environment variable as described in “Specifying a Different flow.data with FLOW_DATA”.

The icrt0.o startup file uses the atexit system call to register the function that writes out profile data. (For 64-bit mode, the initialization code is in /usr/ccs/lib/pa20_64/fdp_init.o.) That function is called when the application exits.

atexit allows a fixed number of functions to be registered from a user application. Instrumented applications (those linked with -I) will have one fewer atexit call available. One or more instrumented shared libraries will use a single additional atexit call. Therefore, an instrumented application that contains any number of instrumented shared libraries will use two of the available atexit calls. For details on atexit, see atexit(2).

When invoked with the -I option, the linker instruments all the specified object files. Note that the linker instruments regular object files as well as I-SOM files; however, with regular object files, only procedure call instrumentation is added. With I-SOM files, additional instrumentation is done within procedures.

For instance, suppose you have a regular object file named foo.o created by compiling without the +I option, and you compile a source file bar.c with the +I option and specify foo.o on the compile line:
In this case, the linker instruments both bar.o and foo.o. However, since foo.o is not an I-SOM file, only its procedure calls are instrumented; basic blocks within procedures are not instrumented. To instrument foo.c to the same extent, you must compile it with the +I option — for example:
A simpler approach would be to compile foo.c and bar.c with a single cc command:
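The difference between partial and full instrumentation can be sketched as follows. The executable name foobar.inst and both function names are hypothetical, and the HP-UX cc driver is assumed.

```shell
# Partial instrumentation: foo.o is ordinary object code, so the linker
# can instrument only its procedure calls, not its basic blocks.
instrument_partial() {
    cc -O -c foo.c &&
    cc +I -O -o foobar.inst bar.c foo.o
}

# Full instrumentation: both sources become I-SOM files, so basic blocks
# inside the procedures of foo.c and bar.c are instrumented as well.
instrument_full() {
    cc +I -O -o foobar.inst foo.c bar.c
}
```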
As discussed in “Looking "inside" a Compiler ”, a compiler driver invokes several phases. The last phase before linking is code generation. When using PBO, the compilation process stops at an intermediate code level. The PA-RISC code generation and optimization phase is invoked by the linker. The code generator is /opt/langtools/lbin/ucomp.
After instrumenting a program, you can run it one or more times to generate profile data, which is ultimately used to perform the optimizations in the final step of PBO. This section provides information on the following profiling topics:

For best results from PBO, use representative input data when running an instrumented program. Input data that represents rare cases or error conditions is usually not effective for profiling. Run the instrumented program with input data that closely resembles the data in a typical user's environment. Then, the optimizer will focus its efforts on the parts of the program that are critical to performance in the user's environment. You should not have to do a large number of profiling runs before the optimization phase. Usually it is adequate to select a small number of representative input data sets.

When an instrumented program terminates with the exit(2) system call, special code in the 32-bit icrt0.o startup file or the 64-bit /usr/ccs/lib/pa20_64/fdp_init.o file writes profile data to a file called flow.data in the current working directory. This file contains binary data, which cannot be viewed or updated with a text editor. The flow.data file is not updated when a process terminates without calling exit. That happens, for example, when a process aborts due to an unexpected signal, or when a program calls exec(2) to replace itself with another program.

There are also certain non-terminating processes (such as servers, daemons, and operating systems) which never call exit. For these processes, you must programmatically write the profile data to the flow.data file. To do so, a process must call a routine named _write_counters(). This routine is defined in the icrt0.o file. A stub routine with the same name is present in the crt0.o file so that the source does not have to change when instrumentation is not being done.
If flow.data does not exist, the program creates it; if flow.data exists, the program updates the profile data. As an example, suppose you have an instrumented program named prog.inst, and two representative input data files named input_file1 and input_file2. Then the following lines would create a flow.data file:
The flow.data file includes profile data from both input files.

To save the profile data to a file other than flow.data in the current working directory, use the FLOW_DATA environment variable as described in “Specifying a Different flow.data with FLOW_DATA”.

A single flow.data file can store information for multiple programs. This allows an instrumented program to spawn other instrumented programs, all of which share the same flow.data file.

To allow multiple programs to save their data in the same flow.data file, a program's profile data is uniquely identified by the executable's basename (see basename(1)), the executable's file size, and the time the executable was last modified.

Instead of using the executable's basename, you can specify a basename by setting the environment variable PBO_PGM_PATH. This is useful when a number of programs are actually linked to the same instrumented executables. For example, consider profiling the ls, lsf and lsx commands. (lsx is ls with the -x option and lsf is ls with the -F option.) Since the three commands could be linked to the same instrumented executables, the developer might want to collect profile data under a single basename by setting PBO_PGM_PATH=ls. If PBO_PGM_PATH=ls were not set, profile data would be saved under the ls, the lsf, and the lsx basenames.

When an instrumented program begins execution, it checks whether the basename, size, and time-stamp match those in the existing flow.data file. If the basename matches but the size or time-stamp does not match, that probably means that the program has been relinked since it last created profile data. In this case, the following error message will be issued:
You can fix this problem in any one of these ways:
A flow.data file can potentially be accessed by several processes at the same time. For example, this could happen when you run more than one instrumented program at the same time in the same directory, or when profiling one program while linking another with -P.

Such asynchronous access to the file could potentially corrupt the data. To prevent simultaneous access to the flow.data file in a particular directory, a lock file called flow.lock is used. Instrumented programs that need to update the flow.data file and linker processes that need to read it must first obtain access to the lock file. Only one process can hold the lock at any time. As long as the flow.data file is being actively read and written, a process will wait for the lock to become available.

A program that terminates abnormally may leave the flow.data file inactive but locked. A process that tries to access an inactive but locked flow.data file gives up after a short period of time. In such cases, you may need to remove the flow.lock file.

If an instrumented program fails to obtain the database lock, it writes the profile data to a temporary file and displays a warning message containing the name of the file. You could then use the +df option along with the +P option while optimizing, to specify the name of the temporary file instead of the flow.data file.

If the linker fails to obtain the lock, it displays an error message and terminates. In such cases, wait until all active processes that are reading or writing a profile database file in that directory have completed. If no such processes exist, remove the flow.lock file.

When instrumenting an application that creates a copy of itself with the fork system call, you must ensure that the child process calls a special function named _clear_counters(), which clears all internal profile data.
If you don't do this, the child process inherits the parent's profile data, updating the data as it executes, resulting in inaccurate (exaggerated) profile data when the child terminates. The following code segment shows a valid way to call _clear_counters:
The function _clear_counters is defined in icrt0.o. It is also defined as a stub (an empty function that does nothing) in crt0.o. This allows you to use the same source code without modification in the instrumented and un-instrumented versions of the program.

The final step in PBO is optimizing a program using profile data created in the profiling phase. To do this, rebuild the program with the +P compiler option. As with the +I option, the +P option causes the compiler to generate an I-SOM .o file, rather than the usual object code, for each source file.

Note that it is not really necessary to recompile the source files; you could, instead, specify the I-SOM .o files that were created during the instrumentation phase. For instance, suppose you have already created an I-SOM file named foo.o from foo.c using the +I compiler option; then the following commands are equivalent in effect:
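The two equivalent forms might look like this sketch, which assumes the HP-UX cc driver and leaves the output name at its default. The function names are hypothetical.

```shell
# Recompile foo.c to an I-SOM file, then link with profile-based optimization.
relink_from_source() {
    cc +P foo.c
}

# Reuse the I-SOM file foo.o left over from the +I build; no compile step runs.
relink_from_isom() {
    cc +P foo.o
}
```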
Both commands invoke the linker, but the second command doesn't compile before invoking the linker.

After creating an I-SOM file for each source file, the compiler driver invokes the linker with the -P option, causing the linker to optimize all the .o files. As with the +I option, the driver uses /opt/langtools/lbin/ucomp to generate code and perform various optimizations.

To see how the compiler invokes the linker, specify the -v option when compiling. For instance, suppose you have instrumented prog.c and gathered profile data into flow.data. The following example shows how the compiler driver invokes the linker when +P is specified:
Notice how the program is now linked with /usr/ccs/lib/crt0.o instead of /opt/langtools/lib/icrt0.o because the profiling code is no longer needed.

By default, the code generator and linker look for the flow.data file in the current working directory. In other words, the flow.data file created during the profiling phase should be located in the directory where you relink the program.

What if you want to use a flow.data file from a different directory than where you are linking? Or what if you have renamed the flow.data file (for example, if you have multiple flow.data files created for different input sets)? The +df option allows you to override the default +P behavior of using the file flow.data in the current directory. The compiler passes this option directly to the linker.

For example, suppose after collecting profile data, you decide to rename flow.data to prog.prf. You could then use the +df option as follows:
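A sketch of the rename followed by the +df relink. It assumes the instrumented program was also named prog, so that no +pgm option is needed; the function name use_renamed_profile is hypothetical.

```shell
# Rename the profile database, then point the optimizing build at it.
# Assumption: the instrumented executable was named prog as well.
use_renamed_profile() {
    mv flow.data prog.prf &&
    cc +P +df prog.prf -o prog prog.c
}
```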
The +df option overrides the effects of the FLOW_DATA environment variable.

The FLOW_DATA environment variable provides another way to override the default flow.data file name and location. If set, this variable defines an alternate file name for the profile data file. For example, to use the file /home/adam/projX/prog.data instead of flow.data, set FLOW_DATA:
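In a Bourne-style shell (sh or ksh), the setting could be sketched as follows; C shell users would use setenv instead.

```shell
# Name an alternate profile database for subsequent instrumented runs
# and for later links that use -P without +df.
FLOW_DATA=/home/adam/projX/prog.data
export FLOW_DATA
```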
If an application is linked with +df and -P, the FLOW_DATA environment variable is ignored. In other words, +df overrides the effects of FLOW_DATA.

When retrieving a program's profile data from the flow.data file, the linker uses the program's basename as a lookup key. For instance, if a program were compiled as follows, the linker would look for the profile data under the name foobar:
This works fine as long as the name of the program is the same during the instrumentation and optimization phases. But what if the name of the instrumented program is not the same as the name of the final optimized program? For example, what if you want the name of the instrumented application to be different from the optimized application, so you use the following compiler commands?
The linker would be unable to find the program name prog.opt in the flow.data file and would issue the error message:
To get around this problem, the compilers and linker provide the +pgm name option, which allows you to specify a program name to look for in the flow.data file. For instance, to make the above example work properly, you would include +pgm prog.inst on the final compile line:
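The whole sequence, with +pgm on the final compile line, might be sketched as follows. The input file name input_file, reading input from standard input, and the function name are assumptions made for illustration.

```shell
# Instrument as prog.inst, profile it, then optimize as prog.opt.
# Assumption: prog.inst reads its input from standard input.
pbo_with_renamed_program() {
    cc +I -O -o prog.inst prog.c &&
    ./prog.inst < input_file &&
    # Without +pgm, the linker would look up "prog.opt" in flow.data and fail.
    cc +P -O +pgm prog.inst -o prog.opt prog.c
}
```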
Like the +df option, the +pgm option is passed directly to the linker.

When -P is specified, the code generator and linker perform profile-based optimizations on any I-SOM or regular object files found on the linker command line. In addition, optimizations will be performed according to the optimization level you specified with a compiler option when you instrumented the application. Briefly, the compiler optimization options are:
PBO has the greatest impact when it is combined with level 2 or greater optimizations. For instance, this compile command combines level 2 optimization with PBO (note that the compiler options +O2 and -O are equivalent):
The optimizations are performed along with instrumentation. However, profile-based optimizations are not performed until you compile later with +P:
Beginning with the HP-UX 10.0 release, the -I linker option can be used with -b to build a shared library with instrumented code. Also, the -P, +df, and +pgm command-line options are compatible with the -b option.

To profile shared libraries, you must set the environment variable SHLIB_FLOW_DATA to the file that receives profile data. Unlike FLOW_DATA, SHLIB_FLOW_DATA has no default output file. If SHLIB_FLOW_DATA is not set, profile data is not collected. This allows you to activate or suspend the profiling of instrumented shared libraries.

Note that you could set SHLIB_FLOW_DATA to flow.data, which is the same file as the default setting for FLOW_DATA. But, again, profile data will not be collected from shared libraries unless you explicitly set SHLIB_FLOW_DATA to some output file.

The following is a simple example for instrumenting, profiling, and optimizing a shared library:
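One possible sequence is sketched below. The file names libcode.c, main.c, app, and input_file are hypothetical, as is the function name. The use of +z for position-independent code and the assumption that the program reads standard input are not taken from the text above.

```shell
# Sketch: instrument, profile, and optimize a shared library.
pbo_shared_library() {
    # Compile the library source to a position-independent I-SOM file.
    cc +z +I -O -c libcode.c &&
    # Build the instrumented shared library.
    ld -b -I -o mylib.inst.sl libcode.o &&
    # Link an ordinary program against the instrumented library.
    cc -o app main.c mylib.inst.sl &&
    # Library profile data is collected only while SHLIB_FLOW_DATA is set.
    SHLIB_FLOW_DATA=./flow.data ./app < input_file &&
    # Rebuild the library with PBO; +pgm names the instrumented library.
    ld -b -P +pgm mylib.inst.sl -o mylib.sl libcode.o
}
```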
Note that the name used in the database will be the output pathname specified when the instrumented library is linked (mylib.inst.sl in the example above), regardless of how the library might be moved or renamed after it is created.

Beginning with the HP-UX 10.0 release, you can take greater advantage of PBO on merged object files created with the -r linker option.

Briefly, ld -r combines multiple .o files into a single .o file. It is often used in large product builds to combine objects into more manageable units. It is also often used in combination with the linker -h option to hide symbols that may conflict with other subsystems in a large application. (See “Hiding Symbols with -h” for more information on ld -h.)

In HP-UX 10.0, the subspaces in the merged .o file produced by ld -r are relocatable, which allows for greater optimization.

The following is a simple example of using PBO with ld -r:
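A sketch of one way the pieces could fit together. The file names file1.c, file2.c, reloc.o, prog.inst, prog, and input_file are hypothetical, and the exact ordering of options on the ld command lines is an assumption rather than something the text specifies.

```shell
# Sketch: PBO on a merged object file built with ld -r.
pbo_merged_objects() {
    # Compile both sources to I-SOM files.
    cc +I -O -c file1.c file2.c &&
    # Merge them into one instrumented relocatable object.
    ld -r -I -o reloc.o file1.o file2.o &&
    # Link the instrumented program and run it on representative input.
    cc +I -O -o prog.inst reloc.o &&
    ./prog.inst < input_file &&
    # Re-merge with PBO; the output is reloc.o, so +pgm names the program
    # under which the profile data was recorded.
    ld -r -P +pgm prog.inst -o reloc.o file1.o file2.o &&
    # Final ordinary link of the optimized merged object.
    cc -o prog reloc.o
}
```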
Notice in the example above that the +pgm option was necessary because the output file name differs from the instrumented program file name.
This section describes restrictions and limitations you should be aware of when using Profile-Based Optimization.

The linker does not modify I-SOM files. Rather, it compiles, instruments, and optimizes the code, placing the resulting temporary object file in a directory specified by the TMPDIR environment variable. If PBO fails due to inadequate disk space, try freeing up space on the disk that contains the $TMPDIR directory. Or, set TMPDIR to a directory on a disk with more free space.

To avoid the potential problems described below, PBO should only be used during the final stages of application development and performance tuning, when source code changes are the least likely to be made. Whenever possible, an application should be re-profiled after source code changes have been made.

What happens if you attempt to optimize a program using profile data that is older than the source files? For example, this could occur if you change source code and recompile with +P, but don't gather new profile data by re-instrumenting the code.

In that sequence of events, optimizations will still be performed. However, full profile-based optimizations will be performed only on those procedures whose internal structure has not changed since the profile data was gathered. For procedures whose structure has changed, the following warning message is generated:
Note that it is possible to make a source code change that does not affect the control flow structure of a procedure, but which does significantly affect the profiling data generated for the program. In other words, a very small source code change can dramatically affect the paths through the program that are most likely to be taken. For example, changing the value of a program constant that is used as a parameter or loop limit value might have this effect. If the user does not re-profile the application after making source code changes, the profile data in the database will not reflect the effects of those changes. Consequently, the transformations made by the optimizer could degrade the performance of the application.

High-level optimization, or HLO, consists of a number of optimizations, including inlining, that are automatically invoked with the +O3 and +O4 compiler options. (Inlining is an optimization that replaces each call to a routine with a copy of the routine's actual code.) +O3 performs HLO on each module while +O4 performs HLO over the entire program and removes unnecessary ADDIL instructions. Since HLO distorts profile data, it is suppressed during the instrumentation phases of PBO.

When +I is specified along with +O3 or +O4, an I-SOM file is generated. However, HLO is not performed during I-SOM generation. When the I-SOM file is linked, using the +P option to do PBO, HLO is performed, taking advantage of the profile data.

The following example illustrates high-level optimization with PBO:
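A sketch of such a build, assuming the HP-UX cc driver. The names file.c, prog.inst, prog, and input_file are hypothetical, as is the assumption that the program reads standard input.

```shell
# Sketch: combine PBO with high-level optimization at +O3.
pbo_with_hlo() {
    # +O3 is recorded in the I-SOM file, but HLO is suppressed while instrumenting.
    cc +I +O3 -c file.c &&
    cc +I +O3 -o prog.inst file.o &&
    # Collect profile data from a representative run.
    ./prog.inst < input_file &&
    # HLO, including inlining, runs here and can use the profile data.
    cc +P +O3 +pgm prog.inst -o prog file.o
}
```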
Replace +O3 with +O4 in the above example to get HLO over the entire program and ADDIL elimination. (You may see a warning when using +O4 at instrumentation indicating that the +O4 option is being ignored. You can ignore this warning.)

For the most part, there are not many noticeable differences between I-SOM files and ordinary object files. Exceptions are noted below.

Linking object files compiled with the +I or +P option takes much longer than linking ordinary object files. This is because, in addition to the work that the linker already does, the code generator must be run on the intermediate code in the I-SOM files. On the other hand, compiling a file with +I or +P is relatively fast, since code generation is delayed until link time.

All options to ld should work normally with I-SOM files, with the following exceptions:
The nm command works on I-SOM files. However, since code generation has not yet been performed, some of the imported symbols that might appear in an ordinary relocatable object file will not appear in an I-SOM file.

I-SOM files can be manipulated with ar in exactly the same way that ordinary relocatable files can be.

Do not run strip on files compiled with +I or +P. Doing so results in an object file that is essentially empty.

Except as noted below, all cc, CC, and f77 compiler options work as expected when specified with +I or +P:
PBO is largely compatible between the 9.0 and 10.0 releases. I-SOM files created under 9.0 are completely acceptable in the 10.0 environment. However, it is advantageous to re-profile programs under 10.0 in order to achieve improved optimization. Although you can use profile data in flow.data files created under 9.0, the resulting optimization will not take advantage of 10.0 enhancements. In addition, a warning is generated stating that the profile data is from a previous release. See the section called “Profiling” in this chapter for more information about this warning.