HP-UX Floating-Point Guide: HP 9000 Computers > Chapter 7 Performance Tuning

Inefficient Code

The HP-UX compilers are highly optimizing and generally produce extremely efficient code. However, you can control the degree of efficiency and the types of optimizations with compiler options and directives. Particularly important from a performance standpoint are the compiler options that do the following:
The following sections describe each of these options. Many of the options are available both as command-line options and as directives or pragmas that you can place in your source code. For more information and specific syntax, refer to the appropriate HP language reference manual.

If the compiler generates inefficient code even when you use the appropriate options, you may choose to write parts of your program in assembly language. “Writing Routines in Assembly Language” describes the advantages and disadvantages of this choice.

For a thorough discussion of optimization on HP 9000 systems, see the HP PA-RISC Compiler Optimization Technology White Paper. See the appropriate HP language manual for additional information.

The most important compiler option affecting efficiency is the optimization option, +O, which allows you to optimize your program in several different ways:
In general, the higher the optimization level, the more efficient the code. In performing optimizations, the compiler often rearranges code and makes assumptions about the way variables will be used in other modules. There is some risk, therefore, in choosing a high optimization level, since the compiler may make some invalid assumptions that can cause code to run more slowly or to produce different results. This is particularly true if your code makes frequent use of pointers.

It is always a good idea to compile a program at different optimization levels and compare the results to make sure that the optimizations are not affecting either the performance or the results. See “Compiler Behavior and Compiler Version” and “Compiler Options” for information about how compiler optimizations can affect program results.

The following specific optimizations are particularly relevant to floating-point programs. Most of them are available at optimization levels 2, 3, and 4.
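
One way to carry out the comparison across optimization levels recommended above is to have the program report a floating-point result with enough digits that any difference between builds is visible. The following is only a sketch: the function names `harmonic_sum` and `format_result` and the choice of workload are assumptions made for illustration, not part of any HP library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical workload: sum the first n terms of the harmonic series.
   The order of the additions matters in floating point, so this kind of
   loop is sensitive to any reordering the compiler performs. */
double harmonic_sum(size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Write value into buf using 17 significant digits, which is enough to
   tell apart any two distinct IEEE double-precision values.
   Returns the number of characters snprintf reports. */
int format_result(double value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.17g", value);
}
```

A small driver that prints the string produced by `format_result(harmonic_sum(n), buf, sizeof buf)` can be built once at each optimization level of interest. If the printed strings differ between builds, the optimizations are affecting the results; timing the same runs shows whether they are affecting performance.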
All HP 9000 compilers support the +DA option, which specifies a particular target architecture type, either PA-RISC 1.1 or PA-RISC 2.0. Use of this option causes the compiler to produce architecture-specific instructions and calls to special architecture-specific run-time libraries. Specifying the architecture type of the systems on which your code will run will probably improve the performance of your code if it makes substantial use of floating-point arithmetic or math library calls. See “Selecting Different Versions of the Math Libraries”, “Architecture Type of Run-Time System”, and “BLAS Library Versions” for more information.

Use of the +DA2.0 option to generate PA2.0 code will improve the performance of your application even more if the source provides opportunities for the compiler to generate FMA (fused multiply-add) instructions (see “Architecture Type of Run-Time System” for details). For example, if two statements like
and
are separated by intervening statements in your program, you may want to place them one right after the other or to combine them into
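
As a hedged illustration of this kind of rearrangement, the C sketch below shows a multiply and a dependent add written first as two statements and then as a single expression. The function names `scale_add_separate` and `scale_add_combined` and all variable names are hypothetical and are not the guide's own example statements. Whether a given compiler actually emits an FMA instruction for either form depends on the compiler, the target architecture, and the options used.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply and add written as two statements. In real code, unrelated
   statements may sit between them, which can make the multiply-add
   pairing harder for the compiler to exploit. */
void scale_add_separate(const double *a, const double *b, const double *c,
                        double *y, size_t n)
{
    size_t i;

    assert(a != NULL && b != NULL && c != NULL && y != NULL);
    for (i = 0; i < n; i++) {
        double t = a[i] * b[i];   /* multiply */
        /* ... intervening statements would appear here ... */
        y[i] = t + c[i];          /* add that depends on the product */
    }
}

/* Multiply and add combined into one expression inside the loop, which
   presents the a*b + c pattern to the compiler directly. */
void scale_add_combined(const double *a, const double *b, const double *c,
                        double *y, size_t n)
{
    size_t i;

    assert(a != NULL && b != NULL && c != NULL && y != NULL);
    for (i = 0; i < n; i++)
        y[i] = a[i] * b[i] + c[i];
}
```

A fused multiply-add rounds once instead of twice, so when FMA instructions are generated the result can differ in the last bit from the separately rounded form.
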
This kind of rearrangement will be most effective if done within loops.

The +DS option also has a significant effect on performance, because it specifies an architecture-specific instruction scheduler. If your code must be portable across all HP 9000 architectures, you must compile with +DA1.1, but you may compile with either +DS1.1 or +DS2.0. Use +DS2.0 if you want to achieve the best possible performance on PA2.0 systems. See the appropriate HP language reference manual for more information about this option.

All HP 9000 compilers allow you to include debugging information in the object file at optimization levels 0, 1, and 2. Debugging information increases the size of the object code. The debugging option is extremely useful during program development, but for the final product you should compile without it.

By default, compilers produce absolute code for HP 9000 systems. You can produce position-independent code (PIC) for use in building shared libraries. In general, absolute code is faster than PIC because addressing calculations are simpler and shorter. Consult Programming on HP-UX for more information about absolute and position-independent code. See “Shared Libraries versus Archive Libraries” for more information on the performance impact of shared libraries.

HP C, HP C++, HP FORTRAN/9000, and HP Pascal support profile-based optimization (PBO) on HP 9000 systems. PBO can improve the performance of programs that are branch-intensive and that exhibit poor instruction memory locality. Although these tend not to be issues in floating-point-intensive applications, if you suspect that they may be degrading the performance of your program, you can use PBO to minimize their impact on your program. Under PBO, the compiler and linker work to optimize the executable file, using profile data for a typical data set to produce an executable file that will result in fewer instruction cache misses, Translation Lookaside Buffer (TLB) misses, and memory page faults.
For information about PBO, see the HP-UX Linker and Libraries Online User Guide and the appropriate compiler documentation.

HP Fortran 90 provides an option, +save, that forces static storage for all local variables and that forces the compiler to initialize all uninitialized static variables to zero. HP FORTRAN/9000 provides an equivalent option, -K; the +e option also automatically saves all local variables, if possible. Use these options judiciously. They are costly from a performance standpoint and also from a software engineering perspective because they change the semantics of an entire module rather than altering specific problem areas. The optimization option +Oinitcheck performs initialization in a more selective way that has less impact on the performance of your program. Use this option in Fortran 90 programs. See the f90(1) or f77(1) man page for details. See “Static Variables” for more information about static data.

If you have compiled with all of the correct compiler options and you are still not satisfied with the program's performance, you may want to examine the generated code to see exactly what is happening. To get an expanded listing, specify the -S option.

You can also code parts of your program directly in assembly language. Assembly language is useful if performance is critical and portability is not. When deciding whether to write something in assembly language, keep in mind that the HP 9000 compilers are highly optimizing. If the code section is large, the compiler can probably generate code as good as or better than an assembly language program. Good candidates for assembly language are short, frequently called routines. However, using the +Oinline compiler option may improve the performance of these routines enough to make it unnecessary to rewrite them in assembly language.
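
To make the last point concrete, the sketch below shows the kind of short, frequently called routine that inlining can help, written in C. The names `square_diff` and `sum_squared_diff` are hypothetical. Whether the compiler actually inlines the call depends on the options and optimization level in effect.

```c
#include <assert.h>
#include <stddef.h>

/* A short routine that is called once per array element. The call
   overhead is large relative to the work done, so the routine is a
   natural candidate for inlining rather than for hand-written assembly. */
static double square_diff(double x, double y)
{
    double d = x - y;
    return d * d;
}

/* Caller with a hot loop: if square_diff is inlined, the loop body
   reduces to a subtract, a multiply, and an add with no call. */
double sum_squared_diff(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(x != NULL && y != NULL);
    for (i = 0; i < n; i++)
        sum += square_diff(x[i], y[i]);
    return sum;
}
```

Examining the generated code for `sum_squared_diff` with and without inlining enabled shows whether the call has been eliminated, which is worth checking before rewriting the routine in assembly language.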