| United States-English |
|
|
|
![]() |
HP C/HP-UX Programmer's Guide: HP-UX Systems > Chapter 4 Optimizing
HP C Programs Level 2 Optimization Modules |
|
Level 2 performs optimizations within each procedure. At level 2, the optimizer performs all optimizations performed at the prior level, with the following additions:
The examples in this section are shown at the source code level wherever possible. Transformations that cannot be shown at the source level are shown in assembly language. The name of this optimization comes from the similarity to map coloring algorithms in graph theory. This optimization determines when and how long commonly used variables and expressions occupy a register. It minimizes the number of references to memory (loads and stores) a code segment makes. This can improve run-time speed. You can help the optimizer understand when certain variables are heavily used within a function by declaring these variables with the register qualifier. The first 10 register qualified variables encountered in the source are honored. You should pick the ten most important variables to be most effective. The coloring register allocator may override your choices and promote to a register a variable not declared register over one that is, based on estimated speed improvements. The following code shows the type of optimization the coloring register allocation module performs. The code: LDI 2,r104 becomes: LDI 2,r25 The induction variables and strength reduction module removes expressions that are linear functions of a loop counter and replaces each of them with a variable that contains the value of the function. Variables of the same linear function are computed only once. This module also simplifies the function by replacing multiplication instructions with addition instructions wherever possible. For example, the code: for (i=0; i<25; i++) { becomes: t1 = 0; The common subexpression elimination module identifies expressions that appear more than once and have the same result, computes the result, and substitutes the result for each occurrence of the expression. The types of subexpression include instructions that load values from memory, as well as arithmetic evaluation. For example, the code: a = x + y + z; becomes: t1 = x + y; Constant folding computes the value of a constant expression at compile time. For example: A = 10; can be replaced by: A = 10; The loop invariant code motion module recognizes instructions inside a loop whose results do not change and moves them outside the loop. This ensures that the invariant code is only executed once. For example, the code: x = z; becomes: x = z; Where possible, the store/copy optimization module substitutes registers for memory locations, by replacing store instructions with copy instructions and deleting load instructions. For example, the following HP C code: a = x + 23; where a is a local variable. return a; produces the following code for the unoptimized case: LDO 23(r26),r1 and this code for the optimized case: LDO 23(r26),ret0 The unused definition elimination module removes unused memory location and register definitions. These definitions are often a result of transformations made by other optimization modules. For example, the function: f(int x) becomes: f(int x) Software pipelining is a code transformation that optimizes program loops. It rearranges the order in which instructions are executed in a loop. It generates code that overlaps operations from different loop iterations. Software pipelining is useful for loops that contain arithmetic operations on floats and doubles. The goal of this optimization is to avoid CPU stalls due to memory or hardware pipeline latencies. The software pipelining transformation adds code before and after the loop to achieve a high degree of optimization within the loop. The following pseudo-code fragment shows a loop before and after the software pipelining optimization. Four significant things happen:
The following is a C for loop: #define SIZ 10000 When this loop is compiled with software pipelining, the optimization can be expressed in pseudo-code as follows: R1 = 0; R2 = 4.0; R3 = Y[0]; R4 = X[0]; R5 = R4 / R3; do { R6 = R1; R1++; R7 = X[R1]; R8 = Y[R1]; R9 = R5 + R2; R10 = R7 / R8; X[R6] = R9; R6 = R1; R1++; R4 = X[R1]; R3 = Y[R1]; R11 = R10 + R2; R5 = R4 / R3; X[R6] = R11 } while (R1 <= 100); R9 = R5 + R2; X[R6] = R9; This transformation stores intermediate results of the division instructions in unique registers (noted as n and m). These registers are not referenced until several instructions after the division operations. This decreases the possibility that the long latency period of the division instructions will stall the instruction pipeline and cause processing delays. Software pipelining is attempted on a loop that meets the following criteria:
This optimization produces slightly larger program files and increases compile time. It is most beneficial in programs containing loops that are executed a large number of times. This optimization is not recommended for loops that are executed only a small number of times. Use the +Onopipeline option with the +O2, +O3, or +O4 option to suppress software pipelining if program size is more important than execution speed. This will perform level two optimization, but disable software pipelining. Array references often require one or more instructions to compute the virtual memory address of the array element specified by the subscript expression. The register reassociation optimization implemented in the PA-RISC compilers tries to reduce the cost of computing the virtual memory address expression for array references found in loops. Within loops, the virtual memory address expression can be rearranged and separated into a loop varying term and a loop invariant term. Loop varying terms are those items whose values may change from one iteration of the loop to another. Loop invariant terms are those items whose values are constant throughout all iterations of the loop. The loop varying term corresponds to the difference in the virtual memory address associated with a particular array reference from one iteration of the loop to the next. The register reassociation optimization dedicates a register to track the value of the virtual memory address expression for one or more array references in a loop and updates the register appropriately in each iteration of a loop. The register is initialized outside the loop to the loop invariant portion of the virtual memory address expression and the register is incremented or decremented within the loop by the loop variant portion of the virtual memory address expression. On PA-RISC, the update of such a dedicated register can often be performed for free using the base-register modification capability of load and store instructions. The net result is that array references in loops are converted into equivalent but more efficient pointer dereferences. For example: int a[10][20][30]; after register reassociation is applied to the innermost loop becomes: int a[10][20][30]; In the above example, the compiler-generated temporary register variable, p, strides through the array a in the innermost loop. This register pointer variable is initialized outside the innermost loop and auto-incremented within the innermost loop as a side-effect of the pointer dereference. Register reassociation can often enable another loop optimization. After performing the register reassociation optimization, the loop variable may be needed only to control the iteration count of the loop. If this is case, the original loop variable can be eliminated altogether by using the PA-RISC ADDIB and ADDB machine instructions to control the loop iteration count. |
|||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||