Fortran 90, Fortran 77, C, aC++: Exemplar Programming Guide > Chapter 8 Programming conventions for optimal code

Compiler assumptions

Compiler assumptions can produce faulty optimized code when the source code contains:

  • Increments by zero

  • Trip counts that may overflow at optimization levels +O2 and above

The following sections describe these items and how to avoid them.

Incrementing by zero

The compiler assumes that whenever a variable is being incremented on each iteration of a loop, the variable is being incremented by a loop-invariant amount other than zero. If the compiler parallelizes a loop that increments a variable by zero on each trip, the loop can produce incorrect answers or cause the program to abort. This error can occur when a variable used as an incrementation value is accidentally set to zero. If the compiler detects that the variable has been set to zero, the compiler does not parallelize the loop. If the compiler cannot detect the assignment, however, the symptoms described below occur.

The following Fortran example shows two loops that increment by zero:

      CALL SUB1(0)
      .
      .
      .
      SUBROUTINE SUB1(IZR)
      DIMENSION A(100), B(100), C(100)
      J = 1
      DO I = 1, 100, IZR    ! INCREMENT VALUE OF 0 IS NON-STANDARD
        A(I) = B(I)
      ENDDO
      PRINT *, A(11)
      DO I = 1, 100
        J = J + IZR
        B(I) = A(J)
        A(J) = C(I)
      ENDDO
      PRINT *, A(1)
      PRINT *, B(11)
      END

Because IZR is an argument passed to SUB1, the compiler does not detect that IZR has been set to zero. Both loops parallelize at +O3 +Oparallel +Onodynsel.

The loops compile at +O3, but the first loop, which specifies the step as part of the DO statement (or as part of the for statement in C), attempts to parcel out loop iterations by a step of IZR. At runtime, this loop is infinite.

Because of its dependences, the second loop would not behave predictably when parallelized, if it were ever reached at runtime. The compiler does not detect the dependences because it assumes J is an induction variable.

The analogous C code follows:

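#include <stdio.h>

float a[100],b[100],c[100];

void sub1(int izr)
{
    int i,j = 1;

    /* when izr is 0, i never advances and this loop is infinite */
    for(i=0; i<100; i+=izr)
        a[i] = b[i];
    printf("%f \n", a[11]);
    /* j looks like an induction variable but never changes when izr is 0 */
    for(i=0; i<100;i++) {
        j = j + izr;
        b[i] = a[j];
        a[j] = c[i];
    }
    printf("%f \n", a[1]);
    printf("%f \n", b[11]);
}

int main(void)
{
    sub1(0);
    return 0;
}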
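
One way to guard against an accidental zero increment is to check the step at runtime before entering the loop. The following is a minimal sketch of that idea, not a technique prescribed by this guide; the function name copy_with_step and its error convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the guide): copy every step-th element
   of b into a. Returns 0 on success, or -1 without touching a when the
   step is zero or negative, so the loop can never stall on i. */
int copy_with_step(float *a, const float *b, int n, int step)
{
    assert(a != NULL && b != NULL);

    if (step <= 0)
        return -1;          /* a zero step would make the loop infinite */

    for (int i = 0; i < n; i += step)
        a[i] = b[i];
    return 0;
}
```

A caller can test the return value and fall back to an error path instead of entering a loop whose increment is zero.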

Trip counts that may overflow

Some loop optimizations at +O2 and above may cause the variable on which the trip count is based to overflow. (A loop's trip count is the number of times the loop executes.) The compiler assumes that each induction variable is increasing (or decreasing) without overflow during the loop. Any overflowing induction variable may be used by the compiler as a basis for the trip count. The following sections discuss when this overflow may occur and how to avoid it.

Linear test replacement

When optimizing loops, the compiler often disregards the original induction variable, using instead a variable or value that better indicates the actual stride of the loop. A loop's stride is the value by which the iteration variable increases on each iteration. By picking the largest possible stride, the compiler reduces the execution time of the loop by reducing the number of arithmetic operations within each iteration.

The Fortran code below contains an example of a loop in which the induction variable may be replaced by the compiler.

      ICONST = 64
      ITOT = 0
      DO IND = 1,N
        IPACK = (IND*1024)*ICONST**2
        IF(IPACK .LE. (N/2)*1024*ICONST**2)
     >    ITOT = ITOT + IPACK
        .
        .
        .
      ENDDO
      END

Executing this loop using IND as the induction variable with a stride of 1 would be extremely inefficient, so the compiler picks IPACK as the induction variable and uses the amount by which it increases on each iteration, $1024 \cdot 64^{2}$ or $2^{22}$, as the stride.

The trip count (N in the example), or just trip, is the number of times the loop executes, and the start value is the initial value of the induction variable.

The following C function also contains an induction variable that may be replaced:

#include <math.h>

void example(int n)    /* function name is illustrative only */
{
    int ind, ipack, iconst, itot;

    iconst = 64;
    itot = 0;
    for(ind=0; ind<n; ind++) {
        ipack = (ind*1024)*pow(iconst,2);
        if(ipack < (n/2)*1024*pow(iconst,2))
            itot += ipack;
        .
        .
        .
    }
}

Here, as in the Fortran example, ipack, rather than ind, is used as the induction variable, again producing a stride of $2^{22}$.
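
To make the transformation concrete, the sketch below writes out by hand roughly what linear test replacement does to a loop of this shape. It is an approximation for illustration, not the compiler's actual output; the function names are hypothetical, the conditional is omitted for brevity, and the replaced form is only valid while the limit plus one stride still fits in an int.

```c
#include <assert.h>

/* Original form: ind drives the loop and ipack is recomputed each trip. */
long long sum_original(int n)
{
    long long itot = 0;
    for (int ind = 0; ind < n; ind++) {
        int ipack = (ind * 1024) * 64 * 64;
        itot += ipack;
    }
    return itot;
}

/* Hand-written approximation of linear test replacement: ipack itself
   becomes the induction variable, stepping by the stride, and the loop
   test compares ipack against start + (trip - 1) * stride. */
long long sum_replaced(int n)
{
    const int stride = 1024 * 64 * 64;     /* 2^22 */

    /* Assumption: n is small enough that ipack never overflows an int. */
    assert(n >= 0 && n <= 511);

    const int limit = (n - 1) * stride;    /* start is 0 in this sketch */
    long long itot = 0;
    for (int ipack = 0; ipack <= limit; ipack += stride)
        itot += ipack;
    return itot;
}
```

For small n both functions return the same total; the precondition in sum_replaced marks the point past which the limit arithmetic no longer fits in 32 bits.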

Linear test replacement, a standard optimization at levels +O2 and above, normally does not cause problems. However, when the loop stride is very large, as in the examples above, a large trip count can cause the loop limit value (start+((trip-1)*stride)) to overflow.

In the examples above, the induction variable is a 4-byte integer, which occupies 32 bits in memory. If start+((trip-1)*stride), here 1+((N-1)*$2^{22}$), is greater than $2^{31}-1$, the value overflows into the sign bit and is treated as a negative number. If the stride value is negative, the absolute value of start+((trip-1)*stride) must not exceed $2^{31}$. When a loop has a positive stride and the trip count overflows, the loop stops executing when the overflow occurs, because the limit becomes negative and the termination test fails.

Because the largest allowable value for start+((trip-1)*stride) is $2^{31}-1$, the start value is 1, and the stride is $2^{22}$, the maximum trip count for the loop can be found.

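The stride, trip, and start values for a loop must satisfy the following inequality:

$$\text{start} + (\text{trip} - 1) \cdot \text{stride} < 2^{31}$$

The start value is 1, so trip can be solved for as follows:

$$
\begin{aligned}
\text{start} + (\text{trip} - 1) \cdot \text{stride} &< 2^{31} \\
1 + (\text{trip} - 1) \cdot 2^{22} &< 2^{31} \\
(\text{trip} - 1) \cdot 2^{22} &< 2^{31} - 1 \\
\text{trip} - 1 &< 2^{9} - 2^{-22} \\
\text{trip} &< 2^{9} - 2^{-22} + 1 \\
\text{trip} &\le 512 \quad \text{(trip is an integer)}
\end{aligned}
$$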

The maximum value for n in the given loop, then, is 512.
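
The same bound can be computed for any positive stride. The helper below is a sketch under that assumption; max_trip is a hypothetical name, and the calculation is done in 64-bit arithmetic so that it cannot itself overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the guide): the largest trip count for
   which start + (trip - 1) * stride still fits in a signed 32-bit
   integer. Assumes a positive stride. */
int64_t max_trip(int32_t start, int32_t stride)
{
    assert(stride > 0);

    /* Largest k with start + k * stride <= INT32_MAX, then trip = k + 1. */
    return ((int64_t)INT32_MAX - start) / stride + 1;
}
```

With a start of 1 and a stride of 4194304 ($2^{22}$), max_trip returns 512, matching the result derived above.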

If you find that certain loops give wrong answers at optimization levels +O2 or higher, the problem may be linear test replacement. If you still want to optimize these loops at +O2 or above, restructure them to force the compiler to choose a different induction variable.

Large trip counts at +O2 and above

When a loop is optimized at level +O2 or above, its trip count must occupy no more than a signed 32-bit storage location. The largest positive value that can fit in this space is $2^{31} - 1$ (2,147,483,647). Loops with trip counts that cannot be determined at compile time but that exceed $2^{31} - 1$ at runtime will yield wrong answers.

This limitation only applies at optimization levels +O2 and above.

A loop with a trip count that overflows 32 bits can be optimized by manually strip mining the loop.
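
As a rough illustration of manual strip mining, the sketch below splits one long loop into an outer loop over strips and an inner loop whose trip count is an int no larger than the strip length. The function name, the summation workload, and the strip parameter are hypothetical; the guide does not specify a particular strip length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example (not from the guide): sum `total` elements of x,
   where total may exceed 2^31 - 1. The inner loop never runs more than
   `strip` times, so its trip count always fits in a signed 32-bit int. */
double sum_strip_mined(const double *x, int64_t total, int strip)
{
    assert(total >= 0 && strip > 0);

    double sum = 0.0;
    for (int64_t base = 0; base < total; base += strip) {
        int64_t left = total - base;                 /* elements remaining */
        int len = (int)(left < strip ? left : strip);
        const double *chunk = x + base;

        for (int i = 0; i < len; i++)                /* short inner loop */
            sum += chunk[i];
    }
    return sum;
}
```

The outer loop runs about total/strip times, so choosing a reasonably large strip keeps both trip counts well inside the 32-bit range.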

© Hewlett-Packard Development Company, L.P.