Fortran 90, Fortran 77, C, aC++: Exemplar Programming Guide > Chapter 8 Programming conventions for optimal code

Floating-point imprecision


The compiler applies normal arithmetic rules to real numbers. It assumes that two arithmetically equivalent expressions produce the same numerical result.

Most real numbers cannot be represented exactly in digital computers. Instead, these numbers are rounded to a floating-point value that can be represented. When optimization changes the evaluation order of a floating-point expression, the results can change. Possible consequences of floating-point roundoff include program aborts, division by zero, address errors, and incorrect results.

In any parallel program, instructions execute in a different order than in the serial version of the same program, which can cause noticeable roundoff differences between the two versions. Running a parallel code under different machine configurations or conditions can also change the execution order, so roundoff errors propagate differently from one execution to the next. Accumulator variables (reductions) are especially susceptible to these problems.

Consider the following Fortran example:

C$DIR GATE(ACCUM_LOCK)
      LK = ALLOC_GATE(ACCUM_LOCK)
      .
      .
      .
      LK = UNLOCK_GATE(ACCUM_LOCK)
C$DIR BEGIN_TASKS, TASK_PRIVATE(I)
      CALL COMPUTE(A)
C$DIR CRITICAL_SECTION(ACCUM_LOCK)
      ACCUM = ACCUM + A
C$DIR END_CRITICAL_SECTION
C$DIR NEXT_TASK

      DO I = 1, 10000
        B(I) = FUNC(I)
C$DIR CRITICAL_SECTION(ACCUM_LOCK)
        ACCUM = ACCUM + B(I)
C$DIR END_CRITICAL_SECTION
      .
      .
      .
      ENDDO

C$DIR NEXT_TASK
      DO I = 1, 10000
        X = X + C(I) + D(I)
      ENDDO
C$DIR CRITICAL_SECTION(ACCUM_LOCK)
      ACCUM = ACCUM/X
C$DIR END_CRITICAL_SECTION
C$DIR END_TASKS

Here, three parallel tasks all update the real variable ACCUM, using real variables that are themselves the results of earlier computations. Each operation is subject to roundoff error, so the total roundoff error might be substantial. When the program runs serially, the tasks execute in the order written, and the roundoff errors accumulate in that order. When the tasks run in parallel, however, there is no guarantee of the order in which they update ACCUM, so the roundoff error can accumulate in a different order than it does during the serial run. Depending on machine conditions, the order can also differ from one parallel run to the next, so separate runs can accumulate roundoff errors differently and yield different answers.

An analogous C example follows:

static gate_t accum_lock;
lk = alloc_gate(&accum_lock);
.
.
.
lk = unlock_gate(&accum_lock);
#pragma _CNX begin_tasks, task_private(i)
/* Task 1: add a single computed value to the shared accumulator. */
compute(a);
#pragma _CNX critical_section(accum_lock)
accum = accum + a;
#pragma _CNX end_critical_section
#pragma _CNX next_task
/* Task 2: add each b[i] to the shared accumulator. */
for (i = 0; i < 10000; i++) {
    b[i] = func(i);
#pragma _CNX critical_section(accum_lock)
    accum = accum + b[i];
#pragma _CNX end_critical_section
    .
    .
    .
}
#pragma _CNX next_task
/* Task 3: compute a local sum, then divide the accumulator by it. */
for (i = 0; i < 10000; i++)
    x = x + c[i] + d[i];
#pragma _CNX critical_section(accum_lock)
accum = accum/x;
#pragma _CNX end_critical_section
#pragma _CNX end_tasks

Problems with floating-point precision can also occur when a program tests the value of a variable without allowing enough tolerance for roundoff error. To solve the problem, widen the tolerances or declare the variables with higher precision (double instead of float in C and C++, or REAL*8 instead of REAL*4 in Fortran). Testing floating-point numbers for exact equality is always poor practice.

Enabling sudden underflow

By default, PA-RISC processor hardware represents a floating-point number in denormalized format when the number is tiny. A floating-point number is considered tiny if its exponent field is zero but its mantissa is nonzero (for more information, refer to the HP-UX Floating-Point Guide). Handling denormalized values is extremely costly in execution time and seldom provides any benefit. You can enable sudden underflow (flush to zero) of denormalized values by passing the +FPD flag to the linker, which is done with the -Wl compiler option.

The following example shows such an f90 command line:

% f90 -Wl,+FPD prog.f

This command line compiles the program prog.f and instructs the linker to enable sudden underflow.

© Hewlett-Packard Development Company, L.P.