The compiler applies normal arithmetic rules to real numbers. It
assumes that two arithmetically equivalent expressions produce the
same numerical result.
Most real numbers cannot be represented exactly in digital computers.
Instead, these numbers are rounded to a floating-point value that
can be represented. When optimization changes the evaluation order
of a floating-point expression, the results can change. Possible
consequences of floating-point roundoff include program aborts,
division by zero, address errors, and incorrect results.
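The effect of evaluation order can be seen in a minimal sketch such
as the following. It is illustrative only: the function names
sum_left, sum_right, and check_order_dependence are hypothetical, and
IEEE 754 double precision is assumed. The volatile temporaries are
there only to keep an optimizer from rearranging the two additions.

```c
#include <assert.h>

/* Evaluate (a + b) + c. The volatile temporary forces the first
   addition to be rounded to double before the second one. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Evaluate a + (b + c): same operands, different grouping. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Hypothetical self-check, assuming IEEE 754 double precision.
   1.0 is smaller than half the spacing between doubles near 1.0e20,
   so it is lost when it is added to -1.0e20 first. */
void check_order_dependence(void)
{
    assert(sum_left(1.0e20, -1.0e20, 1.0) == 1.0);
    assert(sum_right(1.0e20, -1.0e20, 1.0) == 0.0);
}
```

The two functions are arithmetically equivalent, yet they return
different values for these operands. A divisor computed the second
way would be zero, which is how roundoff can lead to a division by
zero.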
In any parallel program, the execution order of the instructions
will differ from the serial version of the same program. This can
cause noticeable roundoff differences between the two versions.
Running parallel code under different machine configurations or
conditions can also yield roundoff differences, because the execution
order can change, and roundoff errors then propagate in a different
order from one execution to the next.
Accumulator variables (reductions) are especially susceptible to
these problems.
Consider the following Fortran example:
C$DIR GATE(ACCUM_LOCK)
      LK = ALLOC_GATE(ACCUM_LOCK)
      .
      .
      .
      LK = UNLOCK_GATE(ACCUM_LOCK)
C$DIR BEGIN_TASKS, TASK_PRIVATE(I)
      CALL COMPUTE(A)
C$DIR CRITICAL_SECTION(ACCUM_LOCK)
      ACCUM = ACCUM + A
C$DIR END_CRITICAL_SECTION
C$DIR NEXT_TASK
      DO I = 1, 10000
        B(I) = FUNC(I)
C$DIR CRITICAL_SECTION(ACCUM_LOCK)
        ACCUM = ACCUM + B(I)
C$DIR END_CRITICAL_SECTION
        .
        .
        .
      ENDDO
C$DIR NEXT_TASK
      DO I = 1, 10000
        X = X + C(I) + D(I)
      ENDDO
C$DIR CRITICAL_SECTION(ACCUM_LOCK)
      ACCUM = ACCUM/X
C$DIR END_CRITICAL_SECTION
C$DIR END_TASKS
Here, three parallel tasks all update the real variable ACCUM, using
real variables that have themselves been manipulated. Each
manipulation is subject to roundoff error, so the total roundoff
error here might be substantial.
When the program runs in serial, the tasks execute in their written
order, and the roundoff errors accumulate in that order. However,
if the tasks run in parallel, there is no guarantee as to what order
the tasks will run in, meaning the roundoff error will accumulate
in a different order than it does during the serial run. Depending
on machine conditions, the tasks may run in different orders during
different parallel runs also, potentially accumulating roundoff
errors differently and yielding different answers.
An analogous C example follows:
static gate_t accum_lock;
lk = alloc_gate(&accum_lock);
.
.
.
lk = unlock_gate(&accum_lock);
#pragma _CNX begin_tasks, task_private(i)
compute(a);
#pragma _CNX critical_section(accum_lock)
accum = accum + a;
#pragma _CNX end_critical_section
#pragma _CNX next_task
for(i=0;i<10000;i++) {
  b[i] = func[i];
#pragma _CNX critical_section(accum_lock)
  accum = accum + b[i];
#pragma _CNX end_critical_section
  .
  .
  .
}
#pragma _CNX next_task
for(i=0;i<10000;i++)
  x = x + c[i] + d[i];
#pragma _CNX critical_section(accum_lock)
accum = accum/x;
#pragma _CNX end_critical_section
#pragma _CNX end_tasks
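The order dependence of an accumulator can be reproduced without any
parallel directives. The following sketch is serial C, not the tasking
code above. It adds the same values in two different orders, standing
in for two runs in which the tasks reach the critical section in
different orders. The function names accumulate_forward and
accumulate_backward are hypothetical, and IEEE 754 single precision
is assumed.

```c
#include <assert.h>

/* Add the elements of v in index order 0, 1, ..., n-1.
   The volatile accumulator forces rounding to float after each add. */
float accumulate_forward(const float *v, int n)
{
    volatile float acc = 0.0f;
    int i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        acc = acc + v[i];
    return acc;
}

/* Add the same elements in the opposite order, n-1, ..., 1, 0. */
float accumulate_backward(const float *v, int n)
{
    volatile float acc = 0.0f;
    int i;

    assert(n > 0);
    for (i = n - 1; i >= 0; i--)
        acc = acc + v[i];
    return acc;
}
```

For an array whose first element is 1.0e8f and whose next 1000
elements are all 1.0f, the forward sum stays at 1.0e8, because each
1.0f is lost when added to the large running total. The backward sum
first collects the small values exactly and yields 100001000.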
Problems with floating-point precision can also occur when a program
tests the value of a variable without allowing enough tolerance for
roundoff errors. To solve the problem, adjust the tolerances to allow
for greater roundoff errors or declare the variables to be of a
higher precision (use the double type instead of float in C and C++,
or REAL*8 rather than REAL*4 in Fortran). It is always poor practice
to test floating-point numbers for exact equality.
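One possible way to test with a tolerance is sketched below. The
function name nearly_equal and the use of a relative tolerance are
assumptions made for illustration. The appropriate tolerance, and
whether it should be relative or absolute, depends on the
computation.

```c
#include <assert.h>

/* Return 1 if a and b differ by no more than rel_tol times the
   larger of their magnitudes, 0 otherwise. Hypothetical helper;
   rel_tol must be chosen to suit the computation. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff  = a - b;
    double mag_a = (a < 0.0) ? -a : a;
    double mag_b = (b < 0.0) ? -b : b;
    double scale = (mag_a > mag_b) ? mag_a : mag_b;

    assert(rel_tol >= 0.0);
    if (diff < 0.0)
        diff = -diff;
    return diff <= rel_tol * scale;
}
```

With IEEE 754 double precision, the sum 0.1 + 0.2 does not compare
equal to 0.3 using ==, but nearly_equal accepts the pair with a
tolerance of 1.0e-12.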
Enabling sudden underflow
By default, PA-RISC processor hardware represents a floating-point
number in denormalized format when the number is tiny. A
floating-point number is considered tiny if its exponent field is
zero but its mantissa is nonzero (for more information, refer to the
HP-UX Floating-Point Guide). Handling denormalized values is
extremely costly in terms of execution time and seldom provides any
benefit. You can enable sudden underflow (flush to zero) of
denormalized values by passing the +FPD flag to the linker. To do
this from the compiler command line, use the -Wl option, which passes
the arguments that follow it to the linker.
The following example shows such an f90
command line:
% f90 -Wl,+FPD prog.f
This command line compiles the program prog.f
and instructs the linker to enable sudden underflow.
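To make the notion of a tiny value concrete, the following sketch
tests whether a double is denormalized by comparing its magnitude
with DBL_MIN, the smallest normalized double. The function names
is_denormal and check_gradual_underflow are hypothetical, and the
self-check assumes the default (gradual underflow) behavior.

```c
#include <assert.h>
#include <float.h>

/* Return 1 if x is nonzero but smaller in magnitude than the
   smallest normalized double, that is, if x is denormalized. */
int is_denormal(double x)
{
    double mag = (x < 0.0) ? -x : x;
    return mag > 0.0 && mag < DBL_MIN;
}

/* Hypothetical self-check, assuming gradual underflow is in effect.
   DBL_MIN / 4 cannot be represented as a normalized double, so it
   is stored in denormalized format. */
void check_gradual_underflow(void)
{
    volatile double x = DBL_MIN;

    x = x / 4.0;
    assert(is_denormal(x));
}
```

With sudden underflow enabled, a result like this one is expected to
be flushed to zero instead, so the assertion in
check_gradual_underflow would no longer hold.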