| United States-English |
|
|
|
![]() |
HP-UX Floating-Point Guide: HP 9000 Computers > Chapter 3 Factors that Affect the Results of
Floating-Point ComputationsFloating-Point Coding Practices that Affect Application Results |
|
The most common types of floating-point "bugs" reported to Hewlett-Packard are not bugs at all, but rather a class of programming mistakes. These types of mistakes usually stem from one of the following invalid assumptions:
Sometimes the source of an erroneous result is related to a particular optimization generated by the compiler. This kind of problem can be particularly hard to solve, because it may disappear when you compile the program with a debugging option in order to debug it. See “Compiler Behavior and Compiler Version” for a discussion of the effects of compiler optimization on floating-point results. The following sections describe common floating-point programming mistakes that can lead to incorrect application results:
The first mistake produces results that are simply wrong. The others are more insidious: they usually cause the result of an operation to lose a substantial amount of precision.
Think very carefully before you test two floating-point values for equality. Because of the inherent inexactness of floating-point representations and because of the many sources of rounding inaccuracies in a floating-point computation, values that should be equal from a purely algebraic perspective in fact rarely will be. For example, on many computers 1.2 - 0.1 is not exactly equal to 1.1. Consider the following example. Example 3-4 Sample Program: fpeq.c
From an algebraic viewpoint, this routine should print that 1.2 - 0.1 equals 1.1. In fact, though, when executed on a Series 700 machine, the output is
This anomaly occurs because the numbers 0.1 and 1.1 cannot be represented exactly in IEEE-754 double-precision format. Both values, 1.1 and (1.2 - 0.1), are very close to the real number 1.1, but neither is exact. A better technique in most circumstances is to test that two values lie within a relative proximity to each other. For example, the Fortran test
could be replaced by
where EPSILON is a sufficiently small floating-point value. This test establishes whether X is within +/-EPSILON of Y. Consider again the example, rounderr.f, in “How Basic Operations Affect Application Results”. A better way to code that example is to test whether the two values are sufficiently close to each other, rather than exactly the same: Example 3-5 Sample Program: roundeps.f
You must choose a value for the error bound EPSILON that is appropriate to the magnitudes of the values being computed. If computations are yielding results with magnitudes around m, then for single-precision computations, a reasonable value for EPSILON might be m/1.0e5; for double-precision computations, a reasonable value might be m/1.0e14; for quad-precision computations, a reasonable value would be m/1.0e26. Choosing an appropriate value for the error bound, however, requires a thorough knowledge of the mathematical nature of your application. For example, a value of 1.0e-5 for EPSILON yields the following result, which may or may not be acceptable in your application:
The actual difference between the values is one bit (in hexadecimal, 41CB47AF versus 41CB47AE). Calculations can lose precision when a program attempts to take the difference between two values that are similar in magnitude and also have some degree of inaccuracy to begin with. If the operands have M bits of insignificance, and the fractions are the same for the first N significant bits, then the difference between these two values will have M+N bits of insignificance because of the cancellation of significant bits during the subtraction. For example, suppose that a single-precision application performs a computation using two different algorithms and then takes the difference between them to check the similarity of the results. Assume that the true result should be 1.0 and that both actual results have up to four bits of insignificance. If the actual results are
the magnitude of the difference is
However, the two values were the same for the first eighteen bits of their fractions, which were canceled during the subtraction. This leaves six bits in the difference, four of which are insignificant. So only two bits are significant, and the remaining 22 bits are insignificant. (Although the fraction field has only 23 bits, we include the fraction implicit bit in our count.) Figure 3-1 “Taking the Difference of Similar Values” shows how this problem commonly manifests itself in practice. Suppose you have two very large values:
If you subtract B from A, the result is
which is normalized to 1.202E44. Suppose there are already 4 digits of insignificance in A and B. Normalizing the result from 0.0000001202E50 to 1.202E44 adds another 6 digits of insignificance. So the result has 10 digits of insignificance in all, though the operands had only 4. The modulo operation (mod(x, y) in Fortran) is an instance of this type of problem when x is much greater than y; remember that the modulo formula is
(See “The Remainder Operation” for details.) Trigonometric and transcendental functions use an enhanced version of mod(x, pi/2) during argument reduction. Therefore, although HP-UX math libraries perform extremely careful and accurate argument reduction, trigonometric functions like cos(x) can lose significance when x is large. When the system adds two floating-point values, it equalizes the operands' exponents before performing the calculation. It does so by right-shifting the smaller value so as to give it the same exponent as the larger. If the two values are very different in magnitude, this right shift causes a major loss of precision in the smaller value, as Figure 3-2 “Adding Values with Very Different Magnitudes” illustrates. In fact, if the difference between the exponents of two single-precision values is 25 or greater, the smaller value is right-shifted out of existence, and adding the two values results in no change at all in the larger value. For example, consider a criminally minded American bank employee who arranges to have a penny transferred to his or her own account every so often. If the bank's computer uses single-precision floating-point arithmetic, and if the account starts with a balance of zero, then after almost twenty-three million transfers, when the bank account reaches $262,144.00, it will suddenly stop growing. Well before then, the employee may notice irregularities due to rounding errors. After 5 million transfers, the account will have more than 5 million pennies in it; but after 15 million transfers, it will have fewer than 14 million pennies. The larcenous employee may become frustrated with the bank's computer system, but it is behaving perfectly properly. (This irregularity is the reason why bank computers use COBOL's fixed-point arithmetic, not floating-point.) The following example makes the same point more simply. It accumulates a sum by adding 0.01 to the starting value a thousand times, in single-precision, and then subtracting it. Example 3-6 Sample Program: diffmag1.f
The result is not exact. Instead of 10.01 and 0.01, you are likely to get results similar to the following:
The following example is even simpler. At higher magnitudes, adding 1 to a value has no effect at all. Example 3-7 Sample Program: diffmag2.f
The addition of 1 stops affecting the result when the value reaches about 108:
One way to minimize this kind of precision loss, if you can tolerate the added execution time, is to sort the elements of an array in ascending order before you add them together. An underflow may occur when a calculation produces a result that is smaller in magnitude than the smallest normalized value. Sometimes a calculation can lose virtually all significance when it underflows, yielding a denormalized value that is then used in subsequent operations. Suppose that the minimum decimal exponent of a system is -5, and that A is assigned the value 1.23456789E-3. Then the expression A/1.0E8 would underflow, as Figure 3-3 “Unintentional Underflow” shows. Six significant digits are lost during the division, and they cannot be recovered by scaling the quotient back into the normalized range. As another example, suppose the familiar form of the Pythagorean theorem is coded as
Suppose further that this expression is executed using the following values for the single-precision variables X and Y:
These values, when squared, produce the values 9.1834095E-41 and 6.5300508E-42 respectively, whose single-precision representations in hexadecimal are
Both of these values have only 16 bits of significance. The final result, 9.9178697E-21, is a reasonable-looking normalized number. However, because it is produced by a calculation that once lost all but 16 bits of significance, it can have at most 16 bits of significance itself. In fact, it actually has considerably less. You can find out whether your application has underflowed by using the fetestexcept function. Alternatively, you can use the fesettrapenable function or the +FP compiler option to run the application with the underflow exception trap enabled. See “Exception Bits” and “Command-Line Mode Control: The +FP Compiler Option” for more information. If you have a single-precision application that underflows frequently, you can solve the problem by changing to double-precision. If a double-precision application underflows frequently, you could migrate it to quad-precision, but at a considerable loss of efficiency; you may want to restructure your application instead so as to avoid underflows, if possible. The floor of a value is the greatest whole number less than the value. The ceiling of a value is the smallest whole number greater than the value. Rounding, precision mode problems, and compiler optimizations can all contribute to inaccurate results, but under most circumstances the inaccuracy is very small and not noticeable. However, some operations can magnify the inaccuracy of a calculation to the point where the result is meaningless. This can occur in algorithms that truncate a floating-point value to make it an integer. Truncating a positive floating-point value, for instance, reduces its magnitude to the floor integer, regardless of how close to the ceiling value it may be. An expression may yield 1.999999 on one system and 2.00001 on another. Both results may be acceptable in terms of expected rounding errors. However, if the result is truncated to an integer, these two systems will produce 1 and 2, respectively, which can be an unacceptable difference. The following program provides a simple example of this situation. Example 3-8 Sample Program: trunc.c
No matter how close the value of x gets to 2.0, C conversion rules require the fractional part to be truncated. Therefore, the output of the program is as follows:
Many algorithms legitimately require truncation of results to integral values. One way to avoid the kind of problem illustrated by the preceding example is to add 0.5 to the result of a floating-point value that must be assigned to an integer:
Doing this will effectively round the result to the nearest integer (if x is greater than or equal to 0). Another solution is to call the function rint, which rounds a double value to the nearest integer:
If you use either of these solutions, the program output is
If relatively small changes to the input of a program or to the intermediate results generated by a program cause relatively large changes in the final output, the program is said to be ill-conditioned or numerically unstable. The following example illustrates an ill-conditioned program: Example 3-9 Sample Program: sloppy_tangent.f
The output from this program shows how small changes in the argument to the TAN function lead to wildly varying results:
Ill-conditioned computations cause trouble not because their input values may change, but because the seemingly innocuous rounding errors and loss of significance can have large effects on the final results. One way to establish that rounding errors and loss of significance are causing a program to produce incorrect results is to run the program in various rounding modes. (See “Rounding Mode: fegetround and fesetround” for information on how to change rounding modes using the fesetround function.) However, this technique does not always work. In the preceding example, for instance, changing the rounding mode has little effect on the results. However, if you do observe that simply changing the rounding mode causes large changes in the application results, then your application is most likely ill-conditioned. A better technique, which does not require you to use additional functions or even to modify your code, is to make very small changes in the input data and to observe the amount by which the result changes. Wild swings in output magnitude may indicate an ill-conditioned application. Fixing an ill-conditioned application requires a thorough understanding of the computations executed by the application. Chances are that the instability is caused by very few intermediate computations—possibly by just one. You must try to identify the location of the instability using your knowledge of the application, of the characteristics of the math functions called, and of the data being processed. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||