HP-UX Floating-Point Guide: HP 9000 Computers > Chapter 2 Floating-Point Principles and the IEEE Standard for Binary Floating-Point Arithmetic
Exception Conditions
The IEEE standard defines five exception conditions, also called exceptions: inexact result, overflow, underflow, invalid operation, and division by zero.
The following sections describe each exception. On HP-UX systems, traps for all of these exceptions are disabled by default. You can enable traps for some or all of them by using the fesettrapenable function, the +T option (f77 only), the +fp_exception option (f90 only), or the +FP compiler option; for more information, see “Enabling Traps”. The standard requires a conforming implementation to provide exception flags (see “The PA-RISC Floating-Point Status Register”). If an exception occurs and the trap for that exception is not enabled, the corresponding exception flag is set.
As explained in “Floating-Point Formats and the Limits of IEEE Representation”, all computer floating-point systems are inherently inexact because they cannot represent every value. When a computer system cannot represent a number exactly, it must choose a nearby representable value. This is called rounding, and it always produces an inexact result condition. Because most floating-point operations produce rounded (that is, inexact) results most of the time, the inexact result exception is not usually considered an error. The IEEE standard requires that the result of each basic operation be rounded to the nearest representable value (unless the rounding mode has been changed; see “IEEE Rounding Modes”). So, while the result of dividing 1 by 3 is not precisely 1/3, it is precisely repeatable, portable, and standard.
An inexact result condition is always raised along with an overflow condition, and is also raised with an underflow condition if the result is inexact. The overflow or underflow is always raised first. These are the only situations in which a single operation raises more than one exception.
Inexactness can also occur when the system converts between decimal and binary floating-point representations, for example during a C scanf or printf call. A binary-to-decimal conversion is inexact if the decimal format does not have room for the full floating-point value: for example, if the format specification in a printf call is %7.5f but the value being printed requires more than 5 decimal places. A decimal-to-binary conversion can be inexact because the IEEE floating-point formats cannot represent all decimal values. For instance, the IEEE format cannot represent the decimal value 0.1 exactly, because 0.1 is not a finite sum of powers of 2.
For more information and examples, see “Conversions Between Binary and Decimal”.
Choosing the most appropriate representable value through rounding is not always straightforward. Whether the system rounds to the lower or the higher of two representable values depends on the rounding mode (the algorithm for rounding values) that the user selects. The IEEE standard defines a default rounding mode and three alternate modes.
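The default rounding mode is round to nearest. The four rounding modes are round to nearest, which chooses the representable value closest to the exact result (in a tie, the one whose least significant bit is zero); round toward zero, which chooses the value nearer to zero; round toward positive infinity, which chooses the larger value; and round toward negative infinity, which chooses the smaller value.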
For all but specially designed numerical analysis applications, round to nearest is the best rounding mode. In round-to-nearest mode, the nearest representable value is never more than 1/2 ULP (unit in the last place) away from the exact result, so the rounding error introduced by one operation is never more than 1/2 ULP. In the other rounding modes, the error is less than 1 ULP. As an example of the size of a 1/2 ULP rounding error, suppose you tried to measure precisely the distance to the sun (about 93 million miles). An error of 1/2 ULP in single-precision would put your measurement off by about 4 miles; in double-precision, by about 12 microns; and in quad-precision, by about 10^-17 microns, which is less than one hundred-millionth of the diameter of a proton. You can change the rounding mode on HP 9000 systems by using library routines in the fenv suite (see Chapter 5 “Manipulating the Floating-Point Status Register”).
An overflow condition occurs whenever a floating-point operation attempts to produce a result whose magnitude is greater than the maximum representable value. Table 2-10 “Approximate Maximum Representable Floating-Point Values” shows approximate maximum values for floating-point numbers on HP 9000 systems. For example, an attempt to represent the value 10^400 would produce an overflow condition in single-precision and double-precision, but not in quad-precision.
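Table 2-10 Approximate Maximum Representable Floating-Point Values
| Precision | Approximate maximum value |
| Single | 3.4 × 10^38 |
| Double | 1.8 × 10^308 |
| Quad | 1.2 × 10^4932 |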
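The single- and double-precision limits are available in C as FLT_MAX and DBL_MAX from <float.h>, and the 10^400 example can be reproduced with a decimal-to-binary conversion. This sketch relies only on the standard behavior of strtod and strtof, which return HUGE_VAL or HUGE_VALF (infinity on IEEE systems in the default rounding mode) and set errno to ERANGE when the value overflows. The helper names are hypothetical, and quad-precision is not shown because the C type that maps to it varies between systems.

```c
#include <errno.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* FLT_MAX is about 3.4e38 and DBL_MAX about 1.8e308 (see Table 2-10). */

/* Hypothetical helper: true if the decimal string s overflows double. */
bool decimal_overflows_double(const char *s)
{
    errno = 0;
    double d = strtod(s, NULL);
    return errno == ERANGE && isinf(d);
}

/* Hypothetical helper: true if the decimal string s overflows float. */
bool decimal_overflows_float(const char *s)
{
    errno = 0;
    float f = strtof(s, NULL);
    return errno == ERANGE && isinf(f);
}
```

With these helpers, "1e400" overflows both formats, "1e300" overflows only single-precision, and "1e38" fits in both.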
Several actions are possible when an overflow occurs, depending on whether overflow traps are enabled or disabled (by default, traps are disabled). If overflow traps are enabled, the system signals a floating-point exception (SIGFPE). Then, if the program provides a trap handler, the system takes whatever action the trap handler dictates. Although the IEEE standard does not define trap handler operations, it does define what information should be stored in the result of an operation that overflows. If the program does not provide a trap handler, the SIGFPE causes the program to terminate.
If overflow traps are disabled, the result of an operation that overflows is either an infinity or the closest representable number, which is the largest positive or the largest negative finite value. Which of the two is returned depends on the rounding mode, as shown in Table 2-11 “Overflow Results Based on Rounding Mode”. When overflow traps are disabled, the system also raises an inexact result condition in addition to the overflow condition.
Table 2-11 Overflow Results Based on Rounding Mode
| Rounding mode | Positive overflow | Negative overflow |
| Round to nearest | +infinity | -infinity |
| Round toward zero | Largest positive value | Largest negative value |
| Round toward +infinity | +infinity | Largest negative value |
| Round toward -infinity | Largest positive value | -infinity |
An underflow condition may occur when a floating-point operation attempts to produce a nonzero result that is smaller in magnitude than the smallest normalized value. The standard allows the vendor of the floating-point system to choose whether underflow is detected before or after rounding. On HP 9000 systems, underflow is always detected before rounding. Consequently, an operation that underflows can produce a minimum-magnitude normalized value, a denormalized value, or zero. According to the standard, when underflow traps are disabled, an underflow condition is signaled only if the result is also inexact, because a result can be exact even though it is denormalized (for example, 2^-1040 in double-precision). In that case, there is no reason to signal an exception. When an underflow does produce an inexact result, it is often difficult to determine whether the inaccuracy occurs because the value is denormalized or whether the loss of accuracy is inherent in the operation being performed. For the sake of efficiency, the standard allows the implementor to decide how to define loss of accuracy for underflow conditions. On HP 9000 systems, loss of accuracy in underflow conditions includes all inaccuracies, whether they originate from denormalization or are inherent in the operation.
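An invalid operation condition occurs whenever the system attempts to perform an operation that has no numerically meaningful interpretation. The following are invalid operations (also called operation errors, operand errors, or domain errors): any operation on a signaling NaN; an addition or subtraction that amounts to subtracting infinities of the same sign, such as (+infinity) + (-infinity); multiplication of zero by infinity; division of zero by zero or of infinity by infinity; the remainder operation x REM y when y is zero or x is infinite; the square root of a negative number; a conversion from floating-point to integer whose result cannot be represented in the integer format; and a comparison involving a NaN that uses a relational operator such as < or >.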
Out-of-range results that occur while converting from floating-point to integer trigger invalid operation conditions, whereas floating-point results that exceed the range of a floating-point format always produce overflow conditions. If an invalid operation occurs while invalid operation traps are disabled, the system by default returns a quiet NaN as the result. If traps are enabled, the system signals a floating-point exception and, if a trap handler is provided, takes whatever action the trap handler dictates.
A division by zero condition occurs whenever the system attempts to divide a nonzero, finite value by zero. More generally, this condition occurs whenever an operation on finite operands produces an exact infinite result. If divide-by-zero traps are disabled, the result is infinity: positive infinity if the two operands have the same sign, negative infinity if they have different signs. If traps are enabled, the system signals a floating-point exception and, if a trap handler is provided, takes whatever action the trap handler dictates.