The IEEE standard requires a complying system to support the
following floating-point operations:

- Addition
  Algebraic addition.
- Subtraction
  Algebraic subtraction.
- Multiplication
  Algebraic multiplication.
- Division
  Algebraic division.
- Comparison
  There are four possible relations between any two floating-point
  values: less than, equal, greater than, and unordered. The unordered
  relation occurs when one or both of the operands is a Not-a-Number
  (NaN). See “Comparison” for details.
- Square Root
  The square root operation never overflows or underflows.
- Conversion
  The following conversions must be supported by a conforming
  implementation, if the implementation supports single-precision,
  double-precision, and quad-precision formats:
    Single-precision to double-precision
    Single-precision to quad-precision
    Double-precision to single-precision
    Double-precision to quad-precision
    Quad-precision to single-precision
    Quad-precision to double-precision
    Floating-point to integer
    Integer to floating-point
    Binary floating-point to decimal
    Decimal to binary floating-point
  See “Conversion Between Operand Formats” for more information
  about these conversions.
- Round to Nearest Integral Value
  Rounds an argument to the nearest integral value (in floating-point
  format) based on the current rounding mode. Rounding modes are
  described in “Inexact Result (Rounding)”.
- Remainder
  The remainder operation takes two arguments, x and y, and is
  defined as x - y * n, where n is the integer nearest the exact
  value x/y. See “The Remainder Operation” for more information.

To understand the properties of each operation, you need a
full understanding of denormalized numbers, infinities, and NaNs
(see “Normalized and Denormalized Values”, “Infinity”, and
“Not-a-Number (NaN)”). HP 9000 systems conform to the IEEE standard
for all of these operations.

The standard requires that the result of each operation be
rounded from its mathematically exact value into an IEEE representation
in accordance with the rounding mode. In round-to-nearest mode (the
default), the result is within 1/2 ULP (unit in the last place) of the
exact value. (There is one exception to this rule: conversions between
binary and decimal need not be rounded perfectly at the extremes of
their ranges.)

Comparison

The comparison operation determines the truth of an assertion
about the relationship of two floating-point values. The four basic
assertions are

- operand1 < operand2
  The first operand is less than the second.
- operand1 = operand2
  The first operand is equal to the second.
- operand1 > operand2
  The first operand is greater than the second.
- operand1 ? operand2
  Unordered. This assertion is true if either operand is a NaN.

The basic assertions can be combined with each other. For
example, "a >= b" asserts that a is greater than or equal to b.
Similarly, "a <> b" asserts that a is either greater than or less
than b; for operands that are not NaNs, this assertion is the
opposite of "a = b". An assertion may also be negated.

The IEEE standard defines two versions of every possible assertion:
the aware and the non-aware version. Both the aware and non-aware
versions of an assertion treat a NaN as a special value that compares
as neither less than nor greater than any numeric value, and as
unequal to any value, including any other NaN and even itself. This
definition yields the interesting fact that the assertion "x = x"
will evaluate to FALSE if x is a NaN. In fact, applications sometimes
use this comparison operation specifically to detect NaNs, although
it is a dangerous practice because some vendors' optimizers remove
this operation from the code.

The non-aware version of an assertion behaves the same as the aware
version, with the addition that if either or both operands is a NaN,
it also raises an invalid operation exception for the <, <=, >, and
>= assertions. The =, !=, and ? assertions are the only ones that are
valid with NaN operands.

Signaling NaNs cause an invalid operation exception for both
aware and non-aware assertions.

The behavior of the comparison operation for each of the possible
operand kinds is as follows:

- Normalized and Denormalized Values
  The operands are algebraically compared.
- Zero
  Zeros are greater than any nonzero negative value and less than
  any nonzero positive value. The sign of a zero is ignored, so that
  two zeros always compare as equal even if they have opposite signs.
- Infinity
  To the comparison operators, infinity is just another signed
  numeric value whose magnitude is greater than the largest
  normalized magnitude. Infinities with the same sign compare as
  equal to each other.
- NaN
  A NaN compares as unequal to all other operands, including other
  NaNs and itself. The rules above are used to evaluate assertions
  involving NaNs as TRUE or FALSE. If the assertion is non-aware, an
  invalid operation exception is also signaled for any comparison
  involving a <, <=, >, or >= assertion.
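
The four relations can be illustrated in C with the comparison macros
from <math.h> (isunordered, isless, isgreater), which do not raise an
invalid operation exception for quiet NaN operands and so behave roughly
like the aware assertions described above. This is a sketch under that
assumption. The names relation, classify, and is_nan are hypothetical.
The isnan macro is used for NaN detection instead of the fragile
"x = x" idiom.

```c
#include <math.h>

/* The four mutually exclusive IEEE relations (hypothetical names). */
enum relation { REL_LESS, REL_EQUAL, REL_GREATER, REL_UNORDERED };

/* Classify the relation between a and b using the quiet comparison
   macros, which do not raise invalid for quiet NaN operands. */
enum relation classify(double a, double b)
{
    if (isunordered(a, b))      /* true if either operand is a NaN */
        return REL_UNORDERED;
    if (isless(a, b))
        return REL_LESS;
    if (isgreater(a, b))
        return REL_GREATER;
    return REL_EQUAL;           /* includes +0 versus -0 */
}

/* NaN test that does not depend on the compiler preserving "x != x". */
int is_nan(double x)
{
    return isnan(x) != 0;
}
```

For instance, classify(-0.0, 0.0) reports REL_EQUAL, and classify(NAN, NAN)
reports REL_UNORDERED. The built-in C operators <, <=, >, and >= may raise
the invalid operation exception on NaN operands in implementations that
follow Annex F of the C standard, which is closer to the non-aware
behavior.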

Conversion Between Operand Formats

The standard requires that it be possible to convert between
decimal and binary floating-point, and between binary floating-point
and integer formats. This section describes some of the properties
of various conversions. The operand type integer refers to either
signed or unsigned integers.

- Single-Precision to Double-Precision or Quad-Precision
  These conversions can never overflow, underflow, or be inexact.
  The only possible type of exception is an invalid operation if the
  operand is an SNaN.
- Double-Precision to Quad-Precision
  These conversions can never overflow, underflow, or be inexact.
  The only possible type of exception is an invalid operation if the
  operand is an SNaN.
- Quad-Precision or Double-Precision to Single-Precision
  These conversions can overflow or underflow and are usually inexact.
- Quad-Precision to Double-Precision
  These conversions can overflow or underflow and are usually inexact.
- Decimal to Single-Precision, Double-Precision, or Quad-Precision
  These conversions can overflow or underflow and are usually inexact.
  See “Conversions Between Binary and Decimal” for more information
  about these conversions.
- Single-Precision, Double-Precision, or Quad-Precision to Decimal
  These conversions can overflow or underflow and are usually inexact.
  See “Conversions Between Binary and Decimal” for more information
  about these conversions.
- Single-Precision, Double-Precision, or Quad-Precision to Integer
  These conversions are usually inexact. Out-of-range finite values,
  infinities, and NaNs cause an invalid operation exception. The
  overflow and underflow exceptions do not apply to these conversions.
  A result whose magnitude is too small to round up to one is rounded
  to zero. Signed zeros become integer zeros.

  HP 9000 systems round these conversions in accordance with IEEE
  rounding rules. However, some programming languages, such as C,
  require that these conversions be performed with truncation. See
  “Truncation to an Integer Value” for information about problems
  that can result when floating-point values are truncated to integer.
- Integer to Quad-Precision
  These conversions are always exact and never generate an exception.
- Integer to Double-Precision or Single-Precision
  These conversions are exact except for conversions of 32-bit
  integer values greater than 2^24 - 1 to single-precision, or of
  64-bit integer values greater than 2^53 - 1 to double-precision,
  which may generate an inexact result exception.

The Remainder Operation

The remainder operation is an exact modulo function. When
y is not equal to zero, the remainder r = remainder(x, y)
is defined as

    r = x - y * n

where n is the integer nearest the exact value x/y.
When |n - x/y| = 1/2, n is even. If r is zero, its sign is that
of x.

Two examples:

- The integer closest to the exact value 1.6/2.0 is 1. So the
  remainder of 1.6 and 2.0 is 1.6 - (2.0 * 1), or -0.4.
- The integer closest to the exact value 5.0/2.0 is 2 (the exact
  value is halfway between 2 and 3, so n is even). So the remainder
  of 5.0 and 2.0 is 5.0 - (2.0 * 2), or 1.

The result of the remainder operation is not affected by the
rounding mode. (The result is always exact, so rounding is not a
factor.) The C math library remainder function implements the IEEE
remainder operation.