Decimal floating point
From Wikipedia, the free encyclopedia
Decimal floating point arithmetic refers to both a representation of and operations on decimal floating point numbers. Working directly with base-10 exponents avoids the rounding errors that result from conversion to and from base-2 exponents.
The advantage of decimal floating-point representation over decimal fixed-point and integer representation is that it supports a much wider range of values. For example, while a fixed-point representation that allocates eight decimal digits and two decimal places can represent the numbers 123456.78, 8765.43, 123.00, and so on, a floating-point representation with eight decimal digits could also represent 1.2345678, 1234567.8, 0.000012345678, 12345678000000000, and so on.
For more details on the rationale behind decimal floating point (DFP), see "Decimal Floating-Point: Algorism for Computers" in the Proceedings of the 16th IEEE Symposium on Computer Arithmetic (Cowlishaw, M. F., 2003).
Implementations
Early mechanical uses of decimal floating point are evident in the abacus, the slide rule, the Smallwood calculator, and some other calculators that support entries in scientific notation. In the case of mechanical calculators, the exponent is often treated as side information that is accounted for separately.
Some computer languages have implementations of decimal floating point arithmetic, including Java with BigDecimal, Emacs with Calc, Python with the decimal module, and the Unix bc and dc calculators.
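For illustration, the following sketch uses Python's decimal module to contrast decimal arithmetic with the host's binary floating point; the comments show Python 3 output:

  from decimal import Decimal

  # Binary floating point cannot represent 0.1 exactly, so a small error appears:
  print(0.1 + 0.2)                        # 0.30000000000000004
  # A decimal floating point type stores base-10 digits directly:
  print(Decimal('0.1') + Decimal('0.2'))  # 0.3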
In 1987, the IEEE released IEEE 854, a standard for computing with decimal floating point, which lacked a specification for how floating point data should be encoded for interchange with other systems. This is being addressed by IEEE 754r, which is in the process of standardizing both the representations and encodings of decimal floating point data.
The IBM POWER6 processor includes DFP in hardware, as does the IBM System z9; IEEE 754r defines the formats they use in more detail.
Microsoft's C# (and the .NET Framework in general) provides decimal floating point through the System.Decimal type.
An open-source radix-10000 Java class library is available on SourceForge. On Linux, decimal floating point support can be found in GCC's libdecnumber, gcc/dfp.c, and related code.
Floating point arithmetic operations
The usual rule for performing floating point arithmetic is that the exact mathematical value is calculated,[1] and the result is then rounded to the nearest representable value in the specified precision. This is in fact the behavior mandated for IEEE-compliant computer hardware, under normal rounding behavior and in the absence of exceptional conditions.
For ease of presentation and understanding, 7-digit precision will be used in the examples. The fundamental principles are the same in any precision.
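As an illustrative sketch, Python's decimal module (mentioned above) lets the working precision be set to match these examples:

  from decimal import Decimal, getcontext

  getcontext().prec = 7            # 7 significant digits, as in the examples below
  # The exact quotient 1/3 has no finite decimal expansion; the result is
  # rounded to the nearest representable 7-digit value:
  print(Decimal(1) / Decimal(3))   # 0.3333333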
Addition
A simple method to add floating point numbers is to first represent them with the same exponent. In the example below, the second number is shifted right by three digits. We proceed with the usual addition method:
The following example is in decimal, which simply means the base is 10.

123456.7 = 1.234567 × 10^5
101.7654 = 1.017654 × 10^2 = 0.001017654 × 10^5

Hence:

123456.7 + 101.7654 = (1.234567 × 10^5) + (1.017654 × 10^2)
                    = (1.234567 × 10^5) + (0.001017654 × 10^5)
                    = 10^5 × (1.234567 + 0.001017654)
                    = 10^5 × 1.235584654
This is nothing other than converting the operands to scientific notation and then aligning their exponents. In detail:
  e=5;  s=1.234567     (123456.7)
+ e=2;  s=1.017654     (101.7654)

  e=5;  s=1.234567
+ e=5;  s=0.001017654  (after shifting)
------------------------
  e=5;  s=1.235584654  (true sum: 123558.4654)
This is the true result, the exact sum of the operands. It will be rounded to seven digits and then normalized if necessary. The final result is
e=5; s=1.235585 (final sum: 123558.5)
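A minimal sketch of this addition using Python's decimal module, with the context precision set to seven digits:

  from decimal import Decimal, getcontext

  getcontext().prec = 7        # 7 significant digits
  a = Decimal('123456.7')      # e=5; s=1.234567
  b = Decimal('101.7654')      # e=2; s=1.017654
  print(a + b)                 # 123558.5  (exact sum 123558.4654, rounded)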
Note that the low 3 digits of the second operand (654) are essentially lost. This is round-off error. In extreme cases, the sum of two non-zero numbers may be equal to one of them:
  e=5;  s=1.234567
+ e=-3; s=9.876543

  e=5;  s=1.234567
+ e=5;  s=0.00000009876543  (after shifting)
-----------------------------
  e=5;  s=1.23456709876543  (true sum)
  e=5;  s=1.234567          (after rounding/normalization)
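The same absorption can be sketched with Python's decimal module, again assuming seven-digit precision:

  from decimal import Decimal, getcontext

  getcontext().prec = 7
  big  = Decimal('123456.7')     # e=5;  s=1.234567
  tiny = Decimal('0.009876543')  # e=-3; s=9.876543
  print(big + tiny)              # 123456.7  (the smaller operand is absorbed)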
Another problem of loss of significance occurs when two nearly equal numbers are subtracted. In the following example, e=5; s=1.234571 and e=5; s=1.234567 are the seven-digit representations of the rationals 123457.1467 and 123456.659.
  e=5;  s=1.234571
- e=5;  s=1.234567
------------------
  e=5;  s=0.000004
  e=-1; s=4.000000  (after rounding/normalization)
The best representation of this difference is e=-1; s=4.877000, which differs by more than 20% from e=-1; s=4.000000. In extreme cases, the final result may be zero even though the exact result may be several million. This cancellation illustrates the danger in assuming that all of the digits of a computed result are meaningful.
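A sketch of this cancellation with Python's decimal module, again at seven-digit precision:

  from decimal import Decimal, getcontext

  getcontext().prec = 7
  x = Decimal('123457.1')   # 7-digit representation of 123457.1467
  y = Decimal('123456.7')   # 7-digit representation of 123456.659
  print(x - y)              # 0.4  (the exact difference of the rationals is 0.4877)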
Dealing with the consequences of these errors is a topic in numerical analysis.
Multiplication
To multiply, the significands are multiplied while the exponents are added, and the result is rounded and normalized.
  e=3;  s=4.734612
× e=5;  s=5.417242
--------------------------
  e=8;  s=25.648538980104  (true product)
  e=8;  s=25.64854         (after rounding)
  e=9;  s=2.564854         (after normalization)
Division is done similarly, but is more complicated.
There are no cancellation or absorption problems with multiplication or division, though small errors may accumulate as operations are performed repeatedly. In practice, the way these operations are carried out in digital logic can be quite complex (see Booth's multiplication algorithm and digital division). A rough sketch of the multiplication example above, and a division, follows.
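Using Python's decimal module at seven-digit precision (an illustrative sketch, as in the earlier examples):

  from decimal import Decimal, getcontext

  getcontext().prec = 7
  a = Decimal('4734.612')   # e=3; s=4.734612
  b = Decimal('541724.2')   # e=5; s=5.417242
  print(a * b)              # 2.564854E+9  (true product 2564853898.0104, rounded)
  print(b / a)              # 114.4179     (quotient rounded to 7 digits)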
References
- ^ Computer hardware doesn't necessarily compute the exact value; it simply has to produce the equivalent rounded result as though it had computed the infinitely precise result.

