Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4511638

Double.toString(double) sometimes produces incorrect results

    XMLWordPrintable

Details

    • b26
    • generic
    • generic

    Description

      Name: bsT130419 Date: 10/05/2001


      java version "1.3.1"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
      Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)

      According to the description of Double.toString(double), the resulting string
      produced by this method shall be chosen so that the number of digits is as
      small as possible:

      "There must be at least one digit to represent the fractional part, and beyond
      that as many, ***but only as many***, more digits as are needed to uniquely
      distinguish the argument value from adjacent values of type double."


      The following simple program shows that for the specific case coded, this is
      not true.
      The program first shows that, internally, 1.0E23 and 9.999999999999999E22 are
      the *very same* double. In fact, as seen by the the first 2 System.out.println
      () the raw bitstrings are the same.
      However, converting this double to a string produces a result that is much
      longer as needed to recover the original value. The conversion
      produces "9.999999999999999E22" instead of the much shorter "1.0E23" which,
      internally, represents the *same* value. Because "1.0E23" is shorter
      than "9.999999999999999E22" and because when converted back to double both
      produce the same double value, "1.0E23" *must* be output according to the
      specification.


      class DoubleIO {
          public static void main(String[] args) {
      String s0 = "1.0E23";
      String s1 = "9.999999999999999E22";
      double d0 = Double.valueOf(s0).doubleValue();
      double d1 = Double.valueOf(s1).doubleValue();

      System.out.println(Double.doubleToLongBits(d0));
      System.out.println(Double.doubleToLongBits(d1));

      System.out.println(s0 + " -> " + Double.toString(d0));
          }
      }
      (Review ID: 133203)
      ======================================================================
      ###@###.### 2004-11-11 21:42:12 GMT
      Commentary on a fix for this (un-edited), from Peabody community member:


      A DESCRIPTION OF THE FIX :
      Issues with java.lang.Double.toString()
      ---------------------------------------


      Introduction
      ------------

      Although reporting bugs, this document is longer than a usual bug report
      due to some technical issues that require a deeper discussion. Moreover,
      while I implemented a "bug fix" contribution, it is really new software
      implemented from scratch, not a simple patch suite to apply to the
      current source base. I will be glad to submit it to Mustang if you
      decide that the issues described here are important enough to deserve
      more attention.

      In what follows, class names starting with a dot shall be prefixed with
      java.lang: e.g. .String stands for java.lang.String. (The initial dot
      prevents ambiguities with classes in unnamed packages.)

      The current specification of .Double.toString(double) fails to produce
      the shortest possible decimal number which can recover the original
      double. This document discusses the issue and explains why producing the
      shortest possible decimal is better, both from a theoretical as well as
      from a practical point of view.
      Moreover, several bugs in the current implementation of
      .Double.toString(double) are shown.
      Finally, a modified specification is proposed. The proposed
      specification is accompanied by an implementation which solves all the
      current bugs. In addition, it is about two times faster on the average
      than the JDK implementation.


      Definitions
      -----------

      * Exponentiation is denoted by ** to avoid confusion with Java's meaning
      of ^ for xor.
      * A decimal number is a real number of the form d*10**k, where d and k
      are integers.
      * A double is a real number whose value is in the Java double set.
      * A denormal number is a double with a Java denormalized value.
      * A normal number is any finite nonzero double which is not a denormal.

      The discussion below is limited to nonzero finite positive numbers.


      The problems with the current specification
      -------------------------------------------

      .Double.toString(double) is supposed to produce a .String denoting a
      decimal number sufficiently close to the double argument so that the
      converse transformation, as specified for example by
      .Double.parseDouble(.String), can recover the original double. Moreover,
      it is also supposed to produce a shortest decimal number which still can
      recover the double.

      .Double.toString(1e23) produces "9.999999999999999E22". In fact, this
      output recovers the original double closest to 10**23 but, evidently, it
      is not the shortest: "1.0E23" is quite shorter!
      This is due to the current specification which fixes the exponent too
      early. The premature commitment to the exponent then necessarily leads
      to the long string of nines. The best exponent should be chosen while
      producing the digits, not that early. (See also bug located at
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4511638 submittet
      years ago by me and the relevant standard literature cited there by the
      reviewer.)

      .Double.toString(5e-324) produces "4.9E-324". All of 3*10**-324,
      4*10**-324, 5*10**-324, 6*10**-324, 7*10**-324 can recover the original
      double and they are all shorter than 4.9*10**-324. The proposed
      specification would produce "5.0E-324" because, among the shortests, it
      is also the nearest to the original double, which is in fact
      approximatively 4.9*10**-324. (The trailing zero in "5.0E-324" is a
      backward compatibility concession to the current specification. In my
      opinion "5E-324" is even better: it has less characters while, at the
      same time, syntactically denoting a floating point value.)

      In some cases the specification is ambiguous. For example, it fails to
      define which of two equally short values that both happen to recover the
      original double shall be output. This underspecification could lead to
      different results on different Java implementations.


      Implementation issues
      ---------------------

      The following list shows some examples that have been detected on
      jdk1.5.0_05 and earlier releases as well as on the most recent Mustang
      (build 1.6.0-ea-b56 of 2005-10-13). Conceptually, the arrow stands for
      the double conversion decimal -> double -> decimal.

      * Spurious unnecessary trailing 0. Note that this is not a mathematical
      issue: the numbers on both sides of the arrow are the same.
      0.001 --> 0.0010
      0.002 --> 0.0020
      0.003 --> 0.0030

      * String of 9s.
      8.41E21 --> 8.409999999999999E21
      2.0E23 --> 1.9999999999999998E23
      8.962E21 --> 8.961999999999999E21

      * String of 0s.
      7.3879E20 --> 7.387900000000001E20
      3.1E22 --> 3.1000000000000002E22
      5.63E21 --> 5.630000000000001E21

      * 18 digits, even though 17 digits are *always* enough (see Matula's
      papers cited in the references)
      2.82879384806159E17 --> 2.82879384806159008E17
      1.387364135037754E18 --> 1.38736413503775411E18
      1.45800632428665E17 --> 1.45800632428664992E17

      * 5 digits too much.
      1.790086667993E18 --> 1.79008666799299994E18
      2.273317134858E18 --> 2.27331713485799987E18
      7.68905065813E17 --> 7.6890506581299994E17

      * Not the closest to the intermediate double, left number is closer.
      1.9400994884341945E25 --> 1.9400994884341944E25
      3.6131332396758635E25 --> 3.6131332396758634E25
      2.5138990223946153E25 --> 2.5138990223946152E25

      * Among the powers of 2, there are more than 17% that are output as
      unnecessarily long numbers, the worst case being 2**959:
      4.8726570057E288 --> 4.8726570056999995E288
      which is 6 digits longer than needed.

      It is quite unfortunate that some of the cases presented above can be
      produced in tons. All of them have not been found by clever analysis but
      by really simple programs. On the good news, decimal outputs that
      couldn't recover the original doubles have not been detected.


      Advocating for a better specification
      -------------------------------------

      What is wrong with the "9.999999999999999E22" output? From the
      perspective of a human user that types "1e23" and gets
      "9.999999999999999E22" as feedback, this can be annoying, to say the
      least. The system responds with a complicated number to a simple input.

      In the late sixties, Matula published some results about floating-point
      conversions (see references). One of his results states that there exist
      conversions from decimal to binary and back to decimal which always
      recover the very same decimal, provided it has no more than 15 digits
      (and provided that the intermediate double is normal). Among many other
      things, this means that 1e23 shall be output as "1.0E23".
      (Although Matula limited the discussion to truncation and one form of
      rounding conversion, the results can be extended to more general
      roundings, including IEEE 754. See
      http://homepage.sunrise.ch/mysunrise/r.giulietti/Matula.pdf for a short
      discussion.)

      By specifying that .Double.toString(double) shall produce the shortest
      decimal that can recover the original double, Matula's result is
      implicitly guaranteed.
      In other words, if .Double.toString(double) is required to produce the
      shortest decimal, then the decimal -> binary -> decimal is the identity
      (as far as the numerical value is concerned), provided the original
      decimal has no more than 15 digits and provided that the intermediate
      double is normal.
      In practical terms, this means that the vast majority of user inputs are
      guaranteed to be output with the same numerical value. All physical
      constants known to me, for example, can be input to the system and are
      guaranteed to come out with the same numerical value.

      This regularity and the "no surprise" side effects for the user, both
      based on Matula's sound results, are the driving reasons to modify the
      specification and to adopt the proposed one.
      Another advantage of the proposed specification is that it clearly
      separates the definition of the best decimal number from its
      representation as .String. Moreover, the specified decimal number is
      unique: there are sufficient rules to ensure that the conversion is
      unambiguous. This contrasts with the current specification which is
      ambiguous in the case that there is more than one "shortest" decimal.
      Such underspecifications fail to ensure identity of the decimal ->
      binary -> decimal conversion, where applicable. Moreover, they could
      lead to different results on different implementations.


      The proposed specification
      --------------------------

      Because this is longer than a usual contribution, the proposed
      specification is located at
      http://homepage.sunrise.ch/mysunrise/r.giulietti/Double.html in the form
      of a Javadoc.


      The accompanying implementation
      -------------------------------

        To solve the bugs detected in the JDK, I wrote a new implementation of
      .Double.toString(double) from scratch. Because it mirrors the proposed
      specification instead of the current one, it only makes sense to submit
      the implementation to Mustang if the spirit of the specification, if not
      its exact wording, is accepted as well.

      The implementation has the following features:
      * It solves all the problems above.
      * It is pure Java. It uses only JDK's public APIs in java.lang and
      java.math and some self written supporting classes.
      * It is threadsafe as long as the used API's are threadsafe.
      * It has been extensively tested. All outputs up to 9 digits have been
      found to be correct according to the proposed specification. Many days
      of computing have been devoted to test other billions of randomly
      generated doubles for correctness of the outputs. All boundary cases
      have been tested (powers of 10, powers of 2, Paxson's test suite,
      implementation dependent boundary cases).
      * And last but not least, it performs twice as fast on the average, and
      for long outputs is up to 4 times faster than the JDK.


      References
      ----------

      D. W. Matula, "The Base Conversion Theorem", Proc. AMS, 19 (1968) p.
      716-723
      D. W. Matula, "In-and-Out Conversions", Communications of the ACM, 11
      (1968) p. 47-50

      Attachments

        Issue Links

          There are no Sub-Tasks for this issue.

          Activity

            People

              rgiulietti Raffaello Giulietti
              bstrathesunw Bill Strathearn (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:
                Imported:
                Indexed: