JDK-8233144

undefined behavior: signed integer overflow


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • tbd
    • 15
    • hotspot

      This bug is specifically about signed integer overflow in the HotSpot source base. (Similar bugs may well be on file for other C or C++ source bases, or for other classes of undefined behavior.)

      This bug is not classified confidential because it deals with problems widely known to apply to many large C or C++ source bases such as HotSpot. Specific issues that may have a confidential aspect will be filed separately.

      # Background

      C and C++ specifications decline to define the behavior of many expressions and other program elements. Although this has been true as long as there have been standards for those languages, recent standards are carefully defining many conditions under which code has undefined behavior. Also, compilers are more freely exploiting code with undefined behavior. The result is that old C and C++ code with undefined behavior no longer always does "the obvious thing", but instead may do something surprising and arbitrary, especially at high optimization levels. This trend is forcing C and C++ programmers to move their code out of the shadowy areas of the language.

      Typically, when an expression involving signed integer types is evaluated, the resulting value is defined as a mathematical function of the mathematical values of the inputs. The resulting mathematical value may well fail to be representable in the particular C or C++ result type. In such cases, the specification typically declares that the expression causes "undefined behavior". This allows the compiler (especially in aggressive optimization modes) to substitute any convenient result for the computation.
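
      One way to stay inside defined behavior is to test whether the mathematical result is representable before performing the operation. The following is a minimal sketch of that idea for addition on `int`; the helper name `add_would_overflow` is hypothetical and not taken from HotSpot.

```cpp
#include <limits>

// Hypothetical helper (not HotSpot code): returns true when the mathematical
// sum a + b is not representable in int. The checks themselves never
// overflow, because max - b is only formed for b > 0 and min - b only for
// b <= 0.
constexpr bool add_would_overflow(int a, int b) {
  return (b > 0) ? (a > std::numeric_limits<int>::max() - b)
                 : (a < std::numeric_limits<int>::min() - b);
}
```

      A caller would evaluate `a + b` only when `add_would_overflow(a, b)` is false, and take an explicit error or saturation path otherwise.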

      Traditionally, compilers did something predictable on overflow, such as wrapping the result around to the result type's range by discarding high-order bits. Many users have written their code, including parts of HotSpot, with this expectation. That expectation is no longer tenable, and the old code needs to be updated to remove the undefined behavior.

      # Remediation

      There are tools (such as compiler warnings and optional runtime checks) to help detect undefined behavior. Old source code bases like HotSpot should be evaluated with such tools, and undefined behavior either removed or explicitly documented as permitted (with the corresponding warnings disabled).
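
      As an illustration of an optional runtime check, GCC and Clang both accept `-fsanitize=signed-integer-overflow`, which instruments signed arithmetic and reports overflow when it happens. The sketch below is an assumption-laden example rather than a HotSpot build recipe: the file name and the function `scale_by_ten` are hypothetical.

```cpp
// Possible build line (file name is hypothetical):
//   g++ -O2 -fsanitize=signed-integer-overflow example.cpp
// With that flag, an overflowing call such as scale_by_ten(INT32_MAX) is
// reported at run time instead of silently producing an arbitrary value.
#include <cstdint>

// Hypothetical function whose multiplication overflows for large inputs.
// It is well defined only when x * 10 is representable in int32_t.
constexpr int32_t scale_by_ten(int32_t x) {
  return x * 10;
}
```

      Running the existing test suites against such an instrumented build is one way to find the places where overflow is actually observed.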

      The workaround that the C and C++ specifications offer coders is to use unsigned (not signed) integral types. Arithmetic on these types is explicitly specified for all inputs: the result is the exact mathematical result reduced modulo 2^N into the range of the type, also called wraparound. (In all relevant cases for our source base, that range is zero to (2^N)-1, where N is the number of bits in the type.) Conversion between signed and unsigned values is also specified without undefined behavior. This means "old style" code which relies on wraparound behavior for signed types needs to be converted to use unsigned temporaries.
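
      The conversion to unsigned temporaries can be sketched as follows. The helper names `wrapping_add` and `wrapping_mul` are hypothetical, and the sketch assumes a platform where `int` is no wider than 32 bits and signed values use two's complement (before C++20 the unsigned-to-signed conversion is implementation-defined, though not undefined).

```cpp
#include <cstdint>

// Hypothetical helpers (not HotSpot code). The arithmetic is done on
// uint32_t, where wraparound is defined, and the result is converted back.
// Assumes int is at most 32 bits wide, so uint32_t is not promoted to int.
constexpr int32_t wrapping_add(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

constexpr int32_t wrapping_mul(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) *
                              static_cast<uint32_t>(b));
}
```

      With these, `wrapping_add(INT32_MAX, 1)` yields `INT32_MIN` on such platforms, whereas the plain expression `INT32_MAX + 1` has undefined behavior.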

      Because this may be a time-consuming job, this bug should be broken up into smaller tasks to repair undefined behavior where it has been observed. (Some tasks may need confidential handling also.) Please link such tasks to this bug.

      It is expected that some, but not all, of the needed repairs will be fairly simple once a careful decision has been made about how to fix a particular area. Fixes may involve refactoring code, or may be "pointwise" transforms of expressions to use unsigned types, or may simply change compiler configuration parameters, or may be a combination of all of the above.
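
      A "pointwise" transform might look like the sketch below, which uses a multiply-and-add hash step as a stand-in for code that relies on wraparound. The functions `hash_step` and `hash_bytes` are hypothetical examples, not HotSpot code, and the same platform assumptions apply (`int` no wider than 32 bits, two's complement).

```cpp
#include <cstddef>
#include <cstdint>

// Old style (undefined on overflow):  h = 31 * h + c;   with int32_t h
// Pointwise rewrite: do the arithmetic in uint32_t, then convert back.
constexpr int32_t hash_step(int32_t h, uint8_t c) {
  return static_cast<int32_t>(31u * static_cast<uint32_t>(h) + c);
}

// Hypothetical caller: folds n bytes into a 32-bit hash using hash_step.
inline int32_t hash_bytes(const uint8_t* p, size_t n) {
  int32_t h = 0;
  for (size_t i = 0; i < n; i++) {
    h = hash_step(h, p[i]);
  }
  return h;
}
```

      Only the expression that can overflow changes; the surrounding signatures and callers keep their signed types.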

      This bug is tagged "starter" to make it easier to discover by programmers looking for a useful introductory task.

      This umbrella bug should not be closed until either (a) most of HotSpot has been cleared of undefined behavior due to signed integer overflow, or (b) conditions have somehow changed so that it no longer needs to be fixed.

      Possible conditions (b) include (b1) a reliable way to request "old school" signed integer wraparound from *all* compilers applied to HotSpot (not just *some*), (b2) a change in the standard which somehow renders our code "OK", (b3) translation of offending modules (such as C2) from C++ to Java (a goal of Project Metropolis).

      # References

      "C++ language / Basic Concepts / Undefined behavior"
      https://en.cppreference.com/w/cpp/language/ub
      Excerpt: "undefined behavior - there are no restrictions on the behavior of the program. Examples of undefined behavior are memory accesses outside of array bounds, signed integer overflow, null pointer dereference, modification of the same scalar more than once in an expression without sequence points, access to an object through a pointer of a different type, etc. Compilers are not required to diagnose undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful."

      "Tell Programmers About Signed Integer Overflow Behavior"
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1407r0.pdf
      (useful overview of the issue written by a standard committee member)
      Excerpt: "Even though signed integer overflow is undefined in the standard, and has been for decades, signed integer overflow actually occurs in real programs. In order to deterministically generate code a compiler vendor needs to have a policy for what should happen if signed integer overflow occurs. To the best of the author’s understanding, there are three different behaviors that C++ compilers/optimizers implement today for signed integer overflow.
      ... *Can’t happen.* Starting somewhere around 2007 this model has been used by some optimizers that assume code is free of signed integer overflow, presumably due to extensive testing with the -ftrapv flag. In this mode the optimizer assumes that signed integer overflow can never happen. So if code, accidentally or intentionally, relies on signed integer overflow, that code may be elided by the optimizer. Both clang and gcc support this model today through various compiler optimizer flags."

      "Basics of Integer Overflow"
      https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Integer-Overflow-Basics.html
      Excerpt: "...signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer. The misbehavior can even precede the overflow. Such an overflow can occur during addition, subtraction, multiplication, division, and left shift."

      "How undefined signed overflow enables optimizations in GCC" (2016)
      https://kristerw.blogspot.com/2016/02/how-undefined-signed-overflow-enables.html
      Excerpt: "The nice property of overflow being undefined is that signed integer operations works as in normal mathematics — you can cancel out values so that (x*10)/5 simplifies to x*2, or (x+1)<(y+3) simplifies to x<(y+2). Increasing a value always makes it larger, so x<(x+1) is always true."

      "Is signed integer overflow still undefined behavior in C++?" (2013)
      https://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c
      (much discussion thereof)

            dlong Dean Long
            jrose John Rose