JDK-8011048

Possible reading from unmapped memory in UTF8::as_quoted_ascii()

    • b26

        I found a bug in UTF8::as_quoted_ascii() that would cause random crashes on Win32 (see "HERE" below in the code):

        void Symbol::print_symbol_on(outputStream* st) const {
          st = st ? st : tty;
          st->print("%s", as_quoted_ascii());
        }

        char* Symbol::as_quoted_ascii() const {
          const char *ptr = (const char *)&_body[0];
          int quoted_length = UTF8::quoted_ascii_length(ptr, utf8_length());
          char* result = NEW_RESOURCE_ARRAY(char, quoted_length + 1);
          UTF8::as_quoted_ascii(ptr, result, quoted_length + 1);
          return result;
        }

        // converts a utf8 string to quoted ascii
        void UTF8::as_quoted_ascii(const char* utf8_str, char* buf, int buflen) {
          const char *ptr = utf8_str;
          char* p = buf;
          char* end = buf + buflen;
          while (*ptr != '\0') {  // <<< HERE: may read one byte past the end of utf8_str
            jchar c;
            ptr = UTF8::next(ptr, &c);
            if (c >= 32 && c < 127) {
              if (p + 1 >= end) break; // string is truncated
              *p++ = (char)c;
            } else {
              if (p + 6 >= end) break; // string is truncated
              sprintf(p, "\\u%04x", c);
              p += 6;
            }
          }
          *p = '\0';
        }

        The (*ptr != '\0') check in UTF8::as_quoted_ascii assumes that it is safe to read one byte past the end of utf8_str. That byte may be nonzero, but the following (p + 1 >= end) or (p + 6 >= end) check still ensures that the loop terminates, because buf has room for exactly quoted_length characters plus the terminator.

        However, after I fixed the Symbol::size() function (see JDK-8009575), I have a case where

            Symbol x = 0x00003ff0{
                _length = 8;
                _body[] = "activate" // &_body[0] = 0x00003ff8
            };

            x.size() == 16 bytes

        The first byte after the end of "activate" is at 0x00004000, i.e., on a different page. The symbol is allocated by malloc. On Windows, the page immediately following the newly allocated space could be unmapped, so the (*ptr != '\0') line would crash with a faulting address of 0x00004000. I saw this in a JPRT crash dump.
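The address arithmetic above can be checked with a short sketch. The addresses and the length come from the example in this report; the 4 KiB page size and all names below are assumptions made for illustration.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages. The report does not state the page size.
constexpr std::uintptr_t kPageSize = 0x1000;

// True when both addresses fall on the same page.
constexpr bool same_page(std::uintptr_t a, std::uintptr_t b) {
  return (a / kPageSize) == (b / kPageSize);
}

// Values taken from the example in the report.
constexpr std::uintptr_t kSymbolAddr = 0x00003ff0;  // start of Symbol x
constexpr std::uintptr_t kBodyAddr   = 0x00003ff8;  // &_body[0]
constexpr std::uintptr_t kLength     = 8;           // strlen("activate")

constexpr std::uintptr_t kLastByte   = kBodyAddr + kLength - 1;  // 0x3fff
constexpr std::uintptr_t kOnePastEnd = kBodyAddr + kLength;      // 0x4000

// The symbol occupies exactly 16 bytes and ends flush with the page.
static_assert(kOnePastEnd == kSymbolAddr + 16, "x.size() == 16 bytes");
static_assert(same_page(kSymbolAddr, kLastByte), "symbol fits in one page");
// The byte read by (*ptr != '\0') after the last character is on the next page.
static_assert(!same_page(kLastByte, kOnePastEnd), "terminator probe crosses a page");
```

With the pre-JDK-8009575 size, the allocation would presumably have extended past 0x00004000, which would explain why the probe did not fault before.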

        The fix would be to add a utf8_length parameter to UTF8::as_quoted_ascii -- similar to the existing function UNICODE::as_quoted_ascii(const jchar* base, int length, char* buf, int buflen).
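A minimal standalone sketch of what such a length-bounded conversion could look like follows. It is not the HotSpot code: `next_bounded` is a hypothetical stand-in for UTF8::next(), `uint16_t` stands in for jchar, and only 1- to 3-byte sequences are decoded. The point is that the loop is bounded by `utf8_length`, so no byte at or past the end of the input is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for UTF8::next(): decodes one 1- to 3-byte sequence
// without reading at or past `end`. Requires ptr < end. A truncated or
// malformed sequence is returned as its single lead byte.
static const char* next_bounded(const char* ptr, const char* end, uint16_t* value) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(ptr);
  const std::size_t avail = static_cast<std::size_t>(end - ptr);
  const unsigned char b0 = p[0];
  if ((b0 & 0xE0) == 0xC0 && avail >= 2 && (p[1] & 0xC0) == 0x80) {
    *value = static_cast<uint16_t>(((b0 & 0x1F) << 6) | (p[1] & 0x3F));
    return ptr + 2;
  }
  if ((b0 & 0xF0) == 0xE0 && avail >= 3 &&
      (p[1] & 0xC0) == 0x80 && (p[2] & 0xC0) == 0x80) {
    *value = static_cast<uint16_t>(((b0 & 0x0F) << 12) |
                                   ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
    return ptr + 3;
  }
  *value = b0;
  return ptr + 1;
}

// Sketch of the proposed signature: the input is bounded by utf8_length
// instead of by a NUL terminator.
void as_quoted_ascii_bounded(const char* utf8_str, int utf8_length,
                             char* buf, int buflen) {
  assert(buflen > 0);
  const char* ptr = utf8_str;
  const char* utf8_end = utf8_str + utf8_length;
  char* p = buf;
  char* end = buf + buflen;
  while (ptr < utf8_end) {  // never dereferences utf8_end itself
    uint16_t c;
    ptr = next_bounded(ptr, utf8_end, &c);
    if (c >= 32 && c < 127) {
      if (p + 1 >= end) break;  // string is truncated
      *p++ = static_cast<char>(c);
    } else {
      if (p + 6 >= end) break;  // string is truncated
      std::snprintf(p, 7, "\\u%04x", static_cast<unsigned>(c));
      p += 6;
    }
  }
  *p = '\0';
}
```

Under this sketch, a caller such as Symbol::as_quoted_ascii() would pass utf8_length() as the new argument, mirroring how UNICODE::as_quoted_ascii already takes an explicit length.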

        This bug could also crash the existing code (without the fix for JDK-8009575) if the first byte past the end of the string is a UTF8 lead byte, which would cause UTF8::next() to read additional bytes beyond it. We have probably never seen such a crash because (I am guessing):

        (1) Only Windows would have unmapped space immediately following a malloc'ed block.
        (2) Windows fills the malloc'ed block with zeros, so UTF8::next() reads only 1 byte.
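To illustrate why the value of the stray byte matters, here is a small sketch of how many bytes a decoder may examine for a given lead byte. It assumes the usual 1- to 3-byte UTF-8 forms selected by the lead byte; `max_bytes_examined` is a hypothetical helper, not a HotSpot function.

```cpp
// Assumption: 1- to 3-byte UTF-8 forms, selected by the lead byte.
// Returns the largest number of bytes a decoder may look at, starting
// from the lead byte itself.
constexpr int max_bytes_examined(unsigned char lead) {
  return (lead & 0xF0) == 0xE0 ? 3    // 1110xxxx: lead of a 3-byte form
       : (lead & 0xE0) == 0xC0 ? 2    // 110xxxxx: lead of a 2-byte form
       : 1;                           // ASCII, zero, or a stray continuation byte
}

// A zero byte after the string is consumed on its own.
static_assert(max_bytes_examined(0x00) == 1, "zero fill reads 1 byte");
// A stray lead byte after the string pulls in further out-of-bounds bytes.
static_assert(max_bytes_examined(0xC3) == 2, "2-byte lead");
static_assert(max_bytes_examined(0xE2) == 3, "3-byte lead");
```

So with zero-filled memory the over-read stays at one byte, while a stray lead byte could extend it by one or two more.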

              iklam Ioi Lam