JDK-8255244

HttpClient: Response headers contain incorrectly encoded Unicode characters

Details

    • b25
    • Verified

    Description

      A DESCRIPTION OF THE PROBLEM :
      The contents of HTTP headers returned in an HttpResponse are URL encoded. However, Unicode characters in these response headers appear to be encoded incorrectly.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Send a request to a server which will respond with a header whose value contains a Unicode character - e.g. Header: ✓
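
      For readers without the attached webpage.php, a server of this kind could be sketched in Java roughly as follows. This is an illustrative stand-in, not the reporter's setup: the class RawHeaderServer and its methods are hypothetical names, and the sketch assumes the header value is written to the wire as raw UTF-8 bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical one-shot server used only to illustrate the reproduction steps.
final class RawHeaderServer {

    // Builds a minimal HTTP/1.1 response whose "Header" field carries the
    // given value encoded as UTF-8 (an assumption about the original server).
    static byte[] buildResponse(String headerValue) {
        String response = "HTTP/1.1 200 OK\r\n"
            + "Header: " + headerValue + "\r\n"
            + "Content-Length: 0\r\n"
            + "Connection: close\r\n"
            + "\r\n";
        return response.getBytes(StandardCharsets.UTF_8);
    }

    // Accepts a single connection, skips the request head, and replies.
    static void serveOnce(ServerSocket server, String headerValue) throws IOException {
        try (Socket socket = server.accept()) {
            InputStream in = socket.getInputStream();
            byte[] end = {'\r', '\n', '\r', '\n'};
            int matched = 0;
            int b;
            // Read until the blank line that terminates the request headers.
            while (matched < end.length && (b = in.read()) != -1) {
                if (b == end[matched]) {
                    matched++;
                } else {
                    matched = (b == '\r') ? 1 : 0;
                }
            }
            OutputStream out = socket.getOutputStream();
            out.write(buildResponse(headerValue));
            out.flush();
        }
    }
}
```

      Running RawHeaderServer.serveOnce(new ServerSocket(1234), "\u2713") on a separate thread before sending the request would give the client a response containing the header bytes E2 9C 93.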


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The returned header value should be the correct URL encoding of the character contained in the header.
      In the example, ✓ is URL-encoded as %E2%9C%93, so the header value should be returned as 0xE29C93, i.e. the three chars 0x00E2, 0x009C and 0x0093.
      ACTUAL -
      Every non-ASCII character is returned as a 16-bit value whose upper eight bits are all 1-bits, so each of these characters erroneously starts with the byte 0xFF.
      In the example, the header value is returned as 0xFFE2FF9CFF93.
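
      The expected and actual values above can be reproduced with plain Java arithmetic. The sketch below is an illustration, not a confirmed diagnosis of the HttpClient internals: it shows that the reported values are consistent with each header byte being widened to a char with sign extension instead of being treated as unsigned. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the expected and actual values from the report.
final class SignExtensionDemo {

    // Casting a signed byte straight to char sign-extends it first:
    // (byte) 0xE2 is -30, which becomes 0xFFE2 as a char.
    static char signExtended(byte b) {
        return (char) b;
    }

    // Masking with 0xFF first treats the byte as unsigned (0x00 to 0xFF).
    static char unsigned(byte b) {
        return (char) (b & 0xFF);
    }

    // Widens every UTF-8 byte of the text to a char and prints the chars as hex.
    static String hexChars(String text, boolean signExtend) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            char c = signExtend ? signExtended(b) : unsigned(b);
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    // URL-encodes the text as UTF-8, as in the report's example.
    static String urlEncoded(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }
}
```

      For the check mark (U+2713), urlEncoded gives %E2%9C%93, hexChars with signExtend set to false gives 0x00E2009C0093, and hexChars with signExtend set to true gives 0xFFE2FF9CFF93, which matches the actual value reported.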

      ---------- BEGIN SOURCE ----------
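      // Imports needed by the test below. assertTrue/assertEquals are assumed to
      // come from the test framework in use (e.g. JUnit).
      import java.io.IOException;
      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.Optional;

      public void testHttpClient() throws IOException, InterruptedException
      {
          // Exceptions propagate so that a failed request fails the test
          // instead of being silently swallowed.
          final HttpClient client = HttpClient.newHttpClient();
          final HttpRequest request = HttpRequest.newBuilder(
              URI.create("http://localhost:1234/webpage.php")) // Your webpage here
              .build();
          final HttpResponse<String> response =
              client.send(request, HttpResponse.BodyHandlers.ofString());
          final Optional<String> header = response.headers().firstValue("header"); // Header containing ✓
          assertTrue(header.isPresent());
          assertEquals("\u00E2\u009C\u0093", header.get()); // For character ✓
          // Actual result: \uFFE2\uFF9C\uFF93
      }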
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Every non-ASCII character stored in the string value of the header can be masked with hex value 0x00FF to get the real value.
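
      A minimal sketch of this workaround is shown below. The class HeaderWorkaround and its method names are hypothetical, and the second method goes one step beyond the reporter's suggestion by assuming that the server sent the header value as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper sketching the reporter's masking workaround.
final class HeaderWorkaround {

    // Masks every char with 0x00FF. ASCII chars are unchanged, and a char such
    // as 0xFFE2 becomes 0x00E2.
    static String maskLowByte(String rawHeaderValue) {
        char[] chars = rawHeaderValue.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] &= 0x00FF;
        }
        return new String(chars);
    }

    // Optional extra step, assuming the original bytes were UTF-8: turn the
    // masked chars back into bytes and decode them as UTF-8.
    static String decodeAsUtf8(String rawHeaderValue) {
        byte[] bytes = maskLowByte(rawHeaderValue).getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

      With the value from the report, maskLowByte("\uFFE2\uFF9C\uFF93") yields "\u00E2\u009C\u0093", and decodeAsUtf8 on the same input yields the single character ✓ (U+2713).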

      FREQUENCY : always


      Attachments

        1. Main.java
          1 kB
        2. webpage.php
          0.0 kB

            People

              dfuchs Daniel Fuchs
              webbuggrp Webbug Group