JDK-8202525

java.util.jar.Manifest produces MANIFEST.MF files that are not UTF-8

      ADDITIONAL SYSTEM INFORMATION :
      Seems to be OS and JDK independent.

      My configuration:
      openjdk version "1.8.0_162"
      OpenJDK Runtime Environment (build 1.8.0_162-8u162-b12-0ubuntu0.17.10.2-b12)
      OpenJDK 64-Bit Server VM (build 25.162-b12, mixed mode)


      A DESCRIPTION OF THE PROBLEM :
      The class java.util.jar.Manifest splits overlong lines after 72 bytes when serialized to a MANIFEST.MF file.

      This conforms to the JAR File specification (https://docs.oracle.com/javase/9/docs/specs/jar/jar.html#notes_on_manifest_and_signature_filesnotes-on-manifest-and-signature-files), which limits each line to 72 bytes. However, when the 72-byte boundary falls inside a multibyte character, that character is cut in two.

      The class should instead split such lines slightly earlier, so that a split never falls in the middle of a multibyte character.
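
      The requested behavior could look roughly like the following. This is only a minimal sketch, not the JDK implementation: the class `Utf8Fold`, the method `fold`, and the `maxBytes` parameter are hypothetical names, and the sketch assumes that the limit applies to the line content without the trailing CRLF. It backs the split point up until it no longer points at a UTF-8 continuation byte (bit pattern 10xxxxxx).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of java.util.jar.
final class Utf8Fold {

    /** True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /**
     * Folds one logical "Name: value" line into physical lines of at most
     * maxBytes bytes each (CRLF not counted; an assumption of this sketch),
     * never splitting a UTF-8 encoded character.
     */
    static byte[] fold(String line, int maxBytes) {
        if (maxBytes < 5) {
            // A continuation line needs room for its leading space plus
            // one character of up to 4 bytes, otherwise no progress is made.
            throw new IllegalArgumentException("maxBytes must be at least 5");
        }
        byte[] in = line.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        int room = maxBytes; // the first line has no leading space
        while (in.length - pos > room) {
            int end = pos + room;
            // Back up until 'end' points at the first byte of a character.
            while (end > pos && isContinuation(in[end])) {
                end--;
            }
            out.write(in, pos, end - pos);
            out.write('\r');
            out.write('\n');
            out.write(' '); // continuation lines start with a single space
            pos = end;
            room = maxBytes - 1;
        }
        out.write(in, pos, in.length - pos);
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Attribute: ");
        for (int i = 0; i < 100; i++) {
            sb.append('\u00e4'); // 2-byte German umlaut
        }
        byte[] folded = fold(sb.toString(), 72);
        System.out.print(new String(folded, StandardCharsets.UTF_8));
    }
}
```

      With this approach some lines end up one to three bytes shorter than the limit, which the specification permits because it only sets a maximum line length.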

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Create an instance of the class `java.util.jar.Manifest` containing an attribute whose value includes characters that need multiple bytes in UTF-8 and is long enough to span multiple lines in a MANIFEST.MF file.

      Now serialize this object to a MANIFEST.MF file. If a multibyte character starts exactly at the 72nd byte of a line, it is cut in two in the resulting manifest.
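
      To see why the value in the reproducer below hits this case, the byte offsets can be checked directly. The following is an illustrative sketch only; `CutPointCheck` and `splitsCharacter` are hypothetical names, and the fixed cut after 72 bytes is taken from the description above rather than from the JDK source. The ASCII prefix `Attribute: ` is 11 bytes long, so every 2-byte `ä` starts at an odd 0-based index, and a cut after exactly 72 bytes lands between the two bytes of one character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration, not part of the JDK.
final class CutPointCheck {

    /**
     * Returns true if starting a new line at 0-based byte index 'cut'
     * would separate a UTF-8 continuation byte from its lead byte.
     */
    static boolean splitsCharacter(byte[] utf8, int cut) {
        return cut > 0 && cut < utf8.length && (utf8[cut] & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Attribute: ");
        for (int i = 0; i < 40; i++) {
            sb.append('\u00e4'); // 'ä', encoded as the two bytes C3 A4
        }
        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Cutting after 72 bytes means the next line starts at index 72.
        System.out.println(splitsCharacter(bytes, 72)); // prints "true"
        // One byte earlier the cut falls on a character boundary.
        System.out.println(splitsCharacter(bytes, 71)); // prints "false"
    }
}
```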

      I created a demo project demonstrating this issue: https://github.com/floscher/gradle-manifest-multibyte-demo

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The MANIFEST.MF file written by java.util.jar.Manifest is a valid UTF-8 text file.
      ACTUAL -
      The MANIFEST.MF file contains invalid UTF-8 byte sequences.

      ---------- BEGIN SOURCE ----------
      import java.io.ByteArrayOutputStream;
      import java.io.IOException;
      import java.nio.ByteBuffer;
      import java.nio.charset.CodingErrorAction;
      import java.nio.charset.StandardCharsets;
      import java.util.jar.Attributes;
      import java.util.jar.Manifest;

      public class Main {
        public static void main(String... args) throws IOException {
          // Create empty manifest
          Manifest m = new Manifest();
          m.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
          // Add multibyte characters (here it's a 2-byte German umlaut)
          m.getMainAttributes().put(new Attributes.Name("Attribute"), "äääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääää");
          final ByteArrayOutputStream baos = new ByteArrayOutputStream();
          // Write the manifest to a byte array
          m.write(baos);
          // Print the serialized manifest
          System.out.println(new String(baos.toByteArray(), StandardCharsets.UTF_8));
          // Try to decode the written bytes to a UTF-8 string (this fails!)
          StandardCharsets.UTF_8.newDecoder().onMalformedInput(CodingErrorAction.REPORT).decode(ByteBuffer.wrap(baos.toByteArray()));
        }
      }

      ---------- END SOURCE ----------

      FREQUENCY : always


            sherman Xueming Shen
            webbuggrp Webbug Group