Bug
Resolution: Won't Fix
P4
None
8, 9, 10, 11
x86_64
linux_ubuntu
ADDITIONAL SYSTEM INFORMATION :
Seems to be OS and JDK independent.
My configuration:
openjdk version "1.8.0_162"
OpenJDK Runtime Environment (build 1.8.0_162-8u162-b12-0ubuntu0.17.10.2-b12)
OpenJDK 64-Bit Server VM (build 25.162-b12, mixed mode)
A DESCRIPTION OF THE PROBLEM :
The class java.util.jar.Manifest splits overlong lines after exactly 72 bytes when it is serialized to a MANIFEST.MF file.
This conforms to the JAR File specification (https://docs.oracle.com/javase/9/docs/specs/jar/jar.html#notes_on_manifest_and_signature_filesnotes-on-manifest-and-signature-files), which limits lines to 72 bytes, but the split ignores character boundaries and can cut a multibyte UTF-8 character in two.
Instead, the class should split an overlong line somewhat before the 72-byte limit whenever necessary, so that no multibyte character is split.
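The proposed behavior can be sketched as follows. This is a hypothetical helper (`Utf8SafeFolding` and `foldHeader` are made-up names), not the JDK's implementation: it walks the header one code point at a time and starts a continuation line (CRLF plus a single space) whenever the next character's UTF-8 bytes would push the line past 72 bytes. Lines may therefore end up slightly shorter than 72 bytes, but no character is ever cut.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8SafeFolding {
    // Maximum length of a manifest line in bytes, excluding the line break.
    static final int MAX_LINE_BYTES = 72;

    /**
     * Hypothetical helper: folds one "Name: value" header into CRLF-terminated
     * lines of at most 72 UTF-8 bytes each. Continuation lines start with a
     * single space, and a line is broken early rather than inside a character.
     */
    static byte[] foldHeader(String header) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lineBytes = 0; // bytes written to the current line so far
        int i = 0;
        while (i < header.length()) {
            int cp = header.codePointAt(i);
            // UTF-8 encoding of this single code point (1 to 4 bytes)
            byte[] encoded = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            if (lineBytes + encoded.length > MAX_LINE_BYTES) {
                // The character would not fit: end the line before it and
                // start a continuation line with one leading space.
                out.write('\r');
                out.write('\n');
                out.write(' ');
                lineBytes = 1;
            }
            out.write(encoded, 0, encoded.length);
            lineBytes += encoded.length;
            i += Character.charCount(cp);
        }
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 100 copies of U+00E4 (a-umlaut), 2 bytes each in UTF-8
        String value = new String(new char[100]).replace('\0', '\u00E4');
        byte[] folded = foldHeader("Attribute: " + value);
        System.out.print(new String(folded, StandardCharsets.UTF_8));
    }
}
```

For the header built in `main` ("Attribute: " followed by 100 'ä' characters), this yields three lines of 71 bytes each instead of a 72-byte first line that ends with the first half of an 'ä'. Encoding one code point at a time keeps the sketch simple; a production version would more likely drive a `CharsetEncoder` directly.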
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create an instance of the class `java.util.jar.Manifest` with an attribute whose value contains characters that need multiple bytes in UTF-8 and is long enough to span multiple lines in a MANIFEST.MF file.
Now serialize this object to a MANIFEST.MF file. If a multibyte character starts at the 72nd byte of a line (more generally, if its bytes straddle the 72-byte boundary), it is cut in two in the resulting MANIFEST.MF.
I created a demo project that reproduces this issue: https://github.com/floscher/gradle-manifest-multibyte-demo
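The arithmetic behind the cut can be checked with a short sketch. It models only the naive behavior described above (cut after exactly 72 bytes) and is not the JDK's code; `NaiveFoldingDemo` and `naiveFoldCutsCharacter` are hypothetical names. Since "Attribute: " takes 11 bytes and each 'ä' (0xC3 0xA4) takes 2, the 31st 'ä' starts at the 72nd byte, so its lead byte ends the first line and its continuation byte would begin the next. With a 10-byte prefix such as "Attribut: ", byte 72 ends a character and the first line is not cut.

```java
import java.nio.charset.StandardCharsets;

public class NaiveFoldingDemo {
    // Byte limit per manifest line, as in the JAR File specification.
    static final int MAX_LINE_BYTES = 72;

    // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /**
     * Hypothetical check: would cutting this header after exactly 72 bytes
     * split a multibyte character? True if the first byte moved to the next
     * line is a continuation byte.
     */
    static boolean naiveFoldCutsCharacter(String header) {
        byte[] bytes = header.getBytes(StandardCharsets.UTF_8);
        return bytes.length > MAX_LINE_BYTES && isContinuationByte(bytes[MAX_LINE_BYTES]);
    }

    public static void main(String[] args) {
        // 100 copies of U+00E4 (a-umlaut), encoded as 0xC3 0xA4
        String value = new String(new char[100]).replace('\0', '\u00E4');
        byte[] bytes = ("Attribute: " + value).getBytes(StandardCharsets.UTF_8);
        // Bytes 72 and 73 (1-based) are the two halves of the 31st a-umlaut.
        System.out.printf("byte 72: 0x%02X, byte 73: 0x%02X%n",
                bytes[MAX_LINE_BYTES - 1] & 0xFF, bytes[MAX_LINE_BYTES] & 0xFF);
        System.out.println(naiveFoldCutsCharacter("Attribute: " + value)); // true
        System.out.println(naiveFoldCutsCharacter("Attribut: " + value));  // false
    }
}
```

Running `main` prints `byte 72: 0xC3, byte 73: 0xA4`, then `true` and `false`, showing that whether a character is cut depends only on how the multibyte sequences happen to align with the 72-byte boundary.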
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The MANIFEST.MF file written by java.util.jar.Manifest is a valid UTF-8 text file.
ACTUAL -
The MANIFEST.MF file contains invalid UTF-8 byte sequences.
---------- BEGIN SOURCE ----------
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class Main {
    public static void main(String... args) throws IOException {
        // Create empty manifest
        Manifest m = new Manifest();
        m.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Add multibyte characters (here it's a 2-byte German umlaut)
        m.getMainAttributes().put(new Attributes.Name("Attribute"), "äääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääääää");
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Write the manifest to a byte array
        m.write(baos);
        // Print the serialized manifest
        System.out.println(new String(baos.toByteArray(), StandardCharsets.UTF_8));
        // Try to decode the written bytes to a UTF-8 string (this fails!)
        StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(baos.toByteArray()));
    }
}
---------- END SOURCE ----------
FREQUENCY : always