- Enhancement
- Resolution: Unresolved
- P4
- None
- 24
- Tested on JDK 24 on AL2 w/ x64 and aarch64.
- generic
- generic
ByteArrayOutputStream is well known to have capacity limitations due to large objects; less well known are the performance costs of
1) its synchronized methods, and 2) array copying every time the buffer is resized. These lead to several consequences:
- Content size of a ByteArrayOutputStream is limited by the size of the largest possible byte[]
- Single-threaded usage of a ByteArrayOutputStream is penalized by synchronization overhead (the JIT sometimes helps here)
- Writing to a ByteArrayOutputStream can allocate memory up to 2x the size of the final payload
- Writing to a ByteArrayOutputStream generates array copies up to 2x the size of the final payload
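To make the last two points concrete, here is a small model of a buffer that doubles its capacity whenever it fills. It is a simplification for illustration, not the JDK's actual growth code; the class name DoublingGrowthModel and the 32-byte starting capacity used in main are assumptions of this sketch.

```java
// Simplified model (an assumption of this sketch): the buffer doubles when full
// and the payload arrives one byte at a time.
final class DoublingGrowthModel {

    /** Returns {finalCapacity, totalBytesCopied} after writing payloadBytes single bytes. */
    static long[] simulate(long payloadBytes, long initialCapacity) {
        long capacity = initialCapacity;
        long count = 0;
        long copied = 0;
        for (long i = 0; i < payloadBytes; i++) {
            if (count == capacity) {
                copied += count;   // existing contents move into the new, larger array
                capacity *= 2;
            }
            count++;
        }
        return new long[] {capacity, copied};
    }

    public static void main(String[] args) {
        // A payload one byte past a power of two is the worst case for this model.
        long[] r = simulate(1025, 32);
        System.out.println("capacity=" + r[0] + " copied=" + r[1]);
    }
}
```

For a 1025-byte payload the model ends with a 2048-byte array and 2016 bytes copied along the way, both just under 2x the payload.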
An alternate implementation based on segments, a.k.a. smaller internal byte arrays, alleviates all four of these concerns. See the attached file,
MemoryOutputStream.java. Benchmarks are also attached, and a draft PR will follow.
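The segment idea can be sketched roughly as follows. This is not the attached MemoryOutputStream.java; the class name SegmentedOutputStreamSketch, the fixed segment size, and the method set are assumptions made for illustration, and the attached implementation may size segments differently.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: unsynchronized, fixed-size segments, no copy on growth.
final class SegmentedOutputStreamSketch extends OutputStream {
    private final int segmentSize;
    private final List<byte[]> full = new ArrayList<>(); // completed segments
    private byte[] current;                              // segment being filled
    private int pos;                                     // next free index in current

    SegmentedOutputStreamSketch(int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.segmentSize = segmentSize;
        this.current = new byte[segmentSize];
    }

    @Override
    public void write(int b) {
        if (pos == segmentSize) nextSegment();
        current[pos++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        Objects.checkFromIndexSize(off, len, src.length);
        while (len > 0) {
            if (pos == segmentSize) nextSegment();
            int n = Math.min(len, segmentSize - pos);
            System.arraycopy(src, off, current, pos, n);
            pos += n;
            off += n;
            len -= n;
        }
    }

    // Growing only allocates a new segment; bytes already written are never moved.
    private void nextSegment() {
        full.add(current);
        current = new byte[segmentSize];
        pos = 0;
    }

    /** Total bytes written; a long because content may exceed one byte[]. */
    long size() {
        return (long) full.size() * segmentSize + pos;
    }

    /** Streams the content out without building one contiguous array. */
    void writeTo(OutputStream out) throws IOException {
        for (byte[] s : full) out.write(s);
        out.write(current, 0, pos);
    }

    /** Contiguous copy; only possible while the content fits in a single byte[]. */
    byte[] toByteArray() {
        byte[] result = new byte[Math.toIntExact(size())];
        int offset = 0;
        for (byte[] s : full) {
            System.arraycopy(s, 0, result, offset, segmentSize);
            offset += segmentSize;
        }
        System.arraycopy(current, 0, result, offset, pos);
        return result;
    }
}
```

In this sketch, callers that only need to forward the bytes can use writeTo and avoid the single large array entirely; toByteArray remains subject to the byte[] size limit.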
Note that it is not possible to simply rewrite ByteArrayOutputStream using this approach, because the internals of the current implementation
are exposed by way of two protected fields: int count and byte[] buf. As an alternative, I propose a new public class that can be trivially swapped in,
similar to the relationship between StringBuffer and StringBuilder. I recognize the challenges inherent in a new public class in java.io, but the
benefits are so large that it's worth discussing.
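As an illustration of why the protected fields constrain any rewrite, a subclass like the following is legal today and reads buf and count directly. The class name ExposedBufferStream and its method are hypothetical; the fields are the ones named above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical subclass that depends on the single backing array being visible.
final class ExposedBufferStream extends ByteArrayOutputStream {

    /** Wraps the live backing array without copying; valid only until the next resize. */
    synchronized ByteBuffer asReadOnlyView() {
        return ByteBuffer.wrap(buf, 0, count).asReadOnlyBuffer();
    }
}
```

Code of this shape would break if ByteArrayOutputStream stopped keeping its content in one contiguous buf, which is the compatibility concern behind proposing a separate class.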
I say "benefits are so large" because testing shows:
- Content size can reach 97% of the allocated heap, rather than ByteArrayOutputStream's ~2GB limit caused by a single large object
- Synchronization is removed, yielding a ~2x improvement on primitive writes
- Payloads larger than the initial size are stored with half the memory allocation (bytes) and use half the memory bandwidth to copy
- Several JDK classes can benefit from adoption
The downsides of this approach are:
- The overhead and delay of publishing a new JDK class. CSR required, ETA 29?
- Larger bytecode in the write methods
- Callers must be modified to capture the gains
I welcome discussion on any point.