- Bug
- Resolution: Unresolved
- P3
- None
- 22
- None
- generic
- generic
Hello. I am looking for ways to improve the performance of the client code in the JDK, which actively uses native methods for image, color, and similar manipulation.
Since "panama" is no longer in a preview state, I tried to reimplement one small part of ColorConvertOp via the "foreign" API.
The code I tried to replace is simple:
https://github.com/mrserb/panama-foreign/blob/foreign-memaccess%2Babi/src/java.desktop/share/native/liblcms/LCMS.c#L506
We get two arrays, srcData and dstData, plus a few parameters, and pass them to "cmsDoTransformLineStride", which is part of the third-party "little cms" library.
This is how I implemented it via the "foreign" API:
https://github.com/mrserb/panama-foreign/commit/293dba9c99329022c75b52981c55f1d8d1a3fd6c#diff-7448ea8346eb08c6ec1a518e3c3399d7d078a514c07956ee7e8e2b5be3d082e1R173
1. Initially the new implementation was 19% slower than JNI in single-threaded mode, but using "SegmentAllocator.slicingAllocator" brought a 60% improvement over JNI.
2. Unfortunately, the multi-threaded mode is unexpectedly slow: -76%, 1,305 vs 5,374 ns. The root cause is the use of "NIO_ACCESS.reserveMemory/NIO_ACCESS.unreserveMemory": the code spends 50% of its time in the internal Atomic.compareAndswap method. Another issue is "SKIP_ZERO_MEMORY": we spend a lot of time setting memory to zero right before we copy data into it from the heap. I guess we need some kind of atomic clone operation of one segment over another to avoid the intermediate zeroing.
Because of this, the new approach loses to GetByteArrayElements/GetPrimitiveArrayCritical.
3. Another issue is the "cold start", which is also slower. This is expected but can probably be improved: -85%, 9,108,733 vs 60,259,096 ns.
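To illustrate the pattern from item 1, here is a minimal sketch of reusing one preallocated native buffer through "SegmentAllocator.slicingAllocator", so that a call does not allocate (and zero) fresh native memory each time. It assumes the finalized java.lang.foreign API (JDK 22). The class name, the "convert" helper, and "fakeTransform" are hypothetical: "fakeTransform" only stands in for the native "cmsDoTransformLineStride" call and is not the code from the linked commit.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;

class SlicingCopySketch {

    // Hypothetical stand-in for the native transform: writes the bitwise
    // complement of each source byte into the destination segment.
    static void fakeTransform(MemorySegment src, MemorySegment dst, long count) {
        for (long i = 0; i < count; i++) {
            byte b = src.get(ValueLayout.JAVA_BYTE, i);
            dst.set(ValueLayout.JAVA_BYTE, i, (byte) ~b);
        }
    }

    // Copies srcData into a slice of the preallocated scratch segment, runs the
    // transform, and copies the result back into a new heap array.
    static byte[] convert(MemorySegment scratch, byte[] srcData) {
        // A new slicing allocator starts again at offset 0 of the scratch
        // segment, so the same native memory is reused on every call.
        SegmentAllocator slices = SegmentAllocator.slicingAllocator(scratch);
        MemorySegment src = slices.allocateFrom(ValueLayout.JAVA_BYTE, srcData);
        MemorySegment dst = slices.allocate(srcData.length);

        fakeTransform(src, dst, srcData.length);

        byte[] dstData = new byte[srcData.length];
        MemorySegment.copy(dst, ValueLayout.JAVA_BYTE, 0, dstData, 0, dstData.length);
        return dstData;
    }

    public static void main(String[] args) {
        // A confined arena is owned by the creating thread; the scratch buffer
        // is allocated once and must be large enough for src plus dst.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment scratch = arena.allocate(1024);
            byte[] out = convert(scratch, new byte[] {0, 1, 2, (byte) 0xFF});
            System.out.println(Arrays.toString(out));
        }
    }
}
```

In this sketch the scratch segment belongs to a single thread; a multi-threaded caller would need one scratch segment per thread, since slices handed out over a shared segment would overlap.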
A JMH test will be attached.
Reports:
Base vs. current implementation via the "foreign" API:
https://jmh.morethan.io/?gists=ad56ba2aa4230a7b89d11fb4c20abafe,94c9f08217d0b17c0059b923670ee075
Base vs. implementation via the "foreign" API without memory reservation and zeroing:
https://jmh.morethan.io/?gists=ad56ba2aa4230a7b89d11fb4c20abafe,4c49f9ce74d1287086756929937e320a