JDK-8223367

[vector] masked memory operations must correctly implement unset lanes

      The specification and implementation of vector memory operations are incomplete in their treatment of masked loads and stores.

      Masked memory operations must never throw `ArrayIndexOutOfBoundsException` on unset lanes.

      It is a legitimate use case to pass a negative offset `-A` into a masked array load or store, as long as the first A lanes, or all the lanes, of the mask are unset.

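      It is a legitimate use case to pass an excessively positive offset into a masked array load or store, one that makes the vector overrun the end of the array by B lanes (so that the last lane would land at index `a.length+B-1`), as long as the last B lanes, or all the lanes, of the mask are unset.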

      Such use cases arise when splitting the lanes of a single vector V across two arrays, where the first parts of V go into the end of the first array and the last parts of V go into the beginning of the second array.
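The required behavior can be illustrated with a scalar reference model. This is only a sketch of the semantics described above, not the intrinsic or the Vector API itself; the class `MaskedLaneModel` and its methods are hypothetical names introduced for illustration, and `int` lanes with a default value of 0 are assumed.

```java
import java.util.Arrays;

/** Scalar reference model of masked array loads and stores (illustrative only). */
final class MaskedLaneModel {
    /** Loads mask.length lanes from a[offset + i]; unset lanes are never indexed. */
    static int[] maskedLoad(int[] a, int offset, boolean[] mask) {
        int[] lanes = new int[mask.length];   // unset lanes keep the default value 0
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                lanes[i] = a[offset + i];     // bounds check applies to set lanes only
            }
        }
        return lanes;
    }

    /** Stores set lanes to a[offset + i]; unset lanes perform no memory access at all. */
    static void maskedStore(int[] lanes, int[] a, int offset, boolean[] mask) {
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                a[offset + i] = lanes[i];
            }
        }
    }

    /** Splits v: the first headLanes lanes go to the end of first, the rest to the start of second. */
    static void splitStore(int[] v, int[] first, int[] second, int headLanes) {
        boolean[] head = new boolean[v.length];
        boolean[] tail = new boolean[v.length];
        for (int i = 0; i < v.length; i++) {
            head[i] = i < headLanes;
            tail[i] = !head[i];
        }
        // The vector overruns `first` by v.length - headLanes lanes; those lanes are unset.
        maskedStore(v, first, first.length - headLanes, head);
        // Negative offset -headLanes; the first headLanes lanes are unset.
        maskedStore(v, second, -headLanes, tail);
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 3, 4, 5};
        int[] second = {6, 7, 8, 9, 10};
        splitStore(new int[] {10, 20, 30, 40}, first, second, 1);
        // prints [1, 2, 3, 4, 10] [20, 30, 40, 9, 10]
        System.out.println(Arrays.toString(first) + " " + Arrays.toString(second));
    }
}
```

Neither store throws, even though one offset is negative and the other runs past the end of its array, because only set lanes are ever turned into array indexes.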

      The same points apply to any other containers besides Java arrays, such as byte buffers and blocks of native memory (if applicable). The container's safety mechanisms that regulate out-of-bounds memory access must not be triggered by activity controlled by mask lanes which are unset.

      In addition, if a mask lane is unset for a memory operation and in fact corresponds to a legitimate array element (or buffer element or other container element), a load operation must not return the stored element value in that lane; the lane must instead be filled with a default value. Likewise, a store operation through an unset mask lane must never overwrite the stored element value, not even with the original element value.

      (Rewriting a stored element value with a previously stored value is not a valid emulation of a suppressed conditional write operation. Under the Java Memory Model, such a re-write produces a race condition, whereas no race condition is created if no write is performed. If a racing thread is also writing the same location, the erroneous simulation of the suppressed write operation can make it appear as if the other thread's write had been suppressed, depending on the order of access.)
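The lost-write hazard can be made concrete by replaying one unlucky interleaving deterministically in a single thread. This is a sketch under stated assumptions: the class `SuppressedWriteRace`, the array contents, and the value 99 are all hypothetical, and the three steps stand in for actions that would really be performed by two racing threads.

```java
import java.util.Arrays;

/** Deterministic replay of one interleaving (illustrative only, not a real intrinsic). */
final class SuppressedWriteRace {
    /** Vector thread starts a masked store, another thread writes a[1], vector thread finishes. */
    static int[] replay(boolean emulateWithRewrite) {
        int[] a = {1, 2, 3, 4};
        int[] v = {10, 20, 30, 40};
        boolean[] mask = {true, false, true, true};   // lane 1 is unset

        // Step 1 (vector thread): the faulty emulation snapshots the old contents to blend with.
        int[] old = emulateWithRewrite ? a.clone() : null;

        // Step 2 (other thread): an ordinary write to the element under the unset lane.
        a[1] = 99;

        // Step 3 (vector thread): complete the store.
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) {
                a[i] = v[i];
            } else if (emulateWithRewrite) {
                a[i] = old[i];   // re-writes the stale value and clobbers 99
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replay(true)));    // prints [10, 2, 30, 40]
        System.out.println(Arrays.toString(replay(false)));   // prints [10, 99, 30, 40]
    }
}
```

With the re-write emulation, the other thread's value 99 disappears even though the vector store was never supposed to touch that element; with a true suppressed write it survives.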

      Suggested fixes:

      * For hardware which natively supports masked reads or writes, the solution is just to use them correctly.

      * Memory reads (not gathers) can just make wild reads, under the assumption that the unset lanes in the vector are still inside a mapped page. This could fail when the array being read sits at the very end of a mapped memory segment and the unset lanes happen to step off the end of the page.

      * Memory gathers can make wild reads, but the unused indexes will need to be clipped to a safe value like zero.

      * Memory writes (scatters or not) cannot use a wild-store technique unless the stores are redirected to a safe "bit bucket" location private to the implementation. In order to redirect unused lanes, we need an intrinsic which can take two base addresses; the second base address points to a short bit-bucket array into which the unwanted lanes are stored. This can be a static array private to the implementation.

      The two-base intrinsic generalizes to an N-base intrinsic. The N bases should be of type ETYPE[][] (2D array). The bit-bucket base address can be added to the N bases given by the user, by extending the first dimension, or by using ad hoc logic in the intrinsic.
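The index-clipping and bit-bucket ideas can be sketched in scalar Java as follows. This is not the proposed intrinsic; the class `WildAccessSketch`, the bucket size of 16, and the method names are assumptions made for illustration. The gather assumes a non-empty array (so that index 0 is safe), and the store assumes at most 16 lanes.

```java
/** Scalar sketch of index clipping and bit-bucket redirection (names are hypothetical). */
final class WildAccessSketch {
    /** Private dump target for unset lanes; its contents are never read. */
    private static final int[] BIT_BUCKET = new int[16];

    /** Gather: every lane reads unconditionally, but unset lanes read index 0 and yield 0. */
    static int[] maskedGather(int[] a, int[] indexes, boolean[] mask) {
        int[] lanes = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            int safe = mask[i] ? indexes[i] : 0;   // clip the unused index to a safe value
            int wild = a[safe];                    // unconditional read (assumes a.length > 0)
            lanes[i] = mask[i] ? wild : 0;         // blend in the default for unset lanes
        }
        return lanes;
    }

    /** Store: every lane writes unconditionally; unset lanes are redirected to the bit bucket. */
    static void maskedStore(int[] lanes, int[] a, int offset, boolean[] mask) {
        int[][] bases = {a, BIT_BUCKET};           // two-base form: user array plus bit bucket
        for (int i = 0; i < lanes.length; i++) {
            int base = mask[i] ? 0 : 1;            // select the base per lane
            int index = mask[i] ? offset + i : i;  // bucket index stays inside BIT_BUCKET
            bases[base][index] = lanes[i];         // the user array is written by set lanes only
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        maskedStore(new int[] {10, 20, 30, 40}, a, -2, new boolean[] {false, false, true, true});
        System.out.println(java.util.Arrays.toString(a));   // prints [30, 40, 3, 4]
    }
}
```

Extending `bases` with more user arrays gives the N-base form: the bit bucket is simply one more row of the `int[][]`, selected whenever the lane's mask bit is unset.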


       

            Assignee: Unassigned
            Reporter: John Rose (jrose)