  1. JDK
  2. JDK-8230199

consolidate signature parsing code in HotSpot sources

Details

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P3
    • Fix Version: 15
    • Affects Version: 14
    • Component: hotspot
    • Resolved In Build: b10

    Description

      The HotSpot source base contains many duplicate blocks of code that parse field or method signatures. To make this code more maintainable, the duplicates should be removed.

      The SignatureStream class is adequate for all parsing of field and method signatures. Code that parses signatures should be redirected to use SignatureStream.

      Code that verifies signatures (during class loading) should also be redirected
      to a variant VerifyingSignatureStream that shares common code with SignatureStream.

      Draft change: http://cr.openjdk.java.net/~jrose/jvm/consolidate-8230199
      This draft excludes class file verification checks.
      Valhalla draft: http://cr.openjdk.java.net/~jrose/jvm/valhalla-8230199
      The class file parser's verification code is also refactored atop the first draft.
      Verification refactor: http://cr.openjdk.java.net/~jrose/jvm/verifier-8230199/

      The following locations in the HotSpot C++ code are sensitive to the presence of the `Lx;` syntax for field descriptors:

       - ciEnv::get_klass_by_name_impl
       - ciObjArrayKlass::construct_array_name
       - ClassFileParser::verify_unqualified_name
       - ClassFileParser::skip_over_field_signature
       - lookupType (C2V_VMENTRY)
       - MethodHandles::print_as_basic_type_signature_on
       - FieldType::is_valid_array_signature
       - FieldType::is_obj
       - SharedRuntime::find_callee_arguments
       - SignatureIterator::parse_type
       - SignatureIterator::check_signature_end
       - SignatureStream (various)
       - SignatureVerifier::is_valid_type

      These all make reference to concrete characters `'L'` and `';'`. This makes them difficult to maintain in various ways. Optimizing the scan for terminating semicolons, or extending the syntax to support value types or templates, is likely to require an adjustment to all of those locations.

      This task refactors processing of JVM descriptor syntax to be confined to one pair of source files, signature.hpp/cpp. All other queries go through the classes Signature and SignatureStream (and related utilities in the same file).

      This task should be carried out in the main source base, and forward-ported to the Valhalla repository, so that `'Q'` types and eventually templates can be added cleanly, without undue patch friction.

      Advantages of consolidating descriptor parsing code:

       - It's easier to determine which code is responsible for descriptor parsing (symbolic names instead of hard-coded character constants).
       - The code can be optimized and all clients will benefit. This helps JVM startup.
       - Shorter code blocks lead to fewer bugs.
       - Extensions (for Valhalla) are much cheaper and more reliable.

      Key refactorings:

       - All characters significant in the JVM descriptor syntax are given symbolic names.
       - The BasicType enumeration is documented more clearly and given a stronger API.
       - New class Signature has basic tests of all sorts on Symbol operands.
       - The concept of an "envelope" handles recognition of the "Lx;" form and extraction of the inner name "x" from it.
       - The SignatureStream class is used for all non-trivial parsing.
       - Uses of temporary symbols are tracked more carefully. (Probable leaks were found.)
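
The first and fourth refactorings above can be illustrated with a minimal sketch. It assumes C++17 and works on `std::string_view` rather than HotSpot's Symbol type; the namespace `sig`, the constant names, and both helper functions are hypothetical stand-ins chosen for illustration, not the names or signatures used in the patch.

```cpp
#include <string_view>

// Hypothetical symbolic names for descriptor characters (illustration only).
namespace sig {
constexpr char kClass    = 'L';  // opens a class-type envelope
constexpr char kEndClass = ';';  // closes a class-type envelope
constexpr char kArray    = '[';  // array dimension prefix

// True if `d` has the form "Lx;" with a non-empty x.
constexpr bool has_envelope(std::string_view d) {
  return d.size() >= 3 && d.front() == kClass && d.back() == kEndClass;
}

// Returns the "x" inside "Lx;"; returns `d` unchanged if there is no envelope.
constexpr std::string_view strip_envelope(std::string_view d) {
  return has_envelope(d) ? d.substr(1, d.size() - 2) : d;
}
}  // namespace sig
```

With helpers of this shape, a caller would write `sig::strip_envelope(name)` instead of comparing against `'L'` and `';'` directly, so a later syntax extension would only need to touch the helpers.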

      In the verification changes (atop the previous changes), the key refactorings are:

      - SignatureStream is factored into a RawSignatureStream which is a template.
      - String scanning control block is rebased onto `char*` pointers rather than `int` indexes.
      - The two template instances are SignatureStream and VerifyingSignatureStream.
      - Standard derived signature attributes are carried via new SignatureSummary class.
      - Ad hoc code to verify signatures and names in ClassFileParser is removed.
      - Character scanning loops cover both pre-validation and post-validation parsing (and go faster).
      - Error reporting marks the location of signature scan failure in the offending string.
      - ClassFileParser code now makes *one pass* over each signature (instead of several).
      - Signature Fingerprinter is now *eager* instead of *lazy* (part of the single pass).
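
The idea of one template with a verifying and a non-verifying instance, scanning with `char*` pointers and reporting the failure position, might look roughly like the sketch below. This is not the RawSignatureStream from the patch: `ScanResult`, `skip_type`, and `scan_method_descriptor` are hypothetical names, the scanner only counts parameters, and the non-verifying instance assumes its input has already been validated.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical summary of one pass over a method descriptor.
struct ScanResult {
  int param_count;
  std::ptrdiff_t error_offset;  // -1 when no error was detected
  bool ok() const { return error_offset < 0; }
};

// Skips one type at p. In Verify mode, returns false with p left at the
// offending character. With Verify == false the input must be well formed.
template <bool Verify>
bool skip_type(const char*& p, const char* end, bool allow_void) {
  while (p < end && *p == '[') { ++p; allow_void = false; }  // array dimensions
  if (Verify && p == end) return false;                       // truncated
  char c = *p;
  if (c == 'L') {
    const char* semi = static_cast<const char*>(
        std::memchr(p, ';', static_cast<std::size_t>(end - p)));
    // Reject an unterminated class type or an empty class name.
    if (Verify && (semi == nullptr || semi == p + 1)) return false;
    p = semi + 1;
    return true;
  }
  if (Verify) {
    bool prim = c != '\0' && std::strchr("BCDFIJSZ", c) != nullptr;
    if (!prim && !(allow_void && c == 'V')) return false;
  }
  ++p;
  return true;
}

// Scans "(params)return" in a single pass.
template <bool Verify>
ScanResult scan_method_descriptor(const char* s) {
  const char* p = s;
  const char* end = s + std::strlen(s);
  ScanResult r{0, -1};
  auto fail = [&] { r.error_offset = p - s; return r; };
  if (Verify && (p == end || *p != '(')) return fail();
  ++p;
  while (p < end && *p != ')') {
    if (!skip_type<Verify>(p, end, false)) return fail();
    ++r.param_count;
  }
  if (Verify && p == end) return fail();  // missing ')'
  ++p;
  if (!skip_type<Verify>(p, end, true)) return fail();
  if (Verify && p != end) return fail();  // trailing characters
  return r;
}
```

Both instances share the same loop; the `Verify` parameter only decides whether the checks are compiled in, which is one plausible reading of how a single scanning loop can cover both pre-validation and post-validation parsing.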

      Perhaps surprisingly, the JVM's Symbol type is present in >95% of all parsing cases in HotSpot, so the augmented and consolidated Signature and SignatureStream APIs are still based on Symbol inputs. This is a consequence of the care with which HotSpot's Symbol API has been maintained.

      Centralizing the signature parsing code makes it more profitable to optimize the heart of it, which maps characters like "B" to type codes like T_BYTE. An experimental ALU-only classifier is included in the patch, based on a heuristically derived perfect hash code. The "work is shown" in 130 lines of conditionally compiled logic which derives the perfect hash expression, in case the signature syntax must change.
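
For reference, the mapping being optimized can be written as a plain `switch`, as in the sketch below. This is the straightforward form, not the ALU-only perfect-hash classifier from the patch, which is not reproduced here. The `TypeCode` enumeration is a stand-in that borrows the `T_BYTE`-style names mentioned above; its numeric values are arbitrary and are not claimed to match HotSpot's BasicType.

```cpp
namespace sketch {

// Stand-in for a type-code enumeration; the values here are arbitrary.
enum TypeCode {
  T_BOOLEAN, T_CHAR, T_FLOAT, T_DOUBLE, T_BYTE, T_SHORT,
  T_INT, T_LONG, T_OBJECT, T_ARRAY, T_VOID, T_ILLEGAL
};

// Maps the first character of a descriptor to a type code.
constexpr TypeCode type_code_for(char c) {
  switch (c) {
    case 'Z': return T_BOOLEAN;
    case 'C': return T_CHAR;
    case 'F': return T_FLOAT;
    case 'D': return T_DOUBLE;
    case 'B': return T_BYTE;
    case 'S': return T_SHORT;
    case 'I': return T_INT;
    case 'J': return T_LONG;
    case 'L': return T_OBJECT;
    case '[': return T_ARRAY;
    case 'V': return T_VOID;
    default:  return T_ILLEGAL;  // not a descriptor start character
  }
}

}  // namespace sketch
```

Once every client goes through a single function like this, replacing its body with a faster classifier changes no call sites.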

      Since 64-bit fingerprints are eagerly computed at class load time, it would be reasonable to consolidate that logic with the AdapterFingerPrint class, which is also heavily used during class loading, and contains almost the same information. A method's 64-bit fingerprint could hold either a 63-bit mask of immediate signature information or a pointer to a longer structure carrying the same information.
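
One way such a dual-purpose 64-bit word could be laid out is sketched below. The choice of the low bit as the tag, the class name `Fingerprint64`, and the `LongFingerprint` structure are all assumptions made for illustration; the description does not say which bit distinguishes the two cases, and the sketch assumes pointers fit in 64 bits and are at least 2-byte aligned.

```cpp
#include <cstdint>

// Hypothetical out-of-line form for signatures too long for 63 bits.
struct LongFingerprint {
  int length;
  const unsigned char* codes;
};

// A 64-bit word holding either a 63-bit immediate mask (low bit set)
// or an aligned pointer to a LongFingerprint (low bit clear).
class Fingerprint64 {
  std::uint64_t _bits;
  explicit Fingerprint64(std::uint64_t bits) : _bits(bits) {}

 public:
  static Fingerprint64 from_mask(std::uint64_t mask63) {
    return Fingerprint64((mask63 << 1) | 1u);  // bit 63 of the input is dropped
  }
  static Fingerprint64 from_pointer(const LongFingerprint* p) {
    return Fingerprint64(reinterpret_cast<std::uintptr_t>(p));
  }

  bool is_immediate() const { return (_bits & 1u) != 0; }
  std::uint64_t mask() const { return _bits >> 1; }  // valid if is_immediate()
  const LongFingerprint* pointer() const {           // valid if !is_immediate()
    return reinterpret_cast<const LongFingerprint*>(
        static_cast<std::uintptr_t>(_bits));
  }
};
```

Callers would test `is_immediate()` first and read only the accessor that matches.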

      The Java sources would probably benefit from a similar cleanup pass, based on String and/or CharSequence and/or `char[]/offset` pairs.

            People

              Assignee: lfoltan Lois Foltan
              Reporter: jrose John Rose