JDK-8021600 (project: JDK)

Clarify lexical translation of >> and >>> w.r.t. generics


    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • 8
    • None
    • specification
    • None
    • Verified

      JLS 3.2 describes how a raw Unicode character stream is translated into a sequence of tokens. It states: "The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would."
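      To make the consequence concrete, here is a minimal sketch of a scanner that applies this "longest possible translation" rule literally. As a simplifying assumption it only knows identifiers, single-character punctuation such as < and ,, and the operators that begin with >. The class name MaximalMunch is hypothetical; this is not how javac or any other compiler is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a maximal-munch scanner limited to identifiers,
// single-character punctuation, and the operators that begin with '>'.
public class MaximalMunch {
    // Longest spellings first, so the first match is the longest possible one.
    private static final String[] GT_OPS = {">>>=", ">>>", ">>=", ">>", ">=", ">"};

    public static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < s.length() && Character.isJavaIdentifierPart(s.charAt(i))) {
                    i++;
                }
                tokens.add(s.substring(start, i));
            } else if (c == '>') {
                for (String op : GT_OPS) {
                    if (s.startsWith(op, i)) { // always matches at least ">"
                        tokens.add(op);
                        i += op.length();
                        break;
                    }
                }
            } else {
                tokens.add(String.valueOf(c)); // '<', ',', etc. (no "<<" in this sketch)
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("List<List<String>>"));
        System.out.println(tokenize("List<List<List<String>>>"));
    }
}
```

      Running main prints [List, <, List, <, String, >>] and [List, <, List, <, List, <, String, >>>]: read literally, the rule turns the closing brackets of a nested type into shift-operator tokens.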

      This has not been true since the introduction of generics. In fact, JSR 14 recognized the problem:

      "Consecutive type parameter brackets < and > do not need to be separated by white-space. This leads to a problem in that the lexical analyzer will map the two consecutive closing angle brackets in a type such as Vector<Seq<String>> to the right-shift symbol >>. Similarly, three consecutive closing angle brackets would be recognized as a unary right-shift symbol >>>. To make up for this irregularity, we refine the grammar for types and type parameters as follows."

      The grammar that followed in JSR 14 modified ReferenceType and ClassOrInterfaceType relative to JLS2 and added TypeParameters; all of these appeared in JLS3. However, the JSR 14 grammar for handling consecutive type argument brackets in parameterized types, and consecutive type parameter brackets in generic types, did not appear in JLS3.

      I believe this was the right decision, because the JSR 14 grammar was rather obscure. I reproduce it below for reference. Note that it relies on ReferenceTypeList and TypeParameterList, which were defined elsewhere in JSR 14. They appeared in JLS3, but in obscure locations (8.8.7.1 and 8.1.2) rather than in chapter 4. This leads me to think that the grammar was carefully picked apart and that the handling of >> and >>> was left on the cutting-room floor.

      Since compilers have been doing "the right thing" since 2004, the best option for JLS8 is to decree that they keep doing it. The contexts in which a type is used are precisely defined by JSR 308, so 3.2 can rely on those contexts when decreeing what "the right thing" is:
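      As an illustration of what "the right thing" means in practice, the following small program uses >>> both to close three nested type-argument lists and as the unsigned right-shift operator, and >> as the signed right-shift operator. The class and method names are hypothetical, and List.of/Map.of assume JDK 9 or later.

```java
import java.util.List;
import java.util.Map;

public class ShiftsAndGenerics {
    // Here ">>>" closes the List, List, and Map type-argument lists.
    static Map<String, List<List<Integer>>> table() {
        return Map.of("k", List.of(List.of(-16)));
    }

    // Here ">>" and ">>>" are the signed and unsigned right-shift operators.
    static int[] shifts(int x) {
        return new int[] { x >> 2, x >>> 28 };
    }

    public static void main(String[] args) {
        int x = table().get("k").get(0).get(0); // -16
        int[] r = shifts(x);
        System.out.println(r[0] + " " + r[1]); // prints "-4 15"
    }
}
```

      The compiler must tell the two readings apart from context, which is exactly what the general longest-translation rule in 3.2 does not describe.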

      *****
      The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would. There is one exception: if lexical translation occurs in a type context (4.11) and the input stream has two or more consecutive > characters that are followed by a non-> character, then each > character must be translated to the token for the numerical comparison operator >.

      N.B. Without this rule, two consecutive > brackets in a type such as List<List<String>> would be mapped to the token for the signed right shift operator >>, while three consecutive > brackets in a type such as List<List<List<String>>> would be mapped to the token for the unsigned right shift operator >>>. Worse, the mapping of four or more consecutive > brackets in a type such as List<List<List<List<String>>>> would be ambiguous, as various combinations of >, >>, and >>> tokens could represent the >>>> characters.
      *****
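      The proposed exception can be sketched by giving a maximal-munch scanner a flag that says whether the input is in a type context. How a compiler knows it is in a type context (4.11) is outside this sketch: the hypothetical inTypeContext parameter simply stands in for that knowledge, and, as a further assumption, end of input is treated like a non-> character. The class name TypeContextLexer is also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed 3.2 exception. The inTypeContext flag
// stands in for the type contexts of 4.11; end of input is treated like a
// non-'>' character (an assumption of this sketch).
public class TypeContextLexer {
    private static final String[] GT_OPS = {">>>=", ">>>", ">>=", ">>", ">=", ">"};

    public static List<String> tokenize(String s, boolean inTypeContext) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < s.length() && Character.isJavaIdentifierPart(s.charAt(i))) {
                    i++;
                }
                tokens.add(s.substring(start, i));
            } else if (c == '>') {
                int run = 0; // length of the run of consecutive '>' characters
                while (i + run < s.length() && s.charAt(i + run) == '>') {
                    run++;
                }
                if (inTypeContext && run >= 2) {
                    // The exception: each '>' becomes its own '>' token.
                    for (int k = 0; k < run; k++) {
                        tokens.add(">");
                    }
                    i += run;
                } else {
                    // The general rule: longest possible translation.
                    for (String op : GT_OPS) {
                        if (s.startsWith(op, i)) {
                            tokens.add(op);
                            i += op.length();
                            break;
                        }
                    }
                }
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String type = "List<List<List<List<String>>>>";
        System.out.println(tokenize(type, false)); // ..., String, >>>, >]
        System.out.println(tokenize(type, true));  // ..., String, >, >, >, >]
        System.out.println(tokenize("a >>> b", false)); // [a, >>>, b]
    }
}
```

      Without the flag, the four closing brackets come out as >>> followed by >, one arbitrary choice among the possible combinations; with it, they come out as four > tokens that a parser can match one-for-one against the four < tokens. The last line shows why the exception must stay confined to type contexts: elsewhere, >>> must remain the unsigned right-shift operator.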

      -----
      Grammar from JSR 14 2.3 "Handling Consecutive Type Parameter Brackets"

      ReferenceType:
        ClassOrInterfaceType
        ArrayType
        TypeVariable

      ClassOrInterfaceType:
        Name
        Name < ReferenceTypeList1

      ReferenceTypeList1:
        ReferenceType1
        ReferenceTypeList , ReferenceType1

      ReferenceType1:
        ReferenceType >
        Name < ReferenceTypeList2

      ReferenceTypeList2:
        ReferenceType2
        ReferenceTypeList , ReferenceType2

      ReferenceType2:
        ReferenceType >>
        Name < ReferenceTypeList3

      ReferenceTypeList3:
        ReferenceType3
        ReferenceTypeList , ReferenceType3

      ReferenceType3:
        ReferenceType >>>

      TypeParameters:
        < TypeParameterList1

      TypeParameterList1:
        TypeParameter1
        TypeParameterList , TypeParameter1

      TypeParameter1:
        TypeParameter >
        TypeVariable extends ReferenceType2
        TypeVariable implements ReferenceType2
      -----
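      The ReferenceType1, ReferenceType2, and ReferenceType3 productions above encode, in the nonterminal itself, how many type-argument lists a single >, >>, or >>> token closes. A hand-written recursive-descent parser can get a similar effect without the extra nonterminals by keeping the maximal-munch tokens and splitting a >> or >>> token whenever it needs only one >. The following is a minimal sketch of that alternative technique, not the JSR 14 grammar itself and not necessarily how any particular compiler works; the class SplittingTypeParser is hypothetical, and it deliberately handles only class types with type arguments (no wildcards, arrays, or type parameter bounds).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parses Name [ '<' Type { ',' Type } '>' ] from
// maximal-munch tokens, splitting ">>" and ">>>" when only one '>' is needed.
public class SplittingTypeParser {
    private final List<String> tokens;
    private int pos = 0;

    private SplittingTypeParser(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens); // mutable copy: closers get split in place
    }

    /** Returns a bracket-free rendering such as "Map(String, List(Integer))". */
    public static String parse(List<String> tokens) {
        SplittingTypeParser p = new SplittingTypeParser(tokens);
        String result = p.parseType();
        if (p.pos != p.tokens.size()) {
            throw new IllegalStateException("unexpected trailing token: " + p.peek());
        }
        return result;
    }

    private String parseType() {
        String name = peek();
        if (!Character.isJavaIdentifierStart(name.charAt(0))) {
            throw new IllegalStateException("expected a type name but found " + name);
        }
        pos++;
        if (!peek().equals("<")) {
            return name; // no type arguments
        }
        pos++;
        List<String> args = new ArrayList<>();
        args.add(parseType());
        while (peek().equals(",")) {
            pos++;
            args.add(parseType());
        }
        consumeOneCloser();
        return name + "(" + String.join(", ", args) + ")";
    }

    // Consumes exactly one '>' character. A ">>" or ">>>" token is shortened
    // by one character and left in place for the enclosing argument lists.
    private void consumeOneCloser() {
        String t = peek();
        if (t.equals(">")) {
            pos++;
        } else if (t.equals(">>") || t.equals(">>>")) {
            tokens.set(pos, t.substring(1));
        } else {
            throw new IllegalStateException("expected '>' but found " + t);
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    public static void main(String[] args) {
        // Tokens as a maximal-munch scanner would produce them: note the final ">>>".
        List<String> tokens = List.of("Map", "<", "String", ",",
                "List", "<", "List", "<", "Integer", ">>>");
        System.out.println(parse(tokens)); // Map(String, List(List(Integer)))
    }
}
```

      A leftover > after the outermost list is closed, as in the tokens for List<String>>, is reported as a trailing token, so splitting never silently accepts unbalanced brackets.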

            Assignee: Alex Buckley
            Reporter: Alex Buckley
            Votes: 0
            Watchers: 1
