Enhancement
Resolution: Fixed
P4
Verified
JLS 3.2 describes how a raw Unicode character stream is translated into a sequence of tokens. It states: "The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would."
This has not been true since the introduction of generics. In fact, JSR 14 recognized the problem:
"Consecutive type parameter brackets < and > do not need to be separated by white-space. This leads to a problem in that the lexical analyzer will map the two consecutive closing angle brackets in a type such as Vector<Seq<String>> to the right-shift symbol >>. Similarly, three consecutive closing angle brackets would be recognized as a unary right-shift symbol >>>. To make up for this irregularity, we refine the grammar for types and type parameters as follows."
The grammar which followed in JSR 14 modified ReferenceType and ClassOrInterfaceType w.r.t. JLS2, and added TypeParameters - all of which appeared in JLS3. However, the JSR 14 grammar for handling consecutive type argument brackets in parameterized types, and consecutive type parameter brackets in generic types, did not appear in JLS3.
I believe this was the right decision, because the JSR 14 grammar was rather obscure. I reproduce it below for reference. Note that it relies on ReferenceTypeList and TypeParameterList which were defined elsewhere in JSR 14 - they appeared in JLS3, but in obscure locations (8.8.7.1 and 8.1.2) rather than chapter 4, which leads me to think that the grammar was carefully picked apart and the handling of >> / >>> was left on the cutting room floor.
Since compilers have been doing "the right thing" since 2004, the best option for JLS8 is to decree that compilers keep doing it. The contexts where a type is used are precisely defined by JSR 308, so 3.2 can rely on those contexts when decreeing what "the right thing" is:
*****
The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would. There is one exception: if lexical translation occurs in a type context (4.11) and the input stream has two or more consecutive > characters that are followed by a non-> character, then each > character must be translated to the token for the numerical comparison operator >.
N.B. Without this rule, two consecutive > brackets in a type such as List<List<String>> would be mapped to the token for the signed right shift operator >>, while three consecutive > brackets in a type such as List<List<List<String>>> would be mapped to the token for the unsigned right shift operator >>>. Worse, the mapping of four or more consecutive > brackets in a type such as List<List<List<List<String>>>> would be ambiguous, as various combinations of >, >>, and >>> tokens could represent the >>>> characters.
*****
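As an illustration of the proposed exception, here is a minimal sketch of a tokenizer for runs of > characters. The class AngleBracketLexer, the method tokenize, and the boolean inTypeContext flag are hypothetical names for this sketch only. A real compiler would learn the type context from its parser, and compound operators such as >= and >>= are deliberately ignored here.

```java
import java.util.ArrayList;
import java.util.List;

class AngleBracketLexer {
    /**
     * Splits input into identifier tokens and single-character tokens.
     * Outside a type context, a run of '>' follows the longest-match rule
     * (">>>" first, then ">>", then ">"). In a type context, every '>'
     * becomes its own token. The flag is an assumption of this sketch.
     */
    static List<String> tokenize(String input, boolean inTypeContext) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < input.length()
                        && Character.isJavaIdentifierPart(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (c == '>') {
                int run = 1;
                if (!inTypeContext) {
                    // Longest match: absorb up to three consecutive '>' characters.
                    while (run < 3 && i + run < input.length()
                            && input.charAt(i + run) == '>') {
                        run++;
                    }
                }
                tokens.add(input.substring(i, i + run));
                i += run;
            } else {
                // Any other character is emitted as a one-character token.
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }
}
```

With the flag off, List<List<String>> ends in a single >> token. With the flag on, it ends in two separate > tokens, which is what a parser of types needs.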
-----
Grammar from JSR 14 2.3 "Handling Consecutive Type Parameter Brackets"
ReferenceType:
  ClassOrInterfaceType
  ArrayType
  TypeVariable

ClassOrInterfaceType:
  Name
  Name < ReferenceTypeList1

ReferenceTypeList1:
  ReferenceType1
  ReferenceTypeList , ReferenceType1

ReferenceType1:
  ReferenceType >
  Name < ReferenceTypeList2

ReferenceTypeList2:
  ReferenceType2
  ReferenceTypeList , ReferenceType2

ReferenceType2:
  ReferenceType >>
  Name < ReferenceTypeList3

ReferenceTypeList3:
  ReferenceType3
  ReferenceTypeList , ReferenceType3

ReferenceType3:
  ReferenceType >>>

TypeParameters:
  < TypeParameterList1

TypeParameterList1:
  TypeParameter1
  TypeParameterList , TypeParameter1

TypeParameter1:
  TypeParameter >
  TypeVariable extends ReferenceType2
  TypeVariable implements ReferenceType2
-----
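The JSR 14 productions above leave the lexer alone and instead let a single >> or >>> token close two or three nested type argument lists. The same effect can be sketched with a small recognizer that keeps a count of closing brackets already supplied by such a token. This is not the JSR 14 grammar itself, only a hand-written counter-based approximation for a simplified syntax (names and type arguments only, with no arrays, bounds, or wildcards). NestedTypeRecognizer and all of its members are hypothetical names.

```java
import java.util.List;

class NestedTypeRecognizer {
    private final List<String> tokens;
    private int pos = 0;
    // Closing brackets supplied by a >> or >>> token but not yet used.
    private int spareCloses = 0;

    private NestedTypeRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns true if the longest-match token list forms one complete type. */
    static boolean accepts(List<String> tokens) {
        NestedTypeRecognizer r = new NestedTypeRecognizer(tokens);
        return r.type() && r.pos == tokens.size() && r.spareCloses == 0;
    }

    // Type := Name [ '<' Type { ',' Type } Close ]
    private boolean type() {
        if (pos >= tokens.size() || !isName(tokens.get(pos))) {
            return false;
        }
        pos++;
        if (pos < tokens.size() && tokens.get(pos).equals("<")) {
            pos++;
            if (!type()) {
                return false;
            }
            // A pending spare close means this argument list has already ended.
            while (spareCloses == 0 && pos < tokens.size()
                    && tokens.get(pos).equals(",")) {
                pos++;
                if (!type()) {
                    return false;
                }
            }
            return close();
        }
        return true;
    }

    // Close uses a spare bracket if one is pending, else consumes >, >> or >>>.
    private boolean close() {
        if (spareCloses > 0) {
            spareCloses--;
            return true;
        }
        if (pos >= tokens.size()) {
            return false;
        }
        String t = tokens.get(pos);
        if (t.equals(">") || t.equals(">>") || t.equals(">>>")) {
            pos++;
            spareCloses = t.length() - 1;
            return true;
        }
        return false;
    }

    private static boolean isName(String t) {
        return !t.isEmpty() && Character.isJavaIdentifierStart(t.charAt(0));
    }
}
```

For example, the token list for Vector<Seq<String>> (ending in one >> token) is accepted, while List<String>> is rejected because one closing bracket is left over.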