JDK-8304246

Compiler Implementation for Unnamed patterns and variables (Preview)


    • CSR
    • Resolution: Approved
    • P4
    • 21
    • tools
    • None
    • source
    • minimal
    • Java API, Language construct
    • SE

      Summary

      Enhance the Java language with unnamed patterns, which match a record component without stating the component's name or type, and with unnamed variables, which can be initialized but not used. Both are denoted with an underscore: _.

      Problem

      Java developers use record patterns to disaggregate a record instance into its components. In the following code, one part of a program creates a ColoredPoint instance, while another part of the program uses pattern matching with instanceof to test whether a variable is a ColoredPoint, and extract its two components if so:

      record Point(int x, int y) {}
      enum Color { RED, GREEN, BLUE }
      record ColoredPoint(Point p, Color c) {}
      
      ... new ColoredPoint(new Point(3,4), Color.GREEN) ...
      
      if (r instanceof ColoredPoint(Point p, Color c)) {
          ... p.x() ... p.y() ...
      }

      The code above needs only p in the if block, not c. However, today developers must spell out all the components of a record class every time they perform pattern matching. Furthermore, it is not visually clear that the Color component is irrelevant. This is especially evident when record patterns are nested to extract data within components, such as:

      if (r instanceof ColoredPoint(Point(int x, int y), Color c)) {
          ... x ... y ...
      }

      As a result, omitting unnecessary components such as Color c in both of the previous examples would make the code clearer.
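
      A minimal sketch of how the nested example could look once the Color component is replaced by the unnamed pattern. The class UnnamedPatternDemo and the method sumOfCoordinates are hypothetical names introduced only for illustration; the sketch assumes JDK 21 with --enable-preview, or a later release in which the feature is final.

```java
// Assumes JDK 21 with --enable-preview, or JDK 22 or later.
record Point(int x, int y) {}
enum Color { RED, GREEN, BLUE }
record ColoredPoint(Point p, Color c) {}

class UnnamedPatternDemo {
    // Hypothetical helper: returns x + y for a ColoredPoint, or -1 for anything else.
    static int sumOfCoordinates(Object r) {
        // The unnamed pattern _ matches the Color component
        // without stating its name or type.
        if (r instanceof ColoredPoint(Point(int x, int y), _)) {
            return x + y;
        }
        return -1;
    }

    public static void main(String[] args) {
        Object r = new ColoredPoint(new Point(3, 4), Color.GREEN);
        System.out.println(sumOfCoordinates(r)); // prints 7
    }
}
```

      Writing Color _ instead of _ would keep the component's type visible while still eliding its name.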

      On other occasions, developers may not need to bind any pattern variables during pattern matching, but they still need to explore the shape of the structure at run time. As a highly simplified example, consider the following Box and Ball classes, and a switch that explores the content of a Box:

      record Box<T extends Ball>(T content) {}
      
      sealed abstract class Ball permits RedBall, BlueBall, GreenBall {}
      final  class RedBall   extends Ball {}
      final  class BlueBall  extends Ball {}
      final  class GreenBall extends Ball {}
      
      Box<? extends Ball> b = ...
      switch (b) {
          case Box(RedBall   red)   -> processBox(b);
          case Box(BlueBall  blue)  -> processBox(b);
          case Box(GreenBall green) -> stopProcessing();
      }

      Since the variables are unused, it would be ideal if the developer could elide their names while keeping the explicit types for shape analysis.

      Furthermore, if the switch were hypothetically refactored to group the first two patterns in one case (something that is not allowed in Pattern Matching for Switch):

      case Box(RedBall red), Box(BlueBall blue) -> processBox(b);

      then it would be erroneous to name the components: neither of the names is usable on the right-hand side because either of the patterns on the left-hand side could have matched. Since the names are unusable, it would be ideal to elide them.
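
      The Box example could be sketched with unnamed pattern variables as follows. The class BoxSwitchDemo, the method handle, and the returned strings are hypothetical stand-ins for processBox and stopProcessing; the sketch assumes JDK 21 with --enable-preview, or a later release in which the feature is final.

```java
// Assumes JDK 21 with --enable-preview, or JDK 22 or later.
record Box<T extends Ball>(T content) {}

sealed abstract class Ball permits RedBall, BlueBall, GreenBall {}
final class RedBall extends Ball {}
final class BlueBall extends Ball {}
final class GreenBall extends Ball {}

class BoxSwitchDemo {
    // Hypothetical helper: reports which action the switch would take.
    static String handle(Box<? extends Ball> b) {
        return switch (b) {
            // Two patterns share one case label; neither introduces a binding,
            // so there is no ambiguity about which name is in scope.
            case Box(RedBall _), Box(BlueBall _) -> "process";
            // The explicit type is kept for shape analysis; only the name is elided.
            case Box(GreenBall _) -> "stop";
        };
    }
}
```

      Because Ball is sealed and all three permitted subclasses are covered, the switch stays exhaustive without a default clause.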

      Turning to traditional imperative code, most developers will have encountered the situation of having to declare a variable that they did not intend to use. This typically occurs when the side effect of a statement is more important than its result. For example, the following code uses an enhanced-for statement to step through a collection, calculating total as a side effect, without using the loop variable order:

      int total = 0;
      for (Order order : orders) {
          if (total < LIMIT) { 
              ... total++ ...
          }
      }

      The prominence of order's declaration is unfortunate given that order is not used. Here is another example where the side effect of an expression is more important than its result, leading to an unused variable. The following code dequeues data but only needs two out of every three elements:

      Queue<Integer> q = ... // x1, y1, z1, x2, y2, z2 .. 
      while (q.size() >= 3) {
          int x = q.remove();
          int y = q.remove();
          int z = q.remove(); // z is unused
          ... new Point(x, y) ...
      }

      The third call to remove() has the desired side effect, dequeuing an element, regardless of whether its result is assigned to a variable. The name z could therefore be elided, while the declaration itself still shows that remove() returns a value.
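
      Both imperative examples could be sketched with unnamed variables as shown here. The Order and Point records, the value of LIMIT, and the method names are assumptions made for illustration; the sketch assumes JDK 21 with --enable-preview, or a later release in which the feature is final.

```java
// Assumes JDK 21 with --enable-preview, or JDK 22 or later.
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class UnnamedLocalsDemo {
    record Order(String id) {}      // hypothetical order type
    record Point(int x, int y) {}
    static final int LIMIT = 2;     // hypothetical limit

    // Counts orders up to LIMIT; the loop variable itself is never read.
    static int countUpToLimit(List<Order> orders) {
        int total = 0;
        for (Order _ : orders) {
            if (total < LIMIT) {
                total++;
            }
        }
        return total;
    }

    // Builds a Point from the first two of every three queued elements.
    static List<Point> everyXY(Queue<Integer> q) {
        List<Point> points = new ArrayList<>();
        while (q.size() >= 3) {
            int x = q.remove();
            int y = q.remove();
            int _ = q.remove(); // dequeued for the side effect only
            points.add(new Point(x, y));
        }
        return points;
    }
}
```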

      Unused variables occur frequently in two other kinds of statement that focus on side effects:

      • The try-with-resources statement is always used for its side effect: the automatic closing of resources. For example, the following code acquires and (automatically) releases a context; the name acquiredContext is merely clutter:
      try (var acquiredContext = ScopedContext.acquire()) {
          ... acquiredContext not used ...
      }
      • Exceptions are the ultimate side effect, and handling one often gives rise to an unused variable. For example, most Java developers will have written catch blocks as shown below, where the name of the exception parameter is irrelevant:
      String s = ...;
      try { 
          int i = Integer.parseInt(s);
          ... i ...
      } catch (NumberFormatException ex) { 
          System.out.println("Bad number: " + s);
      }
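
      The two cases above could be sketched with unnamed variables as follows. The nested ScopedContext class is a hypothetical stand-in for the one in the text, with a counter added so that acquisition and release can be observed; the sketch assumes JDK 21 with --enable-preview, or a later release in which the feature is final.

```java
// Assumes JDK 21 with --enable-preview, or JDK 22 or later.
class UnnamedResourceAndCatchDemo {
    // Hypothetical stand-in for ScopedContext; counts contexts currently open.
    static final class ScopedContext implements AutoCloseable {
        static int open = 0;
        static ScopedContext acquire() { open++; return new ScopedContext(); }
        @Override public void close() { open--; }
    }

    // The resource is acquired and closed automatically, but never named.
    static int openDuringBlock() {
        try (var _ = ScopedContext.acquire()) {
            return ScopedContext.open; // evaluated before close() runs
        }
    }

    // The exception parameter is irrelevant to the handler, so it is unnamed.
    static String parse(String s) {
        try {
            int i = Integer.parseInt(s);
            return "Parsed: " + i;
        } catch (NumberFormatException _) {
            return "Bad number: " + s;
        }
    }
}
```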

      Even code without side effects is sometimes forced to declare unused variables. For example, the following code generates a map where each key is mapped to the same placeholder value; since the lambda parameter v is not used, its name is irrelevant:

      ...stream.collect(Collectors.toMap(String::toUpperCase, v -> "NODATA"));
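
      With an unnamed lambda parameter, the same collector could be sketched as follows. The class UnnamedLambdaDemo and the method placeholders are hypothetical; the sketch assumes JDK 21 with --enable-preview, or a later release in which the feature is final.

```java
// Assumes JDK 21 with --enable-preview, or JDK 22 or later.
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnnamedLambdaDemo {
    // Maps each upper-cased key to the same placeholder value.
    static Map<String, String> placeholders(Stream<String> stream) {
        // The value function ignores its argument, so the parameter is unnamed.
        return stream.collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"));
    }
}
```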

      In all these scenarios where variables are unused and their names are irrelevant, it would be ideal if developers could declare variables with no name.

      Solution

      The Java language is enhanced as follows:

      • Allow the underscore _ to denote an unnamed pattern in place of a whole type pattern or record pattern.
      • Allow the underscore _ to denote an unnamed pattern variable in a type pattern.
      • Allow the underscore _ to denote an unnamed variable when a local variable in a local variable declaration statement, an exception parameter in a catch clause, or a lambda parameter in a lambda expression is unused. The following kinds of declaration can introduce either a named variable (denoted by an identifier) or an unnamed variable (denoted by an underscore):

        • a local variable declaration statement in a block (JLS 14.4.2)
        • a resource specification of a try-with-resources statement (JLS 14.20.3)
        • the header of a basic for statement (JLS 14.14.1)
        • the header of an enhanced for statement (JLS 14.14.2)
        • an exception parameter of a catch block (JLS 14.20)
        • a formal parameter of a lambda expression (JLS 15.27.1)
      • Allow unnamed pattern variables in a switch that needs to execute the same action for multiple cases. The grammar of switch labels is enhanced to allow multiple patterns in one case label. Such a label is semantically correct only when unnamed pattern variables are used in all of its patterns and no binding variables are introduced.
      • Neither the unnamed pattern nor var _ may be used at the top level of a pattern: both ... instanceof _ and ... instanceof var _ are prohibited, as are case _ and case var _.
      • The lint check for try-with-resources currently warns when the resource variable is not referenced in the body. That warning must be muted when the resource is declared with _, because it no longer applies to unnamed variables.
      • Update javax.lang.model for unnamed variables. Tracked in a separate CSR: 8307577: Implementation for javax.lang.model for unnamed variables (Preview).
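
      Two of the rules above, the basic for header and the top-level restriction, could be sketched as follows. All class and method names are hypothetical, and the prohibited forms appear only as comments because they do not compile; the sketch assumes JDK 21 with --enable-preview, or a later release in which the feature is final.

```java
// Assumes JDK 21 with --enable-preview, or JDK 22 or later.
class UnnamedForHeaderDemo {
    static int calls = 0;

    // Hypothetical method whose result is irrelevant; only the call matters.
    static int sideEffect() { return ++calls; }

    // An unnamed variable declared in the header of a basic for statement.
    static int run() {
        int sum = 0;
        for (int i = 0, _ = sideEffect(); i < 3; i++) {
            sum += i;
        }
        return sum;
    }

    static boolean isString(Object o) {
        return switch (o) {
            // case _ -> true;      // prohibited: unnamed pattern at the top level
            // case var _ -> true;  // prohibited: var _ at the top level
            case String _ -> true;  // a type pattern with an unnamed variable
            default -> false;
        };
    }
}
```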

      Specification

      The updated JLS draft for unnamed patterns and variables is attached as jep443-20230322.zip. It is also available at https://cr.openjdk.org/~abimpoudis/unnamed/jep443-20230322/specs/unnamed-jls.html

      The proposed API enhancements are attached as specdiff.preliminary.00.zip. Those will mostly reflect the introduction of a new tree kind to support an AnyPatternTree. Changes in javax.lang.model are included in 8307577: Implementation for javax.lang.model for unnamed variables (Preview).

      The changes to the specification and API are subject to change until the CSR is finalized.

            abimpoudis Angelos Bimpoudis
            Vicente Arturo Romero Zaldivar