
JEP 456: Unnamed Variables and Patterns


Details

    • JEP
    • Status: Candidate
    • P2
    • Resolution: Unresolved
    • None
    • specification
    • None
    • Angelos Bimpoudis
    • Feature
    • Open
    • SE
    • amber dash dev at openjdk dot org
    • S
    • S
    • 456

    Description

      Summary

      Enhance the Java language with unnamed variables, which can be initialized but not used, and unnamed patterns, which match a record component without stating the component's name or type. Both are denoted by an underscore character, _.

      History

      This feature first previewed in JDK 21 via JEP 443, which was titled Unnamed Patterns and Variables. We here propose to finalize it without change.

      Goals

      • Capture developer intent that a given binding or parameter is unused, and enforce that property, so as to clarify programs and reduce opportunities for error.

      • Improve the maintainability of all code by identifying variables that must be declared (e.g., in catch clauses) but are not used.

      • Improve the readability of record patterns by eliding unnecessary nested patterns.

      Non-Goals

      • It is not a goal to allow unnamed fields or method parameters.

      • It is not a goal to alter the semantics of local variables in, e.g., definite assignment analysis.

      Motivation

      Developers will, for various reasons, sometimes declare a variable that they do not intend to use. That intent is known at the time the code is written, but if it is not captured then later maintainers of the code might accidentally use the variable, thereby violating the intent. If we can make it impossible to accidentally use such variables then code will be more informative, more readable, and less prone to error.

      Unused variables

      In traditional imperative code, most developers have encountered the situation of declaring a variable they did not intend to use, whether for reasons of code style or because the language requires a variable declaration in certain contexts. This is especially common in code whose side effect is more important than its result. For example, this code calculates total as the side effect of a loop, without using the loop variable order:

      static int count(Iterable<Order> orders) {
          int total = 0;
          for (Order order : orders) // order is unused
              total++;
          return total;
      }

      The prominence of order's declaration is unfortunate, given that order is not used. The declaration can be shortened to var order, but there is no way to avoid giving this variable a name. The name itself can be shortened to, e.g., o, but this syntactic trick does not communicate the intent that the variable will go unused. In addition, static analysis tools typically complain about unused variables, even when the developer intends non-use and may not have a way to silence the warnings.

      For another example where the side effect of an expression is more important than its result, the following code dequeues data but only needs two out of every three elements:

      Queue<Integer> q = ... // x1, y1, z1, x2, y2, z2, ...
      while (q.size() >= 3) {
          int x = q.remove();
          int y = q.remove();
          int z = q.remove(); // z is unused
          ... new Point(x, y) ...
      }

      The third call to remove() has the desired side effect — dequeuing an element — regardless of whether its result is assigned to a variable, so the declaration of z could be elided. However, for maintainability, the developer may wish to consistently denote the result of remove() by declaring a variable. The developer of this code currently has two options, both unpleasant: Do not declare the variable z, which leads to an asymmetry and possibly a static-analysis warning about ignoring a return value, or else declare a variable that is not used and possibly get a static-analysis warning about an unused variable.

      Unused variables occur frequently in two other kinds of statements that focus on side effects:

      • The try-with-resources statement is always used for its side effect, namely the automatic closing of resources. In some cases a resource represents a context in which the code of the try block executes; the code does not use the context directly, so the name of the resource variable is irrelevant. For example, assuming a ScopedContext resource that is AutoCloseable, the following code acquires and automatically releases a context:

        try (var acquiredContext = ScopedContext.acquire()) {
          ... acquiredContext not used ...
        }

        The name acquiredContext is merely clutter, so it would be nice to elide it.

      • Exceptions are the ultimate side effect, and handling one often gives rise to an unused variable. For example, most Java developers have written catch blocks of this form, where the exception parameter ex is unused:

        String s = ...;
        try { 
          int i = Integer.parseInt(s);
          ... i ...
        } catch (NumberFormatException ex) { 
          System.out.println("Bad number: " + s);
        }

      Even code without side effects must sometimes declare unused variables. For example:

      ...stream.collect(Collectors.toMap(String::toUpperCase,
                                         v -> "NODATA"));

      This code generates a map which maps each key to the same placeholder value. Since the lambda parameter v is not used, its name is irrelevant.

      In all these scenarios, where variables are unused and their names are irrelevant, it would be better if we could simply declare variables with no name. This would free maintainers from having to understand irrelevant names, and would avoid false positives on non-use from static analysis tools.

      The kinds of variables that can reasonably be declared with no name are those which have no visibility outside a method: local variables, exception parameters, and lambda parameters, as shown above. These kinds of variables can be renamed or made unnamed without external impact. In contrast, fields — even if they are private — communicate the state of an object across methods, and unnamed state is neither helpful nor maintainable.

      Unused pattern variables

      Type patterns in switch blocks match selector expressions by specifying a type name and a binding name. For example, consider the following Ball class and a switch that explores the type of the ball:

      sealed abstract class Ball permits RedBall, BlueBall, GreenBall { }
      final  class RedBall   extends Ball { }
      final  class BlueBall  extends Ball { }
      final  class GreenBall extends Ball { }
      
      Ball ball = ...
      switch (ball) {
          case RedBall   red   -> process(ball);
          case BlueBall  blue  -> process(ball);
          case GreenBall green -> stopProcessing();
      }

      Each case examines the type of the Ball but the pattern variables red, blue, and green are not used. Since the variables introduced by the type patterns are not used, this code would be clearer if we could elide their names.

      As developers increasingly use records and their companion mechanism, sealed classes (JEP 409), we expect that pattern matching over complex data structures will become commonplace. Frequently, the shape of a structure will be just as important as the data items within it. Consider a Box type which can hold any type of Ball, but might also hold the null value:

      record Box<T extends Ball>(T content) { }
      
      Box<? extends Ball> box = ...
      switch (box) {
          case Box(RedBall   red)     -> processBox(box);
          case Box(BlueBall  blue)    -> processBox(box);
          case Box(GreenBall green)   -> stopProcessing();
          case Box(var       itsNull) -> pickAnotherBox();
      }

      The now-nested type patterns still introduce pattern variables that are not used. Since this switch is more involved than the previous one, eliding the names of the unused variables in the nested type patterns would even further improve readability.

      Even if we could elide the names of the unused pattern variables in the previous examples, they still contain duplicate code on the right-hand side for the RedBall and BlueBall cases. We could try to refactor the switch blocks to group the first two patterns in one case label, producing

      case RedBall red, BlueBall blue -> process(ball);             // compile-time error

      and

      case Box(RedBall red), Box(BlueBall blue) -> processBox(box); // compile-time error

      It would be erroneous, however, to name the components: Neither of the names is usable on the right-hand side because either of the patterns on the left-hand side can match. Since the names are unusable, it would be better if we could elide them.

      Unused nested patterns

      Records (JEP 395) and record patterns (JEP 440) work together to streamline data processing. A record class aggregates the components of a data item into an instance, while code that receives an instance of a record class uses pattern matching to disaggregate the instance into its components. For example:

      record Point(int x, int y) { }
      enum Color { RED, GREEN, BLUE }
      record ColoredPoint(Point p, Color c) { }
      
      ... new ColoredPoint(new Point(3,4), Color.GREEN) ...
      
      if (r instanceof ColoredPoint(Point p, Color c)) {
          ... p.x() ... p.y() ...
      }

      In this code, one part of the program creates a ColoredPoint instance while another part uses pattern matching with instanceof to test whether a variable is a ColoredPoint and, if so, extract its two components.

      Record patterns such as ColoredPoint(Point p, Color c) are pleasingly descriptive, but it is common for programs to need only some of the components for further processing. For example, the code above needs only p in the if block, not c. It is laborious to write out all the components of a record class every time we do such pattern matching. Furthermore, it is not visually clear that the entire Color component is irrelevant; this makes the condition in the if block harder to read, too. This is especially evident when record patterns are nested to extract data within components, such as:

      if (r instanceof ColoredPoint(Point(int x, int y), Color c)) {
          ... x ... y ...
      }

      We can use var to reduce the visual cost of the unnecessary component Color c, e.g., ColoredPoint(Point(int x, int y), var c), but it would be better to reduce the cost even further by omitting unnecessary components altogether. This would both simplify the task of writing record patterns and improve readability, by removing clutter from the code.

      Description

      An unnamed variable is declared by using an underscore character, _ (U+005F), to stand in for the local variable in a local variable declaration statement, or an exception parameter in a catch clause, or a lambda parameter in a lambda expression.

      An unnamed pattern variable is declared by using an underscore character to stand in for the pattern variable in a type pattern.

      The unnamed pattern is denoted by an underscore character and is equivalent to the type pattern var _. It allows both the type and name of a record component to be elided in pattern matching.

      A single underscore character is the lightest reasonable syntax for signifying the absence of a name. It is commonly used in other languages, such as Scala and Python, for this purpose. A single underscore was a valid identifier in Java 1.0, but we later reclaimed it for unnamed variables and patterns. We started issuing compile-time warnings when underscore was used as an identifier in Java 8 (2014) and we turned those warnings into errors in Java 9 (2017, JEP 213).

      The ability to use underscore in identifiers of length two or more is unchanged, since underscore remains a Java letter and a Java letter-or-digit. For example, identifiers such as _age and MAX_AGE and __ (two underscores) continue to be legal.

      The ability to use underscore as a digit separator is also unchanged. For example, numeric literals such as 123_456_789 and 0b1010_0101 continue to be legal.
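
      As a quick check of the two unchanged uses of underscore described above, the following minimal sketch collects them in one class. The class and method names are illustrative only and do not come from the JEP; nothing here depends on JDK 22.

```java
// Illustrative sketch: underscores in longer identifiers and in numeric
// literals are unaffected by the reclaiming of the single underscore.
class UnderscoreStillLegal {
    static final int MAX_AGE = 120;          // underscore inside an identifier

    static int identifiers() {
        int _age = 42;                       // leading underscore, length two or more
        int __ = 7;                          // two underscores: still an identifier
        return _age + __ + MAX_AGE;          // 42 + 7 + 120
    }

    static int binaryLiteral() {
        return 0b1010_0101;                  // digit separator in a binary literal
    }

    static int decimalLiteral() {
        return 123_456_789;                  // digit separator in a decimal literal
    }

    public static void main(String[] args) {
        System.out.println(identifiers());     // prints 169
        System.out.println(binaryLiteral());   // prints 165
        System.out.println(decimalLiteral());  // prints 123456789
    }
}
```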

      Unnamed variables

      The following kinds of declarations can introduce either a named variable (denoted by an identifier) or an unnamed variable (denoted by an underscore):

      • A local variable declaration statement in a block (JLS §14.4.2),
      • The resource specification of a try-with-resources statement (JLS §14.20.3),
      • The header of a basic for loop (JLS §14.14.1),
      • The header of an enhanced for loop (JLS §14.14.2),
      • An exception parameter of a catch block (JLS §14.20), and
      • A formal parameter of a lambda expression (JLS §15.27.1).

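      Declaring an unnamed variable does not place a name in scope, so the variable cannot be written or read after it is initialized. Where the declaration syntax admits an initializer (a local variable declaration statement, a resource specification, or the header of a basic for loop), one must be provided for an unnamed variable; the variables in the remaining kinds of declarations are initialized implicitly.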

      An unnamed variable never shadows any other variable, since it has no name, so multiple unnamed variables can be declared in the same block.

      Here are the examples from above, modified to use unnamed variables.

      • An enhanced for loop with side effects:

        static int count(Iterable<Order> orders) {
          int total = 0;
          for (Order _ : orders) 
              total++;
          return total;
        }

        The initialization of a simple for loop can also declare unnamed local variables:

        for (int i = 0, _ = sideEffect(); i < 10; i++) { ... i ... }
      • A local variable declaration, where the result of the initializer expression is not needed:

        Queue<Integer> q = ... // x1, y1, z1, x2, y2, z2, ...
        while (q.size() >= 3) {
          var x = q.remove();
          var y = q.remove();
          var _ = q.remove();
          ... new Point(x, y) ...
        }

        If the program needed to process only the x1, x2, etc., coordinates then unnamed variables could be used in multiple declarations:

        while (q.size() >= 3) {
          var x = q.remove();
          var _ = q.remove();
          var _ = q.remove(); 
          ... new Point(x, 0) ...
        }
      • A catch block:

        String s = ...
        try { 
          int i = Integer.parseInt(s);
          ... i ...
        } catch (NumberFormatException _) { 
          System.out.println("Bad number: " + s);
        }

        Unnamed variables can be used in multiple catch blocks:

        try { ... } 
        catch (Exception _) { ... } 
        catch (Throwable _) { ... }
      • In try-with-resources:

        try (var _ = ScopedContext.acquire()) {
          ... no use of acquired resource ...
        }
      • A lambda whose parameter is irrelevant:

        ...stream.collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"))
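
      The fragments above elide their surroundings, so here is a self-contained sketch that gathers them into one compilable class, assuming JDK 22 or later. The class name, the helper names (toPoints, isInt, placeholders, openDuring) and the small Scope resource are hypothetical stand-ins chosen for illustration; Scope plays the role of the ScopedContext resource mentioned earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.stream.Collectors;

// Illustrative sketch (requires JDK 22+): one method per kind of unnamed variable.
class UnnamedVariablesDemo {
    record Point(int x, int y) { }

    // Enhanced for loop: only the number of iterations matters.
    static int count(Iterable<String> orders) {
        int total = 0;
        for (String _ : orders)
            total++;
        return total;
    }

    // Local variable declarations: every third element is dequeued and discarded.
    static List<Point> toPoints(Queue<Integer> q) {
        List<Point> points = new ArrayList<>();
        while (q.size() >= 3) {
            var x = q.remove();
            var y = q.remove();
            var _ = q.remove();
            points.add(new Point(x, y));
        }
        return points;
    }

    // Exception parameter: the exception object itself is never inspected.
    static boolean isInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException _) {
            return false;
        }
    }

    // Lambda parameter: every key maps to the same placeholder value.
    static Map<String, String> placeholders(List<String> keys) {
        return keys.stream()
                   .collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"));
    }

    // Hypothetical resource that counts how many scopes are currently open.
    static final class Scope implements AutoCloseable {
        static int open = 0;
        Scope() { open++; }
        @Override public void close() { open--; }
    }

    // try-with-resources: the resource is acquired and released but never named.
    static int openDuring() {
        try (var _ = new Scope()) {
            return Scope.open;
        }
    }
}
```

      Because none of the underscores places a name in scope, several of them can appear in the same method without conflict.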

      Unnamed pattern variables

      An unnamed pattern variable can appear in any type pattern (JLS §14.30.1), including var type patterns, regardless of whether the type pattern appears at the top level or is nested in a record pattern. For example, the Ball example can now be written:

      switch (ball) {
          case RedBall _   -> process(ball);
          case BlueBall _  -> process(ball);
          case GreenBall _ -> stopProcessing();
      }

      and the Box example:

      switch (box) {
          case Box(RedBall _)   -> processBox(box);
          case Box(BlueBall _)  -> processBox(box);
          case Box(GreenBall _) -> stopProcessing();
          case Box(var _)       -> pickAnotherBox();
      }

      By allowing us to elide names, unnamed pattern variables make run-time data exploration based on type patterns visually clearer, both in switch blocks and with the instanceof operator.

      Unnamed pattern variables are particularly helpful when a switch executes the same action for multiple cases. For example, the Ball example can be simplified to:

      switch (ball) {
          case RedBall _, BlueBall _ -> process(ball);
          case GreenBall _           -> stopProcessing();
      }

      The first two cases use top-level unnamed pattern variables because their right-hand sides do not use the bindings. Similarly, the Box and Ball example can be simplified to:

      switch (box) {
          case Box(RedBall _), Box(BlueBall _) -> processBox(box);
          case Box(GreenBall _)                -> stopProcessing();
          case Box(var _)                      -> pickAnotherBox();
      }

      A case label with multiple patterns can have a guard. A guard governs the case as a whole, rather than the individual patterns. For example, assuming that there is an int variable x, the first case of the previous example could be further constrained:

          case Box(RedBall _), Box(BlueBall _) when x == 42 -> processBox(box);

      Pairing a guard with each pattern is not allowed, so this is prohibited:

          case Box(RedBall _) when x == 0, Box(BlueBall _) when x == 42 -> processBox(box);
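
      To see the permitted form in a complete program, the following sketch wraps the Box and Ball switch in a method, assuming JDK 22 or later. The method name handle and the String results are illustrative substitutes for processBox, stopProcessing, and pickAnotherBox, which this document does not define.

```java
// Illustrative sketch (requires JDK 22+): multiple patterns in one case label,
// with a single guard that governs the whole label.
class BoxSwitchDemo {
    static sealed abstract class Ball permits RedBall, BlueBall, GreenBall { }
    static final class RedBall   extends Ball { }
    static final class BlueBall  extends Ball { }
    static final class GreenBall extends Ball { }

    record Box<T extends Ball>(T content) { }

    static String handle(Box<? extends Ball> box, int x) {
        return switch (box) {
            // The guard applies to both patterns of this label.
            case Box(RedBall _), Box(BlueBall _) when x == 42 -> "process (guarded)";
            // Same patterns without a guard: reached when x != 42.
            case Box(RedBall _), Box(BlueBall _)              -> "process";
            case Box(GreenBall _)                             -> "stop";
            // Matches any remaining Box, including one whose content is null.
            case Box(var _)                                   -> "pick another";
        };
    }
}
```

      The guarded label comes first so that the unguarded label with the same patterns can act as its fallback.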

      The unnamed pattern

      The unnamed pattern is an unconditional pattern which binds nothing. Like the type pattern var _, the unnamed pattern is usable in a nested context of a record pattern but not at the top level of an instanceof expression or a case label.

      Consequently, the earlier example can omit the type pattern for the Color component entirely:

      if (r instanceof ColoredPoint(Point(int x, int y), _)) { ... x ... y ... }

      Likewise, we can extract the Color component while eliding the record pattern for the Point component:

      if (r instanceof ColoredPoint(_, Color c)) { ... c ... }

      In deeply nested positions, using the unnamed pattern improves the readability of code that does complex data extraction. For example:

      if (r instanceof ColoredPoint(Point(int x, _), _)) { ... x ... }

      This code extracts the x coordinate of the nested Point while omitting both the y and Color components.

      Revisiting the Box and Ball example, we can further simplify its final case label by using an unnamed pattern instead of var _:

      switch (box) {
          case Box(RedBall _), Box(BlueBall _) -> processBox(box);
          case Box(GreenBall _)                -> stopProcessing();
          case Box(_)                          -> pickAnotherBox();
      }
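
      The instanceof forms shown in this section can be tried out with a small self-contained sketch, assuming JDK 22 or later. It reuses the Point, Color, and ColoredPoint declarations from the earlier example; the class name and the methods describe and colorOf are hypothetical names introduced for illustration.

```java
// Illustrative sketch (requires JDK 22+): the unnamed pattern in nested positions.
class UnnamedPatternDemo {
    record Point(int x, int y) { }
    enum Color { RED, GREEN, BLUE }
    record ColoredPoint(Point p, Color c) { }

    // Extracts only x; the y component and the Color component are skipped.
    static String describe(Object r) {
        if (r instanceof ColoredPoint(Point(int x, _), _)) {
            return "x=" + x;
        }
        return "not a colored point";
    }

    // Extracts only the Color component; the Point component is skipped.
    static String colorOf(Object r) {
        if (r instanceof ColoredPoint(_, Color c) && c != null) {
            return c.name();
        }
        return "none";
    }

    public static void main(String[] args) {
        Object r = new ColoredPoint(new Point(3, 4), Color.GREEN);
        System.out.println(describe(r));   // prints x=3
        System.out.println(colorOf(r));    // prints GREEN
    }
}
```

      Note that the nested record pattern Point(int x, _) does not match a null Point component, whereas the unnamed pattern matches any component value, null included.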

      Risks and Assumptions

      • We assume that little if any actively-maintained code uses underscore as a variable name. Java developers migrating from Java 7 to Java 22 without having seen the warnings issued in Java 8 or the errors issued in Java 9 could be surprised. They face the risk of dealing with compile-time errors when reading or writing variables named _ and when declaring any other kind of element (class, field, etc.) with the name _.

      • We expect developers of static analysis tools to understand the new role of underscore for unnamed variables and avoid flagging the non-use of such variables in modern code.

      Alternatives

      • It is possible to define an analogous concept of unnamed method parameters. However, this has some subtle interactions with the specification (e.g., what does it mean to override a method with unnamed parameters?) and tooling (e.g., how do you write JavaDoc for unnamed parameters?). This may be the subject of a future JEP.

      • JEP 302 (Lambda Leftovers) examined the issue of unused lambda parameters and identified the role of underscore to denote them, but also covered many other issues which were handled better in other ways. This JEP addresses the use of unused lambda parameters explored in JEP 302 but does not address the other issues explored there.

          People

            abimpoudis Angelos Bimpoudis
            Brian Goetz