
Integrity by Default


    • JEP
    • Resolution: Unresolved
    • P2
    • Ron Pressler, Alex Buckley, & Mark Reinhold
    • Informational
    • Open
    • SE

      Summary

      Developers expect that their code and data are protected against use that is unwanted or unwise. The Java Platform, however, contains unsafe APIs that can undermine this expectation, thereby damaging the correctness, maintainability, scalability, security, and performance of applications. Going forward, we will restrict the unsafe APIs so that, by default, libraries, frameworks, and tools cannot use them. Application authors will have the ability to override this default.

      What is integrity?

      The Oxford English Dictionary defines “integrity” as “the state of being whole and undivided; the condition of being sound in construction.”

      In the context of a computer program, integrity means that the constructs from which we build the program — and ultimately the program itself — are both whole and sound. Such constructs, whether they are low-level language facilities such as for loops or higher-level components such as classes or modules, have both specifications and implementations. Integrity thus requires two things of a computing construct:

      • Its specification must say everything that needs to be said in order to make effective use of the construct (wholeness), and

      • Its implementation must satisfy its specification (soundness).

      In more familiar terms, we say that a computing construct has integrity if, and only if, its specification is complete and its implementation is correct with respect to the specification.

      For example, the specification of Java arrays says that an array can only be accessed within the bounds set for it upon creation. This constraint is guaranteed by the JVM, which raises an exception if it is violated.

      The specification of Java arrays contains many other statements; e.g., that the length of an array never changes, that the first element of an array always has the index zero, and that accessing an array element after setting that element to some value returns exactly the same value (modulo concurrency). The JVM guarantees all of these statements — hence arrays are correct. Taken together, moreover, these statements capture all that we need to know in order to reason about any particular use of arrays — hence arrays are complete. We do not need to wonder, e.g., whether an array might silently increment all its elements at midnight on alternate Wednesdays, because its specification says nothing about midnight, or Wednesdays, and in fact its specification implies that this absurd situation cannot happen. Thus we can say that Java arrays have integrity.
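
      The guarantees described above can be observed directly. The following is a minimal sketch, with the class and method names invented for illustration: a value written to an element is read back unchanged, the length stays fixed, and any access outside the bounds is rejected with an exception rather than touching neighboring memory.

```java
class ArrayIntegrityDemo {
    /** Returns true if reading array[index] raises the exception promised by the specification. */
    static boolean accessIsRejected(int[] array, int index) {
        try {
            int ignored = array[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[3];
        a[0] = 7;
        System.out.println(a[0]);                    // the value just written
        System.out.println(a.length);                // fixed at creation
        System.out.println(accessIsRejected(a, 3));  // one past the last valid index
        System.out.println(accessIsRejected(a, -1)); // before the first valid index
    }
}
```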

      (Integrity has practical limits, of course; the JVM cannot prevent native code or external debuggers or cosmic rays from modifying array content. When we speak of integrity here, we mean integrity within the context of the Java Platform.)

      The Java Platform contains not just arrays but many other useful constructs, both in the language and in its built-in libraries. All of these constructs have both specifications and implementations, which taken together give the Platform itself a specification and an implementation. We intend, naturally, that the overall Platform have integrity: Its specification says all that needs to be said in order to reason effectively about its use (completeness), and its implementation behaves according to its specification (correctness). The integrity of the Platform enables us to reason about the correctness of our own code, starting from the specifications of the Platform's constructs.

      Benefits of integrity

      The Java Platform's integrity underpins many of its key benefits.

      • The Platform specifies that variables, fields, and arrays are initialized before use, thus a program's initial state is well defined.

      • The Platform specifies automatic storage management, thus a program never suffers from use-after-free errors.

      • The Java language and the Java Virtual Machine are specified so as to guarantee type safety, thus a program cannot perform invalid operations on data, such as treating a String as a Socket.

      • The Platform API (as of Java 20) does not allow threads to be stopped arbitrarily, thus a multi-threaded program never sees objects in an inconsistent state.

      Without integrity, we cannot rely upon any of these valuable properties.
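
      Two of the properties listed above are easy to see in a few lines. This is an illustrative sketch only; the class, field, and method names are invented here. Fields that are never assigned still have a well-defined initial value, and an attempt to treat a String as a Socket is rejected at run time instead of reinterpreting the object's memory.

```java
import java.net.Socket;

class PlatformGuaranteesDemo {
    static int counter;   // never assigned: initialized to zero before use
    static String name;   // never assigned: initialized to null before use

    /** Returns true if the JVM refuses to treat the given object as a Socket. */
    static boolean treatAsSocketIsRejected(Object o) {
        try {
            Socket s = (Socket) o;   // compiles, but is checked when it runs
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(counter);
        System.out.println(name);
        System.out.println(treatAsSocketIsRejected("not a socket"));
    }
}
```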

      Integrity via encapsulation

      The Java language provides built-in constructs which enable us to build our own constructs at higher and higher levels of abstraction, by hiding unnecessary detail. We compose statements into methods, methods and fields into classes, classes into packages, packages into modules, and finally modules into entire programs.

      Abstraction enables us to control program complexity: We can show that the implementation of a higher-level construct meets its specification by reasoning solely from the specifications of the lower-level constructs upon which it is built; there is no need to consider the implementation details of the lower-level constructs, nor the specifications of any other constructs. Likewise, users of the higher-level construct need only refer to the specifications of that construct, and of any other constructs that they use, when reasoning about their own code; there is no need to consider the implementation details of the higher-level construct, nor the specifications of any other constructs. Ultimately, we can, in principle, show that an entire program meets its specification via such reasoning.

      For all of this to work requires that our higher-level constructs themselves have integrity: They must be complete and correct. A key tool for achieving that is encapsulation.

      For example, suppose we want to build a counter abstraction that is always even, never odd. Imagine that the Java language had no encapsulation, so that all fields and methods could be accessed from anywhere, as if everything were public. We might declare an EvenCounter class, like so:

      /**
       * Specification:
       *   - value() initially returns zero
       *   - incrementByTwo() increments value() by two
       *   - decrementByTwo() decrements value() by two
       *   - value() is always even, never odd
       */
      /*public*/ final class EvenCounter {
          /*public*/ int x = 0;
          /*public*/ int value() { return x; }
          /*public*/ void incrementByTwo() { x += 2; }
          /*public*/ void decrementByTwo() { x -= 2; }
      }

      We can easily show that the EvenCounter class, in isolation, meets its specification, thus it is correct. Its specification, however, is not complete: It does not say everything that needs to be said in order to make effective use of the class. That is because code external to the class can set the x field to an odd number at any time, thereby causing the value() method to violate the class's specification. To show that a use of the class is correct we must analyze every line of the entire program to ensure that no code external to the class modifies this field. Rather than simple local reasoning about each such use, complex global reasoning is required. It is as if the specification of the EvenCounter class includes the additional requirement that

       *   - No code external to this class modifies the x field

      With the actual Java language, of course, there is no need for this complexity since the language provides encapsulation constructs — the private and public keywords — which allow us to protect data from intentional or unintentional modification.

      public final class EvenCounter {
          private int x = 0;
          public int value() { return x; }
          public void incrementByTwo() { x += 2; }
          public void decrementByTwo() { x -= 2; }
      }

      Here we use the private keyword to protect the x field from external access. The private keyword has integrity: Its specification says that a private field can be modified only by code in the same class, and the Java compiler and the JVM guarantee this specification throughout the program. Making the x field private thus obviates the need to analyze the entire program when reasoning about the correctness of any use of the EvenCounter class. In other words, local reasoning about each such use is sufficient. The class's original specification, above, is thus complete; we already know that the class is correct with respect to that specification, thus the class has integrity.

      Abstraction enables us to create higher-level computing constructs; encapsulation enables us to imbue those constructs, and ultimately entire programs, with integrity. This provides tremendous value.

      • Correctness — The correctness of a program can rest upon the integrity of the EvenCounter class, in particular the fact that the value in an instance is always even. An application could, e.g., use EvenCounter to track business activity where every purchase needs to match a sale, resulting in an even number of transactions. Using encapsulation to imbue the class with integrity ensures that correctness cannot be undermined by code external to the class.

      • Maintainability — Encapsulation protects code as it evolves. We assume that private fields and methods are implementation details, able to be safely changed without breaking clients. The private field of the EvenCounter class cannot be accessed by client code, thus we can change it at will so long as we preserve correctness. We could, e.g., rename the field, or change its type to Integer. Encapsulation gives classes the integrity required to enable independent internal evolution.

      • Scalability — Encapsulation is a cornerstone of programming in the large because it ensures the integrity that enables local reasoning about the behavior of computing constructs. Programs can be built from independently-developed components that interact only through their public APIs and behave according to their specifications. This allows not just individual programs but the entire Java ecosystem to scale as collections of independent interoperating components.

      • Security — Encapsulation is essential for any kind of robust security. Suppose, e.g., that a class in the JDK restricts a sensitive operation:

        if (isAuthorized())
            doSensitiveOperation();

        The restriction is robust only if we can guarantee that doSensitiveOperation() is only ever invoked after a successful isAuthorized() check. If we declare doSensitiveOperation() as private in its declaring class then we know that no code in any other class can directly invoke that method; in other words, the declaring class has integrity with respect to that method. Code reviewers thus need only ensure that all invocations of the method within the declaring class are preceded by an isAuthorized() check; they can ignore all the other code in the program.

      • Performance — In the Java runtime, numerous optimizations can benefit from the integrity ensured by encapsulation. The JVM can, e.g., perform constant folding optimizations when it determines that the value of a private field never changes. Going further, a tool such as jlink could remove unused private methods at link time to reduce image size and class loading time.

      Undermining integrity

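      Encapsulation is a key tool for establishing integrity. It underpins correctness, maintainability, scalability, security, and performance. There are, however, four APIs in the JDK which can circumvent it: deep reflection via the setAccessible method, the sun.misc.Unsafe class, the Java Native Interface (JNI), and the Instrumentation API used by agents.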

      We refer to these APIs as unsafe because they violate the integrity of the Java language's encapsulation constructs, thereby violating the integrity not only of the Platform itself but of every component and program built on top of it. The private field in an EvenCounter object could, e.g., be modified from outside the class via deep reflection, sun.misc.Unsafe, or native code, resulting in an odd value, violating the class's specification. The public methods of EvenCounter could be redefined by an agent to increment the private field by one instead of two, again resulting in an odd value.
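
      The deep-reflection case can be sketched as follows. This is an illustration under stated assumptions, not a recommended technique: DeepReflectionDemo and its methods are invented names, EvenCounter is declared without the public modifier only so that the sketch fits in one source file, and the code is assumed to run from the class path, where EvenCounter is not strongly encapsulated in a module.

```java
import java.lang.reflect.Field;

class DeepReflectionDemo {
    /** Ordinary reflection, without setAccessible, still honors the private keyword. */
    static boolean plainReflectionIsRejected(EvenCounter counter) {
        try {
            Field x = EvenCounter.class.getDeclaredField("x");
            x.setInt(counter, 1);
            return false;
        } catch (IllegalAccessException e) {
            return true;
        } catch (NoSuchFieldException e) {
            throw new AssertionError(e);
        }
    }

    /** Deep reflection suppresses the access check and adds one to the private field. */
    static void addOneViaDeepReflection(EvenCounter counter) {
        try {
            Field x = EvenCounter.class.getDeclaredField("x");
            x.setAccessible(true);
            x.setInt(counter, x.getInt(counter) + 1);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        EvenCounter counter = new EvenCounter();
        counter.incrementByTwo();
        System.out.println(plainReflectionIsRejected(counter)); // true
        addOneViaDeepReflection(counter);
        System.out.println(counter.value());                    // 3, an odd value
    }
}

// Declared as a separate top-level class so that the demo code is genuinely external to it.
final class EvenCounter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}
```

      Nothing in EvenCounter itself changed, yet its specification no longer holds for this instance, which is why local reasoning about the class is no longer sufficient.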

      The fact that the language's encapsulation constructs lack integrity destroys the ability to reason locally about a program's correctness. To show that a use of an encapsulated component is correct we must analyze every class on the class path, on the module path, or loaded dynamically, and either rule out the use of unsafe APIs or else ensure that their use does not violate the component's specification. This analysis is not practical, thus any code that relies on the evenness of EvenCounter objects for its own correctness may behave incorrectly, and any client of that code may behave incorrectly, and so on.

      Even if a library uses an unsafe API with good intentions, and does not explicitly violate any other component's specification, it could still enable specification violations in an application that uses it. A JSON serialization library could, e.g., deserialize an EvenCounter object by using deep reflection to set the value of the object's private field, bypassing EvenCounter's public API. This, in itself, does not violate the specification of the EvenCounter class. If the application does not, however, take care to explicitly validate that its JSON input does not contain an odd number, then reading such input will result in an odd EvenCounter. The serialization library does not explicitly violate EvenCounter's specification, but by circumventing EvenCounter's defense mechanism — its encapsulation — it makes it vulnerable to indirect specification violations.

      This problem is especially serious with security-sensitive components. A vulnerability in a library that uses an unsafe API jeopardizes the integrity of every component of the application, and could allow an adversary to manipulate input to the application in a way that undermines security.

      The unsafe APIs in the JDK also violate the integrity of language constructs other than those related to encapsulation. Constructs that access arrays and objects, in particular, are specified so as to ensure memory safety: An array cannot be accessed beyond its bounds, and an object cannot be accessed after its storage is reclaimed. We have relied on the memory safety of the Java Platform for decades, but it can be violated by the unsafe APIs, leading to undefined behavior and even JVM crashes.

      • JNI allows the execution of native code that can violate memory safety. Native code can also produce a byte buffer that wraps arbitrary memory locations, which means any Java code that accesses the buffer can cause undefined behavior.

      • The Foreign Function & Memory API (FFM, JEP 454) allows the execution of native code that can violate memory safety. The FFM API also allows Java code to create a memory segment that wraps arbitrary memory locations. Any Java code that accesses such a segment can cause undefined behavior.

      • The sun.misc.Unsafe class includes methods that can read and write arbitrary memory locations, both on and off the JVM's heap. Thus an array can be accessed beyond its bounds, and an object's storage can be accessed long after it is reclaimed by the garbage collector — a classic use-after-free error.

      The integrity of the Java Platform — and hence the correctness, maintainability, scalability, security, and performance of our programs — requires that we prevent encapsulation from being circumvented and memory safety from being violated. How can we square this with the presence of the unsafe APIs, which are designed to offer library, framework, and tool developers special superpowers for use in rare situations in which there is no other way to solve a problem? The answer is that we must adopt integrity by default.

      Integrity by default

      Integrity by default means that every construct of the Java Platform has integrity, unless overridden explicitly at the highest level of the program. That is, the developer of an application can choose to give up selected kinds of integrity within the scope of that application; the developer of a library, framework, or tool, however, cannot. An application developer can, e.g., choose to configure the Java runtime to allow a serialization library to use unsafe APIs, knowingly acquiescing to a loss of integrity because the library's functionality is indispensable. Without such explicit permission, however, that library cannot, on its own, violate any aspect of Platform or application integrity.

      We have gradually been moving the Java Platform toward integrity by default since JDK 9. We have done so by selectively degrading or gating the ability of the unsafe APIs to undermine integrity. This effort has three strands.

      • JDK code is strongly encapsulated in modules. By default, deep reflection cannot circumvent strong encapsulation.

      • Unsafe APIs that are standard in the Java Platform are restricted. By default, Java code cannot circumvent encapsulation by using the Instrumentation API to redefine methods, or violate encapsulation or memory safety by using JNI or FFM to call native code.

      • Unsafe APIs that are non-standard are removed when standard replacement APIs become available. The replacement APIs are designed so that, by default, they cannot undermine integrity.

      Strong encapsulation: The antidote to deep reflection

      JDK 9 introduced modules to the Java language. A module is a set of packages designed to work together and intended for re-use. If a package is exported then its public elements can be used outside the module; if a package is not exported then its public elements can be used only inside the module.

      Modules provide strong encapsulation, which means that reflection by code outside of a module cannot access the private elements of any class within the module. That is, the setAccessible method respects module boundaries. If the public EvenCounter class, e.g., is declared in an explicit module, then its private field x cannot be modified by deep reflection initiated by code outside the module.
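
      The effect of strong encapsulation on the JDK's own classes can be sketched as below. The sketch assumes JDK 17 or later, code running from the class path, and no --add-opens option for java.base; it also relies on String having a private field named value, which is an implementation detail of current JDKs rather than part of any specification. The class and method names are invented for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

class StrongEncapsulationDemo {
    /** Returns true if setAccessible refuses to open a private field of java.lang.String. */
    static boolean deepReflectionIsBlocked() {
        try {
            // "value" is an internal field name, assumed here only for the demonstration.
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true);   // respects the boundary of the java.base module
            return false;
        } catch (InaccessibleObjectException e) {
            return true;
        } catch (NoSuchFieldException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(deepReflectionIsBlocked());
    }
}
```

      Running the same program with --add-opens java.base/java.lang=ALL-UNNAMED would be the application developer's explicit decision to allow this access.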

      Restrictions on standard unsafe APIs

      Most of the unsafe APIs — setAccessible, JNI, FFM, and Instrumentation — continue to be supported in the Java Platform. While they are rarely used by application code directly, they are essential for a relatively small number of libraries, frameworks, and tools whose core functionality cannot be implemented any other way. Examples include:

      • Frameworks for unit testing and dependency injection (DI) that use deep reflection to access private fields and methods of application classes;

      • Serialization libraries that use deep reflection to access private fields of application classes;

      • Mocking libraries that use the Instrumentation API to redefine methods of application classes;

      • Native wrapper libraries that use JNI to call native methods or FFM to invoke downcall method handles; and

      • Application Performance Monitoring (APM) tools that use agents to inject logging and performance counters into application code.

      A component that uses an unsafe API violates the integrity of the Java Platform: It introduces the possibility that encapsulation will be circumvented or memory safety will be violated, thereby rendering the specification of the Platform incomplete. If the Platform has no integrity then components built on top of it have no integrity, and applications themselves have no integrity. The policy of integrity by default enshrines the idea that the developer of a library, framework, or tool cannot unilaterally decide to violate integrity by using an unsafe API. That power — and the corresponding responsibility — belongs solely to the application's developer (or perhaps deployer, on the advice of the developer). The application's developer answers to end users for the behavior of the application; developers of libraries, frameworks, and tools, by contrast, do not.

      We cannot treat the mere inclusion of an unsafe-using library or framework in an application as consent by the application's developer to violate integrity. The developer might not be aware that the component uses an unsafe API. The developer might not even be aware that the component is present, since the component could be an indirect dependency several layers removed from the application itself. The application developer must therefore explicitly configure the Java runtime to allow selected components to use unsafe APIs. If the runtime is not suitably configured then using an unsafe API causes an exception to be thrown. In other words, use of unsafe APIs is restricted by default.

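      Various command-line options configure the Java runtime to allow the use of unsafe APIs, e.g., --add-opens for deep reflection, --enable-native-access for JNI and FFM, and -javaagent for agents.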

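      Application developers can specify these options in multiple ways, e.g., directly on the java command line, in an argument file, or via the JDK_JAVA_OPTIONS environment variable.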

      No matter how they are specified, these options configure the Java runtime when it starts, enabling the JVM to determine how integrity will be undermined and which optimizations should be enabled or disabled. These options also make it easy for application developers to audit the use of unsafe APIs and understand the risks posed to correctness, maintainability, scalability, security, and performance. If none of these options is used then the application developer can be certain that neither the application nor its dependencies violate the integrity of the Platform, the application's dependencies, or the application itself.

      Adapting to restrictions on unsafe APIs

      Most of the unsafe APIs in the Java Platform were in use for years before they were deemed unsafe, so it is not practical to restrict them without notice: applications would fail unexpectedly. In addition, configuring the Java runtime to allow the use of unsafe APIs by libraries, frameworks, and tools is not part of the traditional developer experience. To alert application developers to the need to configure the Java runtime, we restrict the use of a pre-existing unsafe API in a gradual fashion:

      • In an initial JDK release, the API can be used as normal.
      • In a later JDK release, the API can be used, but doing so produces a warning. The warning identifies the library that used the API and frames the use as "illegal". The warning also explains how to configure the Java runtime to allow the use, e.g., with --add-opens. Only the first use of the API by a particular module causes a warning; further use by code in the same module does not cause further warnings.
      • Eventually, in another JDK release, the API cannot be used by default. Calling the API causes an exception to be thrown, unless the Java runtime has been configured to allow the use.

      This process typically takes a few years, during which time the JDK offers a temporary command-line option that lets the application developer "dial up" or "dial down" the process. For example, --add-opens had a temporary counterpart of --illegal-access. The temporary option has three settings:

      • allow (or permit) -- allow use of the API, with no warnings.
      • warn -- allow use of the API, with warnings.
      • deny -- disallow use of the API, and throw an exception.

      Typically, the temporary option defaults to allow in the initial JDK release, then warn in a later JDK release, and eventually deny. Application developers can "dial up" the option to deny at any time, simulating the long-term behavior planned for the Java runtime. In contrast, the ability to "dial down" the option is limited: when the default is warn, the option can be set to allow, but once the default is deny, the option can only be set to warn, not allow. After the default has been deny for some time, the temporary option is removed.

      Removing non-standard unsafe APIs

      The sun.misc.Unsafe class includes methods that perform a variety of low-level operations without any safety checks. Since JDK 9 we have been adding standard APIs that offer safer replacements for this functionality. The low-level manipulation of objects in the JVM's heap, e.g., can now be done more safely via the VarHandle API, and manipulation of data in off-heap memory can now be done more safely via FFM's MemorySegment API.
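
      As a rough illustration of what "more safely" means for the VarHandle API, the sketch below performs an atomic compare-and-set on an array element, an operation that libraries have historically obtained from sun.misc.Unsafe, and shows that an out-of-bounds index is rejected with an exception. The class and method names are invented for this example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class VarHandleDemo {
    // A standard handle for low-level access to the elements of any int[].
    private static final VarHandle INT_ELEMENTS =
            MethodHandles.arrayElementVarHandle(int[].class);

    /** Atomically replaces array[index] with updated if it currently holds expected. */
    static boolean casElement(int[] array, int index, int expected, int updated) {
        return INT_ELEMENTS.compareAndSet(array, index, expected, updated);
    }

    /** Returns true if a volatile read at the given index is rejected as out of bounds. */
    static boolean outOfBoundsIsRejected(int[] array, int index) {
        try {
            int ignored = (int) INT_ELEMENTS.getVolatile(array, index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[2];
        System.out.println(casElement(data, 0, 0, 42));     // succeeds: element was 0
        System.out.println(data[0]);
        System.out.println(outOfBoundsIsRejected(data, 2)); // bounds are still enforced
    }
}
```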

      We have already deprecated for removal and, later, removed some elements of sun.misc.Unsafe which now have standard API replacements. We will continue to do so in future releases. Ultimately, we will deprecate sun.misc.Unsafe itself for removal, and then remove it.

      Embracing integrity by default

      Libraries, frameworks, and tools can relieve application developers of some of the effort of configuring the Java runtime in many situations.

      • Developers of dependency injection frameworks can ask application developers to grant access to the application's private fields and methods directly in the code. One approach is to ask application developers to open the packages of their modules to the framework module by placing, e.g.,

        opens com.example.app to org.framework;

        in their module declarations. Deep reflection can access every element in an open package, even private elements. A framework can, if necessary, transfer its access rights to another component via Module::addOpens.

        A better approach is to ask application developers to create method-handle lookup objects and pass them to the framework; e.g.,

        Framework.grantAccess(MethodHandles.lookup());

        A lookup object grants access to the private elements accessible to the code that created it, so the framework can use the lookup object to perform deep reflection on application code without any application packages being open.

      • Serialization libraries have caused many security vulnerabilities by using deep reflection to access private fields of application classes. In general, it is a mistake for libraries to serialize and deserialize an object without the cooperation of the object's class. Objects such as strings, records, enums, and collections are easy to serialize and deserialize because their classes provide public accessors and constructors. For other objects, serialization libraries should specify protocols by which classes can expose their state during serialization and accept and validate new state during deserialization. For this to work, application developers may need to grant access to classes in non-exported packages by opening packages or passing lookup objects.

        Some classes already take responsibility for their own serialization and deserialization by implementing the java.io.Serializable interface. Serialization libraries can take advantage of that by invoking the writeObject and readObject methods of such classes via the sun.reflect.ReflectionFactory class, which is supported for this purpose.

        In the long term, we expect the Java Platform to offer better serialization.

      • Unit-testing frameworks and mocking libraries can integrate with build tools such as Maven and Gradle to configure the Java runtime automatically. Build tools could, e.g., start test runs with options necessary to circumvent encapsulation (--add-opens, --add-exports), patch modules (--patch-module), and install agents (-javaagent).
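
      The lookup-object approach from the first item above can be sketched as follows. Only the Framework.grantAccess name comes from the text; the rest of the Framework class, the AppComponent class, and its secret field are hypothetical stand-ins. The classes are separate top-level classes so that the framework has no access of its own to the application's private field.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

class LookupDemo {
    public static void main(String[] args) {
        AppComponent component = new AppComponent();
        System.out.println(Framework.canReadWithOwnLookup(component)); // no access yet
        AppComponent.register();                                       // application grants access
        System.out.println(Framework.readGranted(component, "secret"));
    }
}

/** Hypothetical framework that only uses the access rights it has been handed. */
final class Framework {
    private static MethodHandles.Lookup granted;

    static void grantAccess(MethodHandles.Lookup lookup) { granted = lookup; }

    /** The framework's own lookup cannot see private members of application classes. */
    static boolean canReadWithOwnLookup(Object target) {
        try {
            MethodHandles.lookup().findGetter(target.getClass(), "secret", String.class);
            return true;
        } catch (IllegalAccessException e) {
            return false;
        } catch (NoSuchFieldException e) {
            throw new AssertionError(e);
        }
    }

    /** Reads a private String field using the lookup object supplied by the application. */
    static String readGranted(Object target, String fieldName) {
        try {
            MethodHandle getter = granted.findGetter(target.getClass(), fieldName, String.class);
            return (String) getter.invoke(target);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}

/** Hypothetical application class that decides, in its own code, to grant access. */
final class AppComponent {
    private String secret = "s3cret";

    static void register() { Framework.grantAccess(MethodHandles.lookup()); }
}
```

      The grant is visible in the application's source code and is limited to what the granting class can itself access, so no package needs to be opened and no command-line option is required.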

      More elaborate frameworks and applications that wish to control the initialization and bootstrapping of the runtime and/or of components can programmatically grant code permission to use unsafe APIs, e.g., via the addOpens and enableNativeAccess methods of ModuleLayer.Controller.

      Integrity beyond the Java Platform

      Java code can use standard facilities of the Platform to reach outside the Java runtime and violate integrity. Java code can, e.g., alter the content of a class file in the file system before the class is loaded. However, a good principle in matters of integrity is that

      The integrity of components is best enforced by the infrastructure that provides them.

      The integrity of the file system and its content is the responsibility of the operating system, not the Java runtime. The OS or, if appropriate, an OS-level container, should always be configured so as to protect the integrity of the Java runtime's files and memory, and the integrity of the application’s files, regardless of the measures taken by the Java runtime to protect its own integrity and that of the application it is running.

      Why now?

      The Java ecosystem has managed just fine without strong encapsulation or restrictions on unsafe APIs for nearly three decades. Why are we now adopting integrity by default, which adds overhead for some library, framework, tool, and application developers?

      The answer is that, in recent years, both the JDK and the environment in which Java applications run have changed.

      • Correctness — Historically, the Java runtime has been able to ensure the integrity of many low-level constructs because they were implemented in native code, beyond the reach of the unsafe APIs. However, more and more of the Java runtime itself is being written or rewritten in Java. This means that more and more of the Platform's integrity depends upon the integrity of Java code, which can be violated by the unsafe APIs.

      • Maintainability — In order to add new features without drowning in maintenance, we need to be able to remove obsolete components from the JDK and refactor its implementation at will. Unfortunately, over time various libraries, frameworks, and tools came to depend on some of the JDK's internal APIs, which they assumed were stable. As a result, it was increasingly difficult for the ecosystem to migrate to newer releases. We could either accept a slowing pace of Platform evolution or inflict migration pain just once more, in JDK 17, by strongly encapsulating the JDK's internal APIs. (We modularized the JDK in JDK 9, but we only fully enabled strong encapsulation in JDK 17.)

      • Security — With the impending removal of the Security Manager (JEP 411), we need strong encapsulation to support the creation of robust security layers protected from interference by other code, as shown earlier. Without strong encapsulation, any vulnerable code in the application could compromise security.

      • Performance — There is growing demand to improve startup time, warmup time, and image size, which are important for deploying Java applications in modern cloud environments. Some techniques for achieving these goals require that classes do not change over time by, e.g., being redefined via the Instrumentation API. Other important optimizations, such as constant folding, require that constructs such as final fields have integrity, so that their values are actually final and cannot be modified.

      In short: The use of JDK-internal APIs caused serious migration issues, there was no practical mechanism that enabled robust security in the current landscape, and new requirements could not be met. Despite the value that the unsafe APIs offer to libraries, frameworks, and tools, the ongoing lack of integrity is untenable. Strong encapsulation and the restriction of the unsafe APIs — by default — are the solution.
