JDK-8280389

Class-File API (Preview)


Details

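    • Type: JEP
    • Status: Submitted
    • Priority: P3
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: core-libs
    • Labels: None
    • Author: Brian Goetz
    • JEP Type: Feature
    • Exposure: Open
    • Scope: JDK
    • Discussion: classfile dash api dash dev at openjdk dot org
    • Effort: M
    • Duration: M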

    Description

      Summary

      Provide a standard API for parsing, generating, and transforming Java class files. This is a preview API.

      Goals

      • Provide an accurate, complete, performant, and standard API for reading, writing, and transforming Java class files, which tracks the class-file specification.

      • Enable the replacement of existing uses of ASM within the JDK, eventually enabling the removal of the JDK's internal copy of the ASM library. We may similarly be able to remove the JDK's two custom internal class-file libraries.

      Non-Goals

      • It is not a goal to obsolete other class-file processing libraries in the ecosystem, or to be the world's fastest class-file API.

      • It is not a goal to extend runtime reflection to give access to the bytecode of method bodies of loaded classes.

      • It is not a goal to provide code analysis functionality; that can be layered atop this API via third-party libraries.

      Motivation

      Class-file generation, parsing, and instrumentation are ubiquitous in the Java ecosystem. Many tools and libraries process class files, and frameworks often perform on-the-fly bytecode instrumentation, transformation, and generation.

      The Java ecosystem has many different libraries for class-file parsing and generation, each with different design goals, strengths and weaknesses. In the last decade the JDK has made extensive use of the ASM library in its implementation, for tasks such as lambda proxy generation. However, there are a number of reasons why it makes sense for the JDK to include its own authoritative class-file library.

      • JVM evolution — The JVM and the class-file format are evolving much faster now than in the early years of the platform. While some evolutions are simple, for example adding new attributes such as NestMembers, others are more complex. Project Valhalla, for example, will bring new bytecodes and field descriptors. At some point it may become prohibitively expensive to evolve existing libraries to support these new features. A JDK class-file library can evolve with the class-file format, reducing the friction of implementing and testing new class-file features.

      • JDK consolidation — The JDK itself is a significant dealer in class files. For historical reasons it contains four distinct internal class-file libraries:

        • A custom library in the jdk.compiler module, used by the javac compiler and the javadoc tool;

        • Another custom library in the jdk.jdeps module, used by the javap, jdeps, jdeprscan, and jlink tools;

        • A fork of BCEL in the java.xml module, used in a fork of Xalan; and

        • A fork of ASM in the java.base module, used in the implementation of lambdas, method handles, modules, dynamic proxies, JFR, and the jar, jimage, jlink, and jshell tools.

        In the case of ASM, using it to implement fundamental elements of the platform delays the use of new class-file features. The ASM version for JDK N cannot be finalized until JDK N itself is finalized, so JDK tools such as jlink cannot process class-file features that are new in JDK N, and therefore javac cannot generate class-file features that are new in JDK N until JDK N+1. JDK developers need a class-file library that is kept up-to-date with the JVM.

      • Version skew between frameworks and the running JDK — Applications which use frameworks that process class files generally bundle a class-file library. But new class-file features can appear in any JDK release, and the rate of JDK releases accelerated substantially after JDK 9, so applications are more frequently encountering class files that are newer than the library that they bundle. This results in runtime errors or, worse, frameworks trying to parse class files from the future and engaging in leaps of faith that nothing too serious has changed. Application and framework developers need a class-file library that they can count on to be up-to-date with the running JDK.

      • Language improvements — An obvious idea is to "just" merge ASM into the JDK and take on responsibility for its ongoing maintenance, but this is not the right choice. ASM is an old code base with a lot of legacy baggage, it is difficult to evolve, and the design priorities that informed its architecture are likely not what we would choose today. Moreover, the Java language has improved substantially since ASM was created, so what might have been the best API idioms in 2002 may not be ideal two decades later.

      Description

      We have adopted the following design goals and principles for the API.

      • Class-file entities are represented by immutable objects — All class-file entities, such as methods, fields, attributes, instructions, annotations, etc., are represented by immutable objects. This facilitates reliable sharing when a class file is being transformed.

      • Tree-structured representation — A class file has a tree structure. A class has some metadata (name, supertype, etc.), and a variable number of fields, methods, and attributes. Fields and methods themselves have metadata and further contain attributes, including the Code attribute. The Code attribute further contains instructions, exception handlers, and so forth. The API for navigating and building class files should reflect this structure.

      • User-driven navigation — The path we take through the class-file tree is driven by user choices. If the user cares only about annotations on fields then we should only have to parse as far down as the annotation attributes inside the field_info structure; we should not have to look into any of the class attributes or the bodies of methods, or at other attributes of the field. Users should be able to deal with compound entities, such as methods, either as single units or broken into streams of their constituent parts, as desired.

      • Laziness — User-driven navigation enables significant efficiencies, such as not parsing any more of the class file than is required to satisfy the user's needs. If the user is not going to dive into the contents of a method then we need not parse any more of the method_info structure than is needed to figure out where the next class-file element starts. We can lazily inflate, and cache, the full representation when the user asks for it.

      • Unified streaming and materialized views — Like ASM, we want to support both a streaming and a materialized view of a class file. The streaming view is suitable for the majority of use cases, while the materialized view is more general since it allows random access. We can provide a materialized view far less expensively than ASM through laziness, as enabled by immutability. We can, further, align the streaming and materialized views so that they use a common vocabulary and can be used in coordination, as is convenient for each use case.

      • Emergent transformation — If the class-file reading and writing APIs are sufficiently aligned then transformation can be an emergent property that does not require its own special mode or significant new API surface. (ASM achieves this by using a common visitor structure for readers and writers.) If classes, methods, fields, and code bodies are readable and writable as streams of elements then a transformation can be viewed as a flat-map operation on this stream, defined by lambdas.

      • Detail hiding — Many parts of a class file (constant pool, bootstrap method table, stack maps, etc.) are derived from other parts of the class file. It makes no sense to ask the user to construct these directly; this is extra work for the user and increases the chance of error. The library will automatically generate entities that are tightly coupled to other entities based on the methods, fields, and instructions added to the class file.

      • Lean into the language — In 2002, the visitor approach used by ASM seemed clever, and was surely more pleasant to use than what came before. However, the Java programming language has improved tremendously since then — with the introduction of lambdas, records, sealed classes, and pattern matching — and the platform now has a standard API for describing class-file constants (java.lang.constant). We can use these to design an API that is more flexible and pleasant to use, less verbose, and less error-prone.
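      The immutability and laziness principles can be illustrated without the library itself. The following is a minimal sketch, not the Class-File API: ToyMethod and ToyInstruction are hypothetical types, and the one-byte-per-instruction encoding is an assumption made only to keep the example short. The method's name is available immediately, while its body stays as raw bytes until someone asks for the instructions; the decoded list is then cached and, being unmodifiable, can be shared safely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified element types; not part of the Class-File API.
record ToyInstruction(int opcode) {}

final class ToyMethod {
    // Counts body decodings so that the laziness can be observed.
    static final AtomicInteger DECODE_COUNT = new AtomicInteger();

    private final String name;
    private final byte[] rawBody;          // undecoded bytes, as read from the class file
    private List<ToyInstruction> decoded;  // inflated on first request, then reused

    ToyMethod(String name, byte[] rawBody) {
        this.name = name;
        this.rawBody = rawBody.clone();    // defensive copy: callers cannot change our state
    }

    // Cheap: reading the name never touches the body.
    String name() {
        return name;
    }

    // Inflate the body on first request and cache the result.
    synchronized List<ToyInstruction> instructions() {
        if (decoded == null) {
            DECODE_COUNT.incrementAndGet();
            List<ToyInstruction> insns = new ArrayList<>(rawBody.length);
            for (byte b : rawBody) {
                // Assumed toy encoding: every byte is one opcode without operands.
                insns.add(new ToyInstruction(b & 0xFF));
            }
            decoded = List.copyOf(insns);  // unmodifiable, so it can be shared freely
        }
        return decoded;
    }
}
```

      Because the cached list is never mutated, handing the same instance to several readers or transformations is safe, which is what makes caching the inflated form cheap and correct.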

      This is a preview API, disabled by default

      To try the examples below in JDK 21 you must enable preview features as follows:

      • Compile the program with javac --release 21 --enable-preview Main.java and run it with java --enable-preview Main; or,

      • When using the source code launcher, run the program with java --source 21 --enable-preview Main.java

      Elements, builders, and transforms

      We construct the API from three key abstractions.

      • An element is an immutable description of some part of a class file; it may be an instruction, attribute, field, method, or an entire class file. Some elements, such as methods, are compound elements; in addition to being elements they also contain elements of their own, and can be dealt with as a whole or else further decomposed.

      • Each kind of compound element has a corresponding builder which has specific building methods (e.g., ClassBuilder::withMethod) and is also a Consumer of the appropriate element type.

      • Finally, a transform represents a function that takes an element and a builder and mediates how, if at all, that element is transformed into other elements.
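      A minimal sketch of how elements and builders relate, using hypothetical toy types (ClassEl, FieldEl, MethodEl, BodyBuilder, ToyClassBuilder) rather than the real API; instructions are plain strings here purely to keep it short. The class builder is a Consumer of class elements and also offers a specific building method, withMethod, which hands a fresh body builder to a client lambda and then adds the finished compound element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy types, not the Class-File API; instructions are plain strings.
sealed interface ClassEl permits FieldEl, MethodEl {}

record FieldEl(String name) implements ClassEl {}

// A compound element: besides being an element, it contains elements of its own.
record MethodEl(String name, List<String> body) implements ClassEl {}

// Builder for a method body: a Consumer of body elements with a convenience method.
final class BodyBuilder implements Consumer<String> {
    private final List<String> insns = new ArrayList<>();

    @Override
    public void accept(String insn) {
        insns.add(insn);
    }

    BodyBuilder op(String insn) {
        accept(insn);
        return this;
    }

    List<String> result() {
        return List.copyOf(insns);
    }
}

// Builder for a class: a Consumer of class elements plus a specific building method.
final class ToyClassBuilder implements Consumer<ClassEl> {
    private final List<ClassEl> elements = new ArrayList<>();

    @Override
    public void accept(ClassEl e) {
        elements.add(e);
    }

    // The client supplies a lambda; the builder creates the body builder and runs it.
    ToyClassBuilder withMethod(String name, Consumer<BodyBuilder> handler) {
        BodyBuilder body = new BodyBuilder();
        handler.accept(body);
        accept(new MethodEl(name, body.result()));
        return this;
    }

    static List<ClassEl> build(Consumer<ToyClassBuilder> handler) {
        ToyClassBuilder builder = new ToyClassBuilder();
        handler.accept(builder);
        return List.copyOf(builder.elements);
    }
}
```

      The client never constructs a BodyBuilder itself; the library creates it and decides when, and how often, to run the lambda.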

      We introduce the API by showing how it can be used to parse class files, generate class files, and combine parsing and generation into transformation. A draft API specification is also available.

      Reading, with patterns

      ASM's streaming view of class files is visitor-based. Visitors are bulky and inflexible; the visitor pattern is often characterized as a library workaround for the lack of pattern matching in the language. Now that we have pattern matching we can express things more directly and concisely. For example, if we want to traverse a Code attribute and collect dependencies for a class dependency graph then we can simply iterate through the instructions and match on the ones we find interesting. A CodeModel describes a Code attribute; we can iterate over its CodeElements and handle those that include symbolic references to other types:

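      CodeModel code = ...
      HashSet<ClassDesc> deps = new HashSet<>();
      for (CodeElement e : code) {
          switch (e) {
              // owner() returns a constant-pool ClassEntry; asSymbol() converts it to a ClassDesc
              case FieldInstruction f -> deps.add(f.owner().asSymbol());
              case InvokeInstruction i -> deps.add(i.owner().asSymbol());
              // similar for instanceof, cast, etc.
              default -> { }  // a pattern switch must be exhaustive; ignore all other elements
          }
      }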
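      Because the snippet above needs a real CodeModel, here is a self-contained sketch of the same pattern over a hypothetical sealed hierarchy (ToyCodeElement and its records stand in for the library's instruction types). Since the hierarchy is sealed and every case is covered, the switch needs no default branch, and the compiler would reject it if a new permitted subtype were added without a matching case. The sketch assumes Java 21 for pattern matching in switch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for code elements; the library's own types are richer.
sealed interface ToyCodeElement permits FieldAccess, Invoke, TypeCheck, Plain {}

record FieldAccess(String owner, String name) implements ToyCodeElement {}
record Invoke(String owner, String name) implements ToyCodeElement {}
record TypeCheck(String type) implements ToyCodeElement {}   // e.g. instanceof or checkcast
record Plain(String mnemonic) implements ToyCodeElement {}   // no symbolic reference to a type

final class DependencyCollector {
    // Collect the names of classes referenced by a sequence of code elements.
    static Set<String> collect(List<ToyCodeElement> code) {
        Set<String> deps = new LinkedHashSet<>();
        for (ToyCodeElement e : code) {
            switch (e) {
                case FieldAccess f -> deps.add(f.owner());
                case Invoke i -> deps.add(i.owner());
                case TypeCheck t -> deps.add(t.type());
                case Plain p -> { }   // nothing to record
            }
        }
        return deps;
    }
}
```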

      Writing, with builders

      Consider the following snippet of code:

      void fooBar(boolean z, int x) {
          if (z)
              foo(x);
          else
              bar(x);
      }

      With ASM we could generate this method as follows:

      ClassWriter classWriter = ...
      MethodVisitor mv = classWriter.visitMethod(0, "fooBar", "(ZI)V", null, null);
      mv.visitCode();
      mv.visitVarInsn(ILOAD, 1);
      Label label1 = new Label();
      mv.visitJumpInsn(IFEQ, label1);
      mv.visitVarInsn(ALOAD, 0);
      mv.visitVarInsn(ILOAD, 2);
      mv.visitMethodInsn(INVOKEVIRTUAL, "Foo", "foo", "(I)V", false);
      Label label2 = new Label();
      mv.visitJumpInsn(GOTO, label2);
      mv.visitLabel(label1);
      mv.visitVarInsn(ALOAD, 0);
      mv.visitVarInsn(ILOAD, 2);
      mv.visitMethodInsn(INVOKEVIRTUAL, "Foo", "bar", "(I)V", false);
      mv.visitLabel(label2);
      mv.visitInsn(RETURN);
      mv.visitEnd();

      The MethodVisitor in ASM doubles as both a visitor and a builder. Clients can create a ClassWriter directly and then can ask the ClassWriter for a MethodVisitor. However, there is value in inverting this API idiom: Instead of the client creating a builder with a constructor or factory, it can provide a lambda which accepts a builder:

      ClassBuilder classBuilder = ...
      // CD_* constants come from java.lang.constant.ConstantDescs; IFEQ and GOTO are Opcode constants
      classBuilder.withMethod("fooBar", MethodTypeDesc.of(CD_void, CD_boolean, CD_int), flags,
                              methodBuilder -> methodBuilder.withCode(codeBuilder -> {
          // labels are obtained from the code builder
          Label label1 = codeBuilder.newLabel();
          Label label2 = codeBuilder.newLabel();
          codeBuilder.iload(1)                     // load z
              .branch(IFEQ, label1)                // z is false: jump to the else branch
              .aload(0)
              .iload(2)
              .invokevirtual(ClassDesc.of("Foo"), "foo", MethodTypeDesc.of(CD_void, CD_int))
              .branch(GOTO, label2)                // skip the else branch
              .labelBinding(label1)
              .aload(0)
              .iload(2)
              .invokevirtual(ClassDesc.of("Foo"), "bar", MethodTypeDesc.of(CD_void, CD_int))
              .labelBinding(label2)
              .return_();
      }));

      This is more specific and transparent — the builder has lots of convenience methods such as aload(n) — but not yet any more concise or higher-level. Yet there is already a powerful hidden benefit: By capturing the sequence of operations in a lambda we get the possibility of replay, which enables the library to do work that previously the client had to do. For example, branch offsets can be either short or long. If clients generate instructions imperatively then they have to compute the size of each branch's offset when generating the branch, which is complex and error prone. But if the client provides a lambda that takes a builder then the library can optimistically try to generate the method with short offsets and, if that fails, discard the generated state and re-invoke the lambda with different code generation parameters.
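      The replay idea can be sketched in isolation. In this hypothetical ToyCodeBuilder, a forward goto takes 3 bytes with a 16-bit offset and goto_w takes 5 bytes with a 32-bit offset, as in the JVM; the builder only discovers at the end whether every offset fits, and if one does not, generate discards the partial result and runs the same client lambda again in wide mode. Real code generators have more to handle (conditional branches, for example, have no wide form), so treat this as an illustration of the control flow only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical builder that illustrates replay; it does not emit real bytecode.
final class ToyCodeBuilder {
    // Signals that a 16-bit branch offset would overflow.
    static final class OffsetTooLarge extends RuntimeException {}

    private final boolean wide;                        // true: use goto_w with 32-bit offsets
    private final List<String> listing = new ArrayList<>();
    private final List<Integer> branchPositions = new ArrayList<>();
    private int size;                                  // bytes emitted so far

    private ToyCodeBuilder(boolean wide) {
        this.wide = wide;
    }

    // Forward branch to the end of the code; the target is known only at finish().
    ToyCodeBuilder gotoEnd() {
        branchPositions.add(size);
        listing.add(wide ? "goto_w" : "goto");
        size += wide ? 5 : 3;                          // opcode plus a 4- or 2-byte offset
        return this;
    }

    // Stand-in for any other instructions occupying the given number of bytes.
    ToyCodeBuilder filler(int bytes) {
        listing.add("filler " + bytes);
        size += bytes;
        return this;
    }

    private List<String> finish() {
        for (int position : branchPositions) {
            int offset = size - position;              // JVM offsets are relative to the branch opcode
            if (!wide && offset > Short.MAX_VALUE) {
                throw new OffsetTooLarge();
            }
        }
        return List.copyOf(listing);
    }

    // Try short offsets first; on overflow, discard everything and replay the lambda.
    static List<String> generate(Consumer<ToyCodeBuilder> code) {
        try {
            ToyCodeBuilder narrow = new ToyCodeBuilder(false);
            code.accept(narrow);
            return narrow.finish();
        } catch (OffsetTooLarge e) {
            ToyCodeBuilder wideBuilder = new ToyCodeBuilder(true);
            code.accept(wideBuilder);
            return wideBuilder.finish();
        }
    }
}
```

      The client lambda is written once and knows nothing about offset sizes; the retry is possible only because the library, not the client, decides when to invoke it.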

      Decoupling builders from visitation also lets us provide higher-level conveniences to manage block scoping and local-variable index calculation, and allows us to eliminate manual label management and branching:

      ClassBuilder classBuilder = ...
      classBuilder.withMethod("fooBar", MethodTypeDesc.of(CD_void, CD_boolean, CD_int), flags,
                              methodBuilder -> methodBuilder.withCode(codeBuilder -> {
          codeBuilder.iload(codeBuilder.parameterSlot(0))     // load z
                     .ifThenElse(
                         // then-block
                         b1 -> b1.aload(codeBuilder.receiverSlot())
                                 .iload(codeBuilder.parameterSlot(1))
                                 .invokevirtual(ClassDesc.of("Foo"), "foo",
                                                MethodTypeDesc.of(CD_void, CD_int)),
                         // else-block
                         b2 -> b2.aload(codeBuilder.receiverSlot())
                                 .iload(codeBuilder.parameterSlot(1))
                                 .invokevirtual(ClassDesc.of("Foo"), "bar",
                                                MethodTypeDesc.of(CD_void, CD_int)))
                     .return_();
      }));

      Because block scoping is managed by the library, we did not have to generate labels or branch instructions — the library inserted them for us. Similarly, the library can optionally manage block-scoped allocation of local variables, freeing clients of the bookkeeping for local-variable slots as well.
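      To show the bookkeeping that block-scoped allocation takes off the client's hands, here is a minimal sketch with a hypothetical ToySlotAllocator class (not a library type). It assumes the JVM's slot rules: in an instance method, slot 0 holds this and the parameters follow, while long and double values take two slots. Slots allocated inside a block are released when the block ends, so a sibling block can reuse them, and the allocator tracks the high-water mark that max_locals must cover.

```java
import java.util.function.Consumer;

// Hypothetical allocator for local-variable slots; not a library type.
final class ToySlotAllocator {
    private int next;        // first free slot
    private int maxLocals;   // high-water mark that max_locals must cover

    ToySlotAllocator(int firstFreeSlot) {
        this.next = firstFreeSlot;
        this.maxLocals = firstFreeSlot;
    }

    // long and double occupy two slots; every other type occupies one.
    int allocate(boolean twoSlots) {
        int slot = next;
        next += twoSlots ? 2 : 1;
        maxLocals = Math.max(maxLocals, next);
        return slot;
    }

    // Run body in a nested scope; its slots are released when the scope ends.
    void block(Consumer<ToySlotAllocator> body) {
        int saved = next;
        try {
            body.accept(this);
        } finally {
            next = saved;
        }
    }

    int maxLocals() {
        return maxLocals;
    }
}
```

      For fooBar(boolean z, int x), slots 0 to 2 hold this, z, and x, so the first free slot is 3; two sibling blocks that each need a temporary would then both receive slot 3.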

      Transformation

      The reading and writing APIs line up so that transformation is seamless. The reading example above traversed a sequence of CodeElements, letting the client match against the individual elements. The builder accepts CodeElements so that typical transformation idioms fall out naturally.

      Suppose we want to process a class file and keep everything unchanged except for removing methods whose names start with "debug". We would get a ClassModel, create a ClassBuilder, iterate the elements of the original ClassModel, and pass all of them through to the builder except for the methods we want to drop:

      ClassModel classModel = Classfile.parse(bytes);
      byte[] newBytes = Classfile.build(classModel.thisClass().asSymbol(),
              classBuilder -> {
                  for (ClassElement ce : classModel) {
                      if (!(ce instanceof MethodModel mm
                              && mm.methodName().stringValue().startsWith("debug"))) {
                          classBuilder.with(ce);
                      }
                  }
              });

      Transforming method bodies is slightly more complicated since we have to explode classes into their parts (fields, methods, and attributes), select the method elements, explode the method elements into their parts (including the Code attribute), and then explode the Code attribute into its elements (i.e., instructions). The following transformation replaces invocations of methods on class Foo with invocations of methods on class Bar:

      ClassModel classModel = Classfile.parse(bytes);
      byte[] newBytes = Classfile.build(classModel.thisClass().asSymbol(),
              classBuilder -> {
                  for (ClassElement ce : classModel) {
                      if (ce instanceof MethodModel mm) {
                          classBuilder.withMethod(mm.methodName(), mm.methodType(),
                                  mm.flags().flagsMask(), methodBuilder -> {
                                      for (MethodElement me : mm) {
                                          if (me instanceof CodeModel codeModel) {
                                              methodBuilder.withCode(codeBuilder -> {
                                                  for (CodeElement e : codeModel) {
                                                      switch (e) {
                                                          case InvokeInstruction i
                                                                  when i.owner().asInternalName().equals("Foo") ->
                                                              codeBuilder.invokeInstruction(i.opcode(),
                                                                                            ClassDesc.of("Bar"),
                                                                                            i.name().stringValue(),
                                                                                            i.typeSymbol(), i.isInterface());
                                                          default -> codeBuilder.with(e);
                                                      }
                                                  }
                                              });
                                          }
                                          else
                                              methodBuilder.with(me);
                                      }
                                  });
                      }
                      else
                          classBuilder.with(ce);
                  }
              });

      Navigating the class-file tree by exploding entities into elements and examining each element involves some boilerplate which is repeated at multiple levels. This idiom is common to all traversals, so it is something the library should help with. The common pattern of taking a class-file entity, obtaining a corresponding builder, examining each element of the entity and possibly replacing it with other elements can be expressed by transforms, which are applied by transformation methods.

      A transform accepts a builder and an element. It either replaces the element with other elements, drops the element, or passes the element through to the builder. Transforms are functional interfaces, so transformation logic can be captured in lambdas.

      A transformation method copies the relevant metadata (names, flags, etc.) from a composite element to a builder and then processes the composite's elements by applying a transform, handling the repetitive exploding and iteration.
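      The two ideas can be sketched with hypothetical toy types (ToyClass, Elem, ToyBuilder, ToyTransform, Transformations; none of these are library names). A transform is a functional interface that receives the builder and one element and may emit zero, one, or several elements for it; the transformation method copies the metadata, here just the class name, into a fresh builder and feeds every element through the transform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model; none of these names come from the Class-File API.
sealed interface Elem permits FieldElem, MethodElem {}

record FieldElem(String name) implements Elem {}
record MethodElem(String name) implements Elem {}

// A compound entity: metadata (the class name) plus a sequence of elements.
record ToyClass(String name, List<Elem> elements) {}

// A builder holds the copied metadata and consumes elements.
final class ToyBuilder implements Consumer<Elem> {
    private final String name;
    private final List<Elem> out = new ArrayList<>();

    ToyBuilder(String name) {
        this.name = name;
    }

    @Override
    public void accept(Elem e) {
        out.add(e);
    }

    ToyClass build() {
        return new ToyClass(name, List.copyOf(out));
    }
}

// A transform may drop an element (emit nothing), pass it through (emit it),
// or replace it (emit other elements).
@FunctionalInterface
interface ToyTransform {
    void accept(ToyBuilder builder, Elem element);
}

final class Transformations {
    // Copy metadata into a fresh builder, then run every element through the transform.
    static ToyClass transform(ToyClass model, ToyTransform transform) {
        ToyBuilder builder = new ToyBuilder(model.name());
        for (Elem e : model.elements()) {
            transform.accept(builder, e);
        }
        return builder.build();
    }
}
```

      Dropping, passing through, and replacing are then three shapes of the same lambda: a transform that drops methods whose names start with "debug" simply returns without calling the builder for them.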

      Using transformation methods, we can rewrite the earlier Foo-to-Bar example as:

      byte[] newBytes = classModel.transform((classBuilder, ce) -> {
          if (ce instanceof MethodModel mm) {
              classBuilder.transformMethod(mm, (methodBuilder, me) -> {
                  if (me instanceof CodeModel cm) {
                      methodBuilder.transformCode(cm, (codeBuilder, e) -> {
                          switch (e) {
                              case InvokeInstruction i
                                      when i.owner().asInternalName().equals("Foo") ->
                                  codeBuilder.invokeInstruction(i.opcode(), ClassDesc.of("Bar"),
                                                                i.name().stringValue(),
                                                                i.typeSymbol(), i.isInterface());
                              default -> codeBuilder.with(e);
                          }
                      });
                  }
                  else
                      methodBuilder.with(me);
              });
          }
          else
              classBuilder.with(ce);
      });

      Now the library is managing the iteration boilerplate, but the deep nesting of lambdas just to get access to the instructions is still somewhat intimidating. We can simplify this by factoring out the instruction-specific activity into a CodeTransform:

      CodeTransform codeTransform = (codeBuilder, e) -> {
          switch (e) {
              case InvokeInstruction i
                      when i.owner().asInternalName().equals("Foo") ->
                  codeBuilder.invokeInstruction(i.opcode(), ClassDesc.of("Bar"),
                                                i.name().stringValue(),
                                                i.typeSymbol(), i.isInterface());
              default -> codeBuilder.accept(e);
          }
      };

      We can then lift this transform on code elements into a transform on method elements. When the lifted transform sees a Code attribute, it transforms it with the code transform, passing all other method elements through unchanged:

      MethodTransform methodTransform = MethodTransform.transformingCode(codeTransform);

      We can do the same again to lift the resulting transform on method elements into a transform on class elements:

      ClassTransform classTransform = ClassTransform.transformingMethods(methodTransform);

      Now our example becomes just:

      byte[] newBytes = Classfile.parse(bytes).transform(classTransform);
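      Lifting can be sketched with hypothetical toy types (Insn, Body, Note, Meth, Fld, and three Xform interfaces) in place of the library's models. The transformingCode and transformingMethods methods below only mimic the roles of MethodTransform.transformingCode and ClassTransform.transformingMethods described above; they are not their implementations. Each lifted transform rebuilds the one kind of element it cares about by running the inner transform over that element's children, and passes every other element through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model, not the library's: instructions, method parts, class parts.
record Insn(String owner, String name) {}                      // an invocation instruction

sealed interface MethodPart permits Body, Note {}
record Body(List<Insn> insns) implements MethodPart {}         // plays the role of the Code attribute
record Note(String text) implements MethodPart {}              // any other method element

sealed interface ClassPart permits Meth, Fld {}
record Meth(String name, List<MethodPart> parts) implements ClassPart {}
record Fld(String name) implements ClassPart {}

// Each transform receives a sink for output elements and one input element.
@FunctionalInterface interface CodeXform { void accept(Consumer<Insn> out, Insn in); }
@FunctionalInterface interface MethodXform { void accept(Consumer<MethodPart> out, MethodPart in); }
@FunctionalInterface interface ClassXform { void accept(Consumer<ClassPart> out, ClassPart in); }

final class Lifting {
    // Lift a code transform: rebuild each Body through it; pass other parts through.
    static MethodXform transformingCode(CodeXform code) {
        return (out, part) -> {
            if (part instanceof Body body) {
                List<Insn> rebuilt = new ArrayList<>();
                for (Insn insn : body.insns()) {
                    code.accept(rebuilt::add, insn);
                }
                out.accept(new Body(List.copyOf(rebuilt)));
            } else {
                out.accept(part);
            }
        };
    }

    // Lift a method transform: rebuild each Meth through it; pass fields through.
    static ClassXform transformingMethods(MethodXform method) {
        return (out, part) -> {
            if (part instanceof Meth m) {
                List<MethodPart> rebuilt = new ArrayList<>();
                for (MethodPart p : m.parts()) {
                    method.accept(rebuilt::add, p);
                }
                out.accept(new Meth(m.name(), List.copyOf(rebuilt)));
            } else {
                out.accept(part);
            }
        };
    }

    // Apply a class transform to every element of a class.
    static List<ClassPart> apply(List<ClassPart> parts, ClassXform transform) {
        List<ClassPart> result = new ArrayList<>();
        for (ClassPart p : parts) {
            transform.accept(result::add, p);
        }
        return List.copyOf(result);
    }
}
```

      Composing the two lifts yields a class transform that touches only invocation instructions, which mirrors the shape of the one-line example above.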

      Testing

      As this library has a large surface area and must generate classes in conformance with the Java Virtual Machine Specification, significant quality and conformance testing will be required. Further, to the degree that we replace existing uses of ASM with the new library we will compare the results of using both libraries to detect regressions, and do extensive performance testing to detect and avoid performance regressions.

      People

        Brian Goetz
        Paul Sandoz