JDK-8301971

Make JDK source code UTF-8



    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P3
    • Fix Version: tbd
    • Affects Version: 21
    • Component: infrastructure


      Currently, the source code in the JDK is in an ill-defined encoding. There is no official declaration of the encoding used. It is "mostly ASCII", but the relatively few non-ASCII characters used are not well-defined. In many cases it is Latin-1, but I am pretty certain other encodings are used, e.g. for Asian translations.

      This is creating unnecessary problems when working with the JDK code base, for no reason other than historical baggage.

      As JEP 400 (https://openjdk.org/jeps/400) confirms, UTF-8 is the way to go. We should follow up on this by converting our code base to UTF-8.

      This basically involves the following steps:
      * Tell git that the text files are encoded in UTF-8
      * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already
      * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags).
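
      As a rough illustration of the second step, here is a minimal sketch of how a file's bytes could be classified and, where needed, converted. The class and method names (EncodingSurvey, classify, latin1ToUtf8) are hypothetical, and the conversion assumes the legacy encoding is Latin-1 (ISO-8859-1), which, as noted above, only holds for some files.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK build or any existing tool.
final class EncodingSurvey {
    enum Kind { ASCII, UTF8, OTHER }

    // Classify raw file content as pure ASCII, valid UTF-8, or something else.
    static Kind classify(byte[] content) {
        boolean ascii = true;
        for (byte b : content) {
            if (b < 0) {          // high bit set: not 7-bit ASCII
                ascii = false;
                break;
            }
        }
        if (ascii) {
            return Kind.ASCII;
        }
        try {
            // A strict decoder throws on malformed input instead of
            // silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return Kind.UTF8;
        } catch (CharacterCodingException e) {
            return Kind.OTHER;
        }
    }

    // Re-encode content to UTF-8, ASSUMING the input is Latin-1.
    // Files in other legacy encodings need a different source charset.
    static byte[] latin1ToUtf8(byte[] content) {
        return new String(content, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }
}
```

      Only files classified as OTHER would need conversion. Note that a byte sequence in a legacy encoding can occasionally happen to be valid UTF-8, so files reported as UTF8 may still deserve a manual look.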

      Possibly, we should also:
      * Update jcheck to verify that changes do not contain invalid UTF-8 encodings.
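
      The kind of check meant here could look roughly like the sketch below, which reports the byte offset of the first malformed UTF-8 sequence so that a tool can point at the offending location. This is a standalone illustration; Utf8Check and firstInvalidOffset are hypothetical names and do not reflect how jcheck is actually structured.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone check, not jcheck's real interface.
final class Utf8Check {
    // Returns the byte offset of the first invalid UTF-8 sequence,
    // or -1 if the whole content is well-formed UTF-8.
    static int firstInvalidOffset(byte[] content) {
        // A freshly created decoder reports errors by default.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(content);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            // endOfInput = true: a truncated trailing sequence is an error.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error, the input position marks the start of the bad bytes.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1;        // all input consumed without error
            }
            out.clear();          // overflow: discard decoded chars and continue
        }
    }
}
```

      For example, firstInvalidOffset applied to the bytes 0x61 0x62 0xFF would flag offset 2, since 0xFF can never appear in UTF-8.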


              Assignee: Unassigned
              Reporter: Magnus Ihse Bursie (ihse)