Bug
Resolution: Unresolved
P4
25
None
In Review
After we converted the source base to be fully UTF-8, we no longer need to use Unicode escape sequences (like `\u0123`) in string literals, and are free to replace them with the actual UTF-8 characters.
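The minimal sketch below shows why such a replacement is behavior-neutral. The Java compiler translates Unicode escapes into the characters they denote before tokenizing (JLS §3.3), so an escaped literal and a literal containing the real character produce the same string. The class and field names are hypothetical, and the sketch assumes the file is compiled as UTF-8 (for example with `javac -encoding UTF-8`).

```java
public class EscapeVsLiteral {
    // Translated to U+00E9 (LATIN SMALL LETTER E WITH ACUTE) before lexing.
    public static final String ESCAPED = "\u00e9";
    // The same character written directly; only correct if javac reads the file as UTF-8.
    public static final String LITERAL = "é";

    public static void main(String[] args) {
        System.out.println(ESCAPED.equals(LITERAL)); // prints "true"
        System.out.println(ESCAPED.length());        // prints "1"
    }
}
```

Because the translation happens before tokenization, the compiled class files are identical either way. Only the readability of the source changes, which is why the decision below is a judgement call rather than a correctness question.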
Whether that makes sense depends very much on the actual circumstances. In contrast to the sibling patch JDK-8356978 (Convert unicode sequences in Java source code to UTF-8), which deals with the `src` directory and where basically all sequences made sense to convert, the situation for the `test` directory is very different.
First of all, there are many more non-ASCII Unicode characters, because the tests need to exercise exactly these kinds of characters. Secondly, in many cases the Unicode characters are contrived and intended to provoke a specific behavior rather than to be readable text.
I did an automatic conversion of all Unicode sequences to UTF-8 in the test files, then went through the result and immediately reverted most of the changes. If something did not make sense at first glance, it was reverted without pardon. I then made several passes over the remaining files. In the end, I kept only those changes where I believe that real UTF-8 characters, rather than abstract Unicode sequences, improve the readability of the test. Since this is a judgement call, opinions may vary.
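An automatic first pass of this kind could look roughly like the sketch below. It is a hypothetical helper, not the tool used for this change. It honors the JLS §3.3 rule that a backslash begins a Unicode escape only when preceded by an even number of contiguous backslashes. It also applies an assumed conservative filter: it keeps ASCII, control, whitespace, format, unassigned, and surrogate characters as escapes, since converting those tends to change meaning or hide the character from the reader.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites eligible Unicode escapes in Java source text as real characters. */
public class EscapeConverter {
    // A backslash, one or more 'u' characters (the JLS allows repeats), then four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    public static String convert(String source) {
        Matcher m = ESCAPE.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            if (isEligible(source, m.start()) && shouldConvert(c)) {
                out.append(source, last, m.start()).append(c);
                last = m.end();
            }
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    // JLS 3.3: the backslash starts an escape only if an even number of backslashes precede it.
    private static boolean isEligible(String s, int backslashIndex) {
        int count = 0;
        for (int i = backslashIndex - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            count++;
        }
        return count % 2 == 0;
    }

    // Assumed heuristic: only convert visible, assigned, non-ASCII BMP characters.
    private static boolean shouldConvert(char c) {
        return c > 0x7F
                && Character.isDefined(c)
                && !Character.isISOControl(c)
                && !Character.isSurrogate(c)
                && !Character.isSpaceChar(c)
                && Character.getType(c) != Character.FORMAT;
    }
}
```

Even with such a filter, a mechanical pass cannot tell whether a character was chosen to be read or to provoke a specific behavior. Under that assumption, the manual review and reverting described above remains the essential step.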