ADDITIONAL SYSTEM INFORMATION :
Windows Server 2016 providing network shares, Windows 7 running the actual daemon, OpenJDK 1.8.0_232 used to run the daemon.
A DESCRIPTION OF THE PROBLEM :
A daemon implemented in Java and running on Windows 7 copies files from one directory into another; both the source and the target directory are on a network share hosted by Windows Server 2016. Copying is done with Apache Commons IO, and occasionally it fails with the following stack trace and a German message that translates to "There are no more files":
> java.io.IOException: Es sind keine weiteren Dateien vorhanden
> at java.io.WinNTFileSystem.canonicalize0(Native Method)
> at java.io.WinNTFileSystem.canonicalize(Unknown Source)
> at java.io.File.getCanonicalPath(Unknown Source)
> at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:642)
> at org.apache.commons.io.FileUtils.copyFileToDirectory(FileUtils.java:587)
> at org.apache.commons.io.FileUtils.copyFileToDirectory(FileUtils.java:558)
> at de.am_soft.osgi.dokliste.eingaenge.impl.internal.Eingang.copyFilesToDbxmlFolders(Eingang.java:283)
Apache Commons IO has the following code at that line. The line in the stack trace is the "if" condition itself, not the "throw" inside it:
> if (srcFile.getCanonicalPath().equals(destFile.getCanonicalPath())) {
> throw new IOException("Source '" + srcFile + "' and destination '" + destFile + "' are the same");
> }
So the problem is not the copy itself but the generation of the canonical paths. Process Monitor[1], run on the client where the daemon runs, confirms this. The following is the last event before the daemon logs the above exception and tries to send error mails via Logback. The result of that event (NO MORE FILES) matches the error message in the stack trace:
> 10:12:06,6244515 integration.exe 6928 QueryDirectory \\HOST\SHARE$\DocBeam3\[...].zip NO MORE FILES Filter: 20191106-081920-[...].zip
Additionally, the earlier ProcMon lines show that the exception occurs for "destFile" only. Running the daemon on my local machine instead always produces the following event (NO SUCH FILE):
> 19:08:03,7485947 java.exe 6232 QueryDirectory C:\Users\[...].zip NO SUCH FILE Filter: 20191022-143101-[...].zip
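The difference between the two events can be probed without the daemon. The following is a minimal sketch, not the daemon's code: the class and method names are hypothetical, and it assumes only that `File.getCanonicalPath()` normally succeeds for a file that does not exist yet, which is the situation for "destFile" before the copy.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical probe: checks whether canonicalizing a not-yet-existing
// file in a given directory succeeds, as it does for "destFile" locally.
class CanonicalizeMissingFile {

    // Returns true if getCanonicalPath() succeeds for a non-existent file.
    static boolean canonicalizesMissingFile(File dir, String name) {
        File missing = new File(dir, name);
        if (missing.exists()) {
            throw new IllegalArgumentException("expected a non-existent file: " + missing);
        }
        try {
            missing.getCanonicalPath();
            return true;
        } catch (IOException e) {
            // The report describes this outcome on the affected share,
            // where the last error is ERROR_NO_MORE_FILES.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the directory to probe as the first argument; defaults to the temp dir.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        String name = "missing-" + System.nanoTime() + ".zip";
        System.out.println(dir + ": " + canonicalizesMissingFile(dir, name));
    }
}
```

Running it against the target directory on the share and against a local directory would show whether the failure depends on the directory alone.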
I debugged the native methods and came across "lastErrorReportable", which explicitly checks for a set of error codes. That set does not contain ERROR_NO_MORE_FILES from the first event, but it does contain ERROR_FILE_NOT_FOUND from the second one:
> if ((errval == ERROR_FILE_NOT_FOUND)
> || (errval == ERROR_DIRECTORY)
> || (errval == ERROR_PATH_NOT_FOUND)
> || (errval == ERROR_BAD_NETPATH)
> || (errval == ERROR_BAD_NET_NAME)
> || (errval == ERROR_ACCESS_DENIED)
> || (errval == ERROR_NETWORK_UNREACHABLE)
> || (errval == ERROR_NETWORK_ACCESS_DENIED)) {
> return 0;
> }
https://github.com/openjdk/jdk/blob/master/src/java.base/windows/native/libjava/canonicalize_md.c#L131
So it seems that whenever ERROR_NO_MORE_FILES occurs, canonicalizing the path is aborted with an error instead of the error being ignored as it is for the other codes:
> if (!lastErrorReportable()) {
> if (!(dst = wcp(dst, dend, L'\0', src, src + wcslen(src)))){
> goto err;
> }
> break;
> } else {
> goto err;
> }
https://github.com/openjdk/jdk/blob/master/src/java.base/windows/native/libjava/canonicalize_md.c#L246
The thrown exception matches what I get; the message given in the code is only a fallback, which is not used in my case:
> if (rv == NULL && !(*env)->ExceptionCheck(env)) {
> JNU_ThrowIOExceptionWithLastError(env, "Bad pathname");
> }
https://github.com/openjdk/jdk/blob/master/src/java.base/windows/native/libjava/WinNTFileSystem_md.c#L258
Interestingly, the daemon does not fail on every file copy, only occasionally. When it fails, the failure seems to be related to other directories and files already being present in the target directory. Those are completely unrelated to the daemon, and according to ProcMon they are not iterated, yet their mere existence seems to make a difference. If I delete all of those files and directories so that the target directory is empty, copying immediately succeeds again. This is notable because files and directories in the target directory of my local setup have no such influence: copying never fails there, and the event logged by ProcMon is never ERROR_NO_MORE_FILES. After emptying the directory on the setup where the problem happens, ProcMon logs ERROR_FILE_NOT_FOUND there again as well.
So it seems that, under circumstances that are currently unknown, Windows sets ERROR_NO_MORE_FILES as the last error in the calls to "FindFirstFileW" made by "wcanonicalize". Because that code is not on the list of tolerated errors in "lastErrorReportable", copying fails in those circumstances, even though the situation seems perfectly valid. I don't see any real error otherwise.
So how about adding ERROR_NO_MORE_FILES to "lastErrorReportable"?
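To make the proposal concrete, here is a small Java model of the check. It is only an illustration: the real function is native C in canonicalize_md.c, the class name and the Java translation are hypothetical, and the numeric values are the Win32 codes from winerror.h. The one behavioral change is that ERROR_NO_MORE_FILES joins the tolerated set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical Java model of the native "lastErrorReportable" check,
// with the proposed addition of ERROR_NO_MORE_FILES.
class LastErrorReportableModel {
    // Win32 system error codes (winerror.h).
    static final int ERROR_FILE_NOT_FOUND = 2;
    static final int ERROR_PATH_NOT_FOUND = 3;
    static final int ERROR_ACCESS_DENIED = 5;
    static final int ERROR_NO_MORE_FILES = 18;
    static final int ERROR_BAD_NETPATH = 53;
    static final int ERROR_NETWORK_ACCESS_DENIED = 65;
    static final int ERROR_BAD_NET_NAME = 67;
    static final int ERROR_DIRECTORY = 267;
    static final int ERROR_NETWORK_UNREACHABLE = 1231;

    // Codes after which canonicalization continues instead of failing.
    private static final Set<Integer> TOLERATED = new HashSet<>(Arrays.asList(
            ERROR_FILE_NOT_FOUND,
            ERROR_DIRECTORY,
            ERROR_PATH_NOT_FOUND,
            ERROR_BAD_NETPATH,
            ERROR_BAD_NET_NAME,
            ERROR_ACCESS_DENIED,
            ERROR_NETWORK_UNREACHABLE,
            ERROR_NETWORK_ACCESS_DENIED,
            ERROR_NO_MORE_FILES)); // proposed addition

    // Models the native return value: false (0) means "do not report".
    static boolean lastErrorReportable(int errval) {
        return !TOLERATED.contains(errval);
    }
}
```

With this change, the "goto err" branch quoted above would no longer be taken for ERROR_NO_MORE_FILES, and the remainder of the path would be copied unchanged, as it already is for ERROR_FILE_NOT_FOUND.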
[1]: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
I can't reliably get Windows to return ERROR_NO_MORE_FILES, so I have no reliable steps to reproduce.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Canonicalizing a path shouldn't fail in case of ERROR_NO_MORE_FILES; Java should be as tolerant as it is with ERROR_FILE_NOT_FOUND etc.
ACTUAL -
Canonicalizing a path fails, leading to IOException, files don't get copied.
CUSTOMER SUBMITTED WORKAROUND :
I don't know the root cause yet, but in my environment emptying the destination folder of the copy operation reliably stops Windows from returning ERROR_NO_MORE_FILES, for reasons I can't explain.
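An application-level alternative might look like the following. This is an untested sketch under assumptions, not something verified on the affected share: the class and method names are hypothetical, and it replaces the Commons IO call with `java.nio.file.Files.copy` so that the same-file check can fall back to a normalized absolute path when `getCanonicalPath()` throws. The fallback does not resolve symbolic links or differences in case, so it is weaker than a true canonical comparison.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: tolerates a failing getCanonicalPath() when
// deciding whether source and destination are the same file.
class CanonicalPathFallback {

    // Canonical path if available, otherwise the normalized absolute path.
    static String canonicalOrAbsolute(File file) {
        try {
            return file.getCanonicalPath();
        } catch (IOException e) {
            // Weaker fallback: no symlink or case resolution.
            return file.toPath().toAbsolutePath().normalize().toString();
        }
    }

    static boolean sameFile(File a, File b) {
        return canonicalOrAbsolute(a).equals(canonicalOrAbsolute(b));
    }

    // Copies src into destDir, overwriting an existing file of the same name.
    static File copyFileToDirectory(File src, File destDir) throws IOException {
        File dest = new File(destDir, src.getName());
        if (sameFile(src, dest)) {
            throw new IOException("Source '" + src + "' and destination '" + dest + "' are the same");
        }
        Files.copy(src.toPath(), dest.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return dest;
    }
}
```

Whether `Files.copy` itself runs into the same Windows error on the share is unknown; the sketch only removes the dependency on "canonicalize0".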
FREQUENCY : occasionally
DUPLICATES : JDK-8234363 ERROR_NO_MORE_FILES is not handled in WinNTFileSystem.canonicalize0 (Closed)