Name: jl125535 Date: 07/31/2002
FULL PRODUCT VERSION :
java version "1.3.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_01)
Java HotSpot(TM) Client VM (build 1.3.1_01, mixed mode)
Similar behavior on 1.4.1-beta.
FULL OPERATING SYSTEM VERSION :
Red Hat Linux release 7.1 (Seawolf)
- Linux xxxxx.xxxxxx.defence.gov.au 2.4.2-2 #1 Sun Apr 8
20:41:30 EDT 2001 i686 unknown
- glibc-2.2.2-10
ADDITIONAL OPERATING SYSTEMS :
also tested on Win98SE
EXTRA RELEVANT SYSTEM CONFIGURATION :
tested on two filesystem types - one as FAT, one as Ext2
A DESCRIPTION OF THE PROBLEM :
I am processing a directory with a large number of files and
need to react based on whether each directory entry is a
normal file or a subdirectory. Given a large directory -
say, tens of thousands of entries - the performance of
File.isDirectory() slowly degrades as files are processed.
There are (at least) a couple of old bugs regarding this issue. Both were closed as "not reproducible".
- 4145781 java.io.File.isDirectory slows down 30X with many files (win95)
- 4071318 java.io.File.isDirectory on many files gradually slows down (win32)
I have isolated the problem and include a minimal test case
with which to reproduce the performance drop.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a directory with a large number of files. My
original test directory had 20,000 entries. The files may
be of zero length.
2. Save the included source as FileTest.java and compile
FileTest.class.
3. Run java FileTest [directory] from the command line.
EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected results:
The command prompt is followed by a long sequence of
numbers. These represent the number of files that have been
processed.
Since the files to be processed are stored in an array,
which can be accessed in constant time, and determining
whether a directory entry is a subdirectory or not should
also take O(1) time, the numbers should appear at a contant
rate. (Indeed, if the isDirectory() line is commented out
of the test program, the numbers do appear at a constant rate.)
Actual results (RH7.1, FAT filesystem):
I set up a test case on an FAT partition: a single
directory with about 8900 files. Running the program on this directory in the
mounted partition gave the following output, which indicates that each
instalment of 500 files to test took about half a second (reported as 500
milliseconds in the output) longer than the previous instalment:
0 0
500 208
1000 563
1500 1003
2000 1255
2500 1665
3000 2086
3500 2579
4000 3175
4500 3700
5000 4197
5500 4693
6000 5226
6500 5697
7000 6153
7500 6697
8000 7306
8500 7873
Running the test again immediately afterwards, I got vastly different results -
but the results I expected to get from the beginning nonetheless. This is
probably due to the caching done while the first run was proceeding:
0 0
500 22
1000 20
1500 71
2000 13
2500 13
3000 12
3500 13
4000 13
4500 13
5000 13
5500 14
6000 13
6500 13
7000 13
7500 13
8000 52
8500 13
Going back to my original test case - a directory of 20,000 entries, this time
on an Ext2 partition - I wanted to make sure that the same thing was still
happening there as well. It was, but to a much lesser extent:
0 0
500 28
1000 36
1500 45
2000 35
2500 40
3000 45
3500 44
4000 52
4500 60
5000 71
5500 78
6000 85
6500 92
7000 89
7500 99
8000 131
8500 171
9000 253
9500 295
10000 340
10500 409
11000 452
11500 548
12000 550
12500 589
13000 626
13500 723
14000 698
14500 755
15000 747
15500 781
16000 851
16500 831
17000 894
17500 881
18000 942
18500 934
19000 926
19500 994
Immediately repeating this second run, as it did on the FAT partition, gave
nearly instantaneous results.
At your advice, I have downloaded the J2RE 1.4.1 beta and tried these tests
again. I got the same results every time, to within a couple of hundredths of a
second.
However, I got the best indication I got of what was really happening by
reversing the order of the loop in my code - that is, by iterating backwards
from the end of the array. The results (from the FAT partition again) were very
interesting:
19500 1041
19000 1024
18500 1062
18000 961
17500 978
17000 936
16500 921
16000 865
15500 928
15000 834
14500 829
14000 744
13500 746
13000 742
12500 657
12000 649
11500 566
11000 538
10500 468
10000 452
9500 433
9000 376
8500 338
8000 285
7500 190
7000 189
6500 166
6000 177
5500 185
5000 137
4500 105
4000 96
3500 93
3000 96
2500 95
2000 93
1500 13
1000 10
500 10
0 11
Obviously, there is something happening in the File.isDirectory() call somewhere
that is doing a long look-up. By "long", I mean that it takes an amount of time
that is proportional to the position of the file in the directory listing, which
implies that it is linear in the number of entries in the directory.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
no errors reported
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.io.File;
public class FileTest {
public static void main (String[] args) {
File dir = new File (args[0]);
File[] files = dir.listFiles();
long timer = System.currentTimeMillis();
for (int i=0; i < files.length; i++) {
files[i].isDirectory();
if (i % 500 == 0) {
// report time for the last 500 files
timer = System.currentTimeMillis() - timer;
System.out.println (i + " " + timer);
// reset the timer
timer = System.currentTimeMillis();
}
}
}
}
---------- END SOURCE ----------
(Review ID: 159351)
======================================================================