Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4724022

File.isDirectory() performance degrades with many files in one directory

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Not an Issue
    • Icon: P3 P3
    • None
    • 1.4.1
    • core-libs
    • x86
    • linux



      Name: jl125535 Date: 07/31/2002


      FULL PRODUCT VERSION :
      java version "1.3.1_01"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_01)
      Java HotSpot(TM) Client VM (build 1.3.1_01, mixed mode)
      Similar behavior on 1.4.1-beta.

      FULL OPERATING SYSTEM VERSION :
      Red Hat Linux release 7.1 (Seawolf)
       - Linux xxxxx.xxxxxx.defence.gov.au 2.4.2-2 #1 Sun Apr 8
      20:41:30 EDT 2001 i686 unknown
       - glibc-2.2.2-10


      ADDITIONAL OPERATING SYSTEMS :

      also tested on Win98SE


      EXTRA RELEVANT SYSTEM CONFIGURATION :
      tested on two filesystem types - one as FAT, one as Ext2


      A DESCRIPTION OF THE PROBLEM :
      I am processing a directory with a large number of files and
      need to react based on whether each directory entry is a
      normal file or a subdirectory. Given a large directory -
      say, tens of thousands of entries - the performance of
      File.isDirectory() slowly degrades as files are processed.

      There are (at least) a couple of old bugs regarding this issue. Both were closed as "not reproducible".
      - 4145781 java.io.File.isDirectory slows down 30X with many files (win95)
      - 4071318 java.io.File.isDirectory on many files gradually slows down (win32)

      I have isolated the problem and include a minimal test case
      with which to reproduce the performance drop.


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Create a directory with a large number of files. My
      original test directory had 20,000 entries. The files may
      be of zero length.

      2. Save the included source as FileTest.java and compile
      FileTest.class.

      3. Run java FileTest [directory] from the command line.


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected results:

      The command prompt is followed by a long sequence of
      numbers. These represent the number of files that have been
      processed.

      Since the files to be processed are stored in an array,
      which can be accessed in constant time, and determining
      whether a directory entry is a subdirectory or not should
      also take O(1) time, the numbers should appear at a contant
      rate. (Indeed, if the isDirectory() line is commented out
      of the test program, the numbers do appear at a constant rate.)


      Actual results (RH7.1, FAT filesystem):

      I set up a test case on an FAT partition: a single
      directory with about 8900 files. Running the program on this directory in the
      mounted partition gave the following output, which indicates that each
      instalment of 500 files to test took about half a second (reported as 500
      milliseconds in the output) longer than the previous instalment:

      0 0
      500 208
      1000 563
      1500 1003
      2000 1255
      2500 1665
      3000 2086
      3500 2579
      4000 3175
      4500 3700
      5000 4197
      5500 4693
      6000 5226
      6500 5697
      7000 6153
      7500 6697
      8000 7306
      8500 7873

      Running the test again immediately afterwards, I got vastly different results -
      but the results I expected to get from the beginning nonetheless. This is
      probably due to the caching done while the first run was proceeding:

      0 0
      500 22
      1000 20
      1500 71
      2000 13
      2500 13
      3000 12
      3500 13
      4000 13
      4500 13
      5000 13
      5500 14
      6000 13
      6500 13
      7000 13
      7500 13
      8000 52
      8500 13

      Going back to my original test case - a directory of 20,000 entries, this time
      on an Ext2 partition - I wanted to make sure that the same thing was still
      happening there as well. It was, but to a much lesser extent:

      0 0
      500 28
      1000 36
      1500 45
      2000 35
      2500 40
      3000 45
      3500 44
      4000 52
      4500 60
      5000 71
      5500 78
      6000 85
      6500 92
      7000 89
      7500 99
      8000 131
      8500 171
      9000 253
      9500 295
      10000 340
      10500 409
      11000 452
      11500 548
      12000 550
      12500 589
      13000 626
      13500 723
      14000 698
      14500 755
      15000 747
      15500 781
      16000 851
      16500 831
      17000 894
      17500 881
      18000 942
      18500 934
      19000 926
      19500 994

      Immediately repeating this second run, as it did on the FAT partition, gave
      nearly instantaneous results.

      At your advice, I have downloaded the J2RE 1.4.1 beta and tried these tests
      again. I got the same results every time, to within a couple of hundredths of a
      second.

      However, I got the best indication I got of what was really happening by
      reversing the order of the loop in my code - that is, by iterating backwards
      from the end of the array. The results (from the FAT partition again) were very
      interesting:

      19500 1041
      19000 1024
      18500 1062
      18000 961
      17500 978
      17000 936
      16500 921
      16000 865
      15500 928
      15000 834
      14500 829
      14000 744
      13500 746
      13000 742
      12500 657
      12000 649
      11500 566
      11000 538
      10500 468
      10000 452
      9500 433
      9000 376
      8500 338
      8000 285
      7500 190
      7000 189
      6500 166
      6000 177
      5500 185
      5000 137
      4500 105
      4000 96
      3500 93
      3000 96
      2500 95
      2000 93
      1500 13
      1000 10
      500 10
      0 11

      Obviously, there is something happening in the File.isDirectory() call somewhere
      that is doing a long look-up. By "long", I mean that it takes an amount of time
      that is proportional to the position of the file in the directory listing, which
      implies that it is linear in the number of entries in the directory.


      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      no errors reported

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.io.File;

      public class FileTest {
          public static void main (String[] args) {

              File dir = new File (args[0]);
              File[] files = dir.listFiles();
              long timer = System.currentTimeMillis();

              for (int i=0; i < files.length; i++) {

              files[i].isDirectory();
              if (i % 500 == 0) {

                  // report time for the last 500 files
                  timer = System.currentTimeMillis() - timer;
                  System.out.println (i + " " + timer);

                  // reset the timer
                  timer = System.currentTimeMillis();
                  }
              }
          }
      }
      ---------- END SOURCE ----------
      (Review ID: 159351)
      ======================================================================

            mr Mark Reinhold
            jleesunw Jon Lee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: