JDK-8046141

JEP 151: Compress Time-Zone Data

    • JEP
    • Resolution: Withdrawn
    • P4
    • None
    • core-libs
    • None
    • Stuart Marks, Darryl Mocek, Peter Jensen
    • Feature
    • Open
    • JDK
    • i18n dash dev at openjdk dot java dot net
    • S
    • 151

      Summary

      Store time-zone data more efficiently, in a single compressed file rather than in one uncompressed file per zone.

      Description

      The original reason to keep time-zone data in individual uncompressed files, rather than in a single compressed file, was (we surmise) to optimize access to the data for a particular time zone and to reduce dynamic memory consumption.

      Given that the data for a given zone is read only once, and most applications use only one or a few zones, this is probably not a major concern. It is entirely possible that the current implementation was just more convenient, and that no requirement justified the extra effort of using a compressed format.

      For a large number of files of random size, the amount of disk overhead is expected to be number-of-files * 0.5 * file-system-block-size.

      The block size on UNIX, Linux (including embedded Linuxes), and NTFS file systems is typically 4KB. There are more than 500 time-zone files, so the expected overhead is about 500 * 0.5 * 4KB, or roughly 1MB (in line with observations).

      On a system with a smaller block size of 1KB we would still expect an overhead of about 250KB, or about 100% of the actual total size of the files.
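
      The estimate above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the class and method names are hypothetical, and the file count of 500 is the approximate figure quoted in the text, not a measured value.

```java
public class BlockOverhead {
    // Expected slack space: on average, half a block is wasted per file.
    static long expectedOverheadBytes(int fileCount, int blockSizeBytes) {
        return (long) fileCount * blockSizeBytes / 2;
    }

    public static void main(String[] args) {
        int files = 500; // approximate number of zone files (assumption from the text)

        // 4KB blocks: 500 * 0.5 * 4096 bytes
        System.out.println(expectedOverheadBytes(files, 4096) / 1024 + " KB");

        // 1KB blocks: 500 * 0.5 * 1024 bytes
        System.out.println(expectedOverheadBytes(files, 1024) / 1024 + " KB");
    }
}
```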

      Options for reducing the on-disk footprint include:

      1. Store files in a zip/jar archive
      2. Use an embedded database

      Option (1) requires only minimal, localized changes: the implementation would read zip-file entries rather than individual files.
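
      To give a sense of how small the change for option (1) might be, here is a minimal sketch that reads one zone's bytes from a zip archive with the standard java.util.zip API. It is not the JDK's actual implementation: the class name, the helper methods, the archive layout (one entry per zone ID), and the zone ID used in main are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZoneArchiveSketch {
    // Read the raw bytes stored for one zone; returns null if the entry is absent.
    // Assumes (hypothetically) that each zone is stored under its zone ID.
    static byte[] readZone(Path archive, String zoneId) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry(zoneId);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes(); // requires Java 9 or later
            }
        }
    }

    // Demo helper only: write an archive containing a single zone entry.
    static void writeArchive(Path archive, String zoneId, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(archive);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(zoneId));
            zip.write(data);
            zip.closeEntry();
        }
    }

    // Write a temporary archive, read the entry back, and clean up.
    static byte[] demoRoundTrip(String zoneId, byte[] data) {
        try {
            Path archive = Files.createTempFile("tzdata-sketch", ".zip");
            try {
                writeArchive(archive, zoneId, data);
                return readZone(archive, zoneId);
            } finally {
                Files.deleteIfExists(archive);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] placeholder = {1, 2, 3}; // stand-in bytes, not real zone data
        byte[] read = demoRoundTrip("America/Los_Angeles", placeholder);
        System.out.println("bytes read: " + read.length);
    }
}
```

      Opening the archive on every lookup, as this sketch does, keeps the example short; a real implementation would more likely keep the archive open or cache the zones already read.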

      (2) requires a database. The performance characteristics of using a database are unknown. This may still be interesting if the future installed-module format already makes use of a database for efficient storage and access to items contained in a module.

      Testing

      Requires testing the performance impact on retrieving time-zone data, especially on the first call that retrieves it.
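
      A crude way to observe the first-call cost is to time two consecutive lookups through the public java.util.TimeZone API. This is only a rough sketch, not a rigorous benchmark: the class name and zone ID are illustrative choices, and a single System.nanoTime measurement is noisy and also includes class-loading time.

```java
import java.util.TimeZone;

public class FirstLookupTiming {
    // Time a single lookup of the given zone ID, in nanoseconds.
    static long timeLookupNanos(String zoneId) {
        long start = System.nanoTime();
        TimeZone tz = TimeZone.getTimeZone(zoneId);
        long elapsed = System.nanoTime() - start;
        // Touch the result so the lookup cannot be treated as dead code.
        if (tz.getID().isEmpty()) {
            throw new IllegalStateException("unexpected empty zone ID");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String zoneId = "America/Los_Angeles"; // illustrative choice
        long first = timeLookupNanos(zoneId);  // includes loading the zone data
        long second = timeLookupNanos(zoneId); // typically served from a cache
        System.out.println("first lookup:  " + first + " ns");
        System.out.println("second lookup: " + second + " ns");
    }
}
```

      Comparing the first-lookup figure before and after a format change, over many fresh JVM runs, would give a more trustworthy picture than any single run.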

      Requires changing the testing of the upgrade tools to ensure the time zone data has been written out properly.

      Risks and Assumptions

      Time-zone updates using a compressed format will not apply to older JDKs. This might require some duplication of effort, to provide updates in two different formats.

      There is a risk of a decrease in performance when retrieving time-zone data, in particular for the first zone requested. In the case of a zip file, the performance decrease is expected to be small.

      Impact

      • Other JDK components: Time-zone and locale data-upgrade tools will have to be changed to handle the new format.

            smarks Stuart Marks
            smarks Stuart Marks
            Stuart Marks Stuart Marks
            Brian Goetz