ADDITIONAL SYSTEM INFORMATION :
OS: Windows 10 64 bit
I first found the error while using an earlier release of Java 1.8, but have since updated my JDK and JRE to release 1.8.0_172, and the bug persists.
I'm using Eclipse (Oxygen version 4.7.2)
A DESCRIPTION OF THE PROBLEM :
This code illustrates how DataInputStream.read(byte[]) fails to properly read zipped entries while DataInputStream.read() (in a loop) succeeds. A link to the file that the code operates on is provided.
ftp://ftp2.census.gov/geo/tiger/TIGER2017/TABBLOCK/tl_2017_09_tabblock10.zip
entry: tl_2017_09_tabblock10.dbf
I have pasted source code below that should reproduce the bug. Note that the path names will need to be altered to suit the local environment. Also note that the variable "fileSize" is hardcoded, so if another file is used it will need to be altered to suit as well.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Download this file from Census
ftp://ftp2.census.gov/geo/tiger/TIGER2017/TABBLOCK/tl_2017_09_tabblock10.zip
2. Unpack the zipped archive.
3. In the code (provided below), alter the following lines to reflect the locations of the compressed and uncompressed files you just downloaded and unpacked.
String zipFilePath ="\\\\mfs2\\GuseJ-Research\\ESRI Block Data\\tl_2017_09_tabblock10.zip";
String uncompressedFilePath = "\\\\mfs2\\GuseJ-Research\\ESRI Block Data\\tl_2017_09_tabblock10\\tl_2017_09_tabblock10.dbf";
4. Compile and run. (I did this from within Eclipse.)
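If downloading the Census file is inconvenient, the steps above can be approximated locally. The following is a minimal sketch, not the original test: the class name LocalZipRepro, the entry name "payload.bin", the payload size, and the use of pseudo-random bytes are all assumptions made for illustration. It writes a one-entry zip to a temporary file and prints the count returned by a single DataInputStream.read(byte[]) call on that entry, which may or may not be smaller than the requested size on a given system.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class LocalZipRepro {

    // Hypothetical entry name and payload size; not taken from the Census file.
    public static final String ENTRY_NAME = "payload.bin";
    public static final int PAYLOAD_SIZE = 1_000_000;

    /** Builds a pseudo-random payload from a fixed seed so runs are repeatable. */
    public static byte[] makePayload(int size) {
        byte[] payload = new byte[size];
        new Random(42).nextBytes(payload);
        return payload;
    }

    /** Writes a one-entry zip archive to a temporary file and returns that file. */
    public static File writeTempZip(byte[] payload) throws IOException {
        File zip = File.createTempFile("dis-repro", ".zip");
        zip.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry(ENTRY_NAME));
            zos.write(payload);
            zos.closeEntry();
        }
        return zip;
    }

    /** Returns the byte count reported by ONE read(byte[]) call on the zipped entry. */
    public static int singleReadCount(File zip, int size) throws IOException {
        try (ZipFile zf = new ZipFile(zip);
             DataInputStream dis = new DataInputStream(zf.getInputStream(zf.getEntry(ENTRY_NAME)))) {
            return dis.read(new byte[size]);
        }
    }

    public static void main(String[] args) throws IOException {
        File zip = writeTempZip(makePayload(PAYLOAD_SIZE));
        System.out.println("requested: " + PAYLOAD_SIZE);
        System.out.println("num read:  " + singleReadCount(zip, PAYLOAD_SIZE));
    }
}
```

Comparing the two printed numbers gives the same information as the "num read" line in the output of the full test program.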
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Comparing dis.read(byte[]) with dis.read() with Zippedfile
num read: 7096203
total agreement: true
Comparing dis.read(byte[]) with dis.read() with Uncompressedfile
num read: 7096203
total agreement: true
ACTUAL -
Comparing dis.read(byte[]) with dis.read() with Zippedfile
num read: 30199
Disagreement at : 30199
allBytesAtOnce[30199]: 0
allBytesOneAtATime[30199]: 51
total agreement: false
Comparing dis.read(byte[]) with dis.read() with Uncompressedfile
num read: 7096203
total agreement: true
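In the ACTUAL output for the zipped file, the index of the first disagreement (30199) equals the reported "num read" value, and the byte in allBytesAtOnce at that index is 0. That pattern would be consistent with the first 30199 bytes matching and the rest of the array never being written, since a new byte[] in Java starts zero-filled. The sketch below shows one way to check for that pattern; the class and method names are hypothetical, and the arrays in main are synthetic stand-ins rather than data from the Census file.

```java
import java.util.Arrays;

public class ShortReadCheck {

    /** Returns the index of the first differing byte, or -1 if none differ over the common length. */
    public static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    /** Returns true if every byte from index 'from' to the end of the array is zero. */
    public static boolean tailIsZero(byte[] a, int from) {
        for (int i = from; i < a.length; i++) {
            if (a[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /** Builds synthetic data with no zero bytes, so an unwritten tail is easy to spot. */
    public static byte[] nonZeroData(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) (i % 251 + 1);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] complete = nonZeroData(10);
        int numRead = 5; // stand-in for a short count returned by read(byte[])
        // Simulate a partially filled buffer: only the first numRead bytes are written.
        byte[] partial = Arrays.copyOf(Arrays.copyOf(complete, numRead), complete.length);
        System.out.println("first mismatch: " + firstMismatch(partial, complete));
        System.out.println("tail is zero:   " + tailIsZero(partial, numRead));
    }
}
```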
---------- BEGIN SOURCE ----------
package esri;

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipFile;

public class DISBugReport {

    /**
     * This method is written to illustrate a bug
     * in which DataInputStream.read(byte[]) fails while DataInputStream.read()
     * succeeds in reading the same file.
     *
     * The failure only seems to happen when the DataInputStream is derived
     * from a zipped archive entry. To illustrate this,
     * the argument to the method determines whether a zipped entry generates
     * the input stream or a regular file (FileInputStream) does.
     *
     * @param fromZipped if set to true the DataInputStream will sit on top of a zipped entry
     * @throws IOException if either file cannot be opened or read
     */
    public static void checkBytes(boolean fromZipped) throws IOException {
        System.out.println("\nComparing dis.read(byte[]) with dis.read() with " + (fromZipped ? "Zipped" : "Uncompressed") + "file");

        // Path and file size information for a file that reveals the bug.
        // To test on another file, replace zipFilePath, uncompressedFilePath, and entryName,
        // and make sure fileSize is at most the size (in bytes)
        // of the file to be tested.
        // Publicly available file from the Census here:
        // ftp://ftp2.census.gov/geo/tiger/TIGER2017/TABBLOCK/tl_2017_09_tabblock10.zip
        String entryName = "tl_2017_09_tabblock10.dbf"; // publicly available file from the Census
        String zipFilePath = "\\\\mfs2\\GuseJ-Research\\ESRI Block Data\\tl_2017_09_tabblock10.zip";
        int fileSize = 7096203;
        String uncompressedFilePath = "\\\\mfs2\\GuseJ-Research\\ESRI Block Data\\tl_2017_09_tabblock10\\tl_2017_09_tabblock10.dbf";

        // Get a data input stream from either the zipped or the uncompressed version
        // of the same file, depending on the "fromZipped" flag.
        // Note the bug is ONLY revealed on a data stream
        // derived from a zipped entry.
        DataInputStream dis;
        if (fromZipped) dis = getDataStreamFromZippedEntry(zipFilePath, entryName);
        else dis = new DataInputStream(new FileInputStream(uncompressedFilePath));

        // We will attempt to populate two byte[] arrays
        // with the entire contents of the file.
        // The first ("allBytesAtOnce") will be populated using DataInputStream.read(byte[]).
        // The second ("allBytesOneAtATime") will be populated using DataInputStream.read().

        // USING DataInputStream.read(byte[])
        // Read the whole file into a byte array with a single call
        // and close the stream.
        byte[] allBytesAtOnce = new byte[fileSize];
        int numRead = dis.read(allBytesAtOnce);
        System.out.println("num read: " + numRead);
        dis.close();

        // USING DataInputStream.read()
        // Do it again, reading one byte at a time.
        if (fromZipped) dis = getDataStreamFromZippedEntry(zipFilePath, entryName);
        else dis = new DataInputStream(new FileInputStream(uncompressedFilePath));
        byte[] allBytesOneAtATime = new byte[fileSize];
        for (int i = 0; i < fileSize; i++) {
            allBytesOneAtATime[i] = (byte) dis.read();
        }
        dis.close();

        // Compare the two byte arrays:
        // allBytesAtOnce was populated with a call to DataInputStream.read(byte[])
        // allBytesOneAtATime was populated with successive calls to DataInputStream.read()
        boolean totalAgreement = true;
        for (int i = 0; i < fileSize; i++) {
            if (allBytesAtOnce[i] != allBytesOneAtATime[i]) {
                System.out.println("Disagreement at : " + i);
                System.out.println("allBytesAtOnce[" + i + "]: " + allBytesAtOnce[i]);
                System.out.println("allBytesOneAtATime[" + i + "]: " + allBytesOneAtATime[i]);
                totalAgreement = false;
                break;
            }
        }
        System.out.println("total agreement: " + totalAgreement);
    }

    public static DataInputStream getDataStreamFromZippedEntry(String zipFilePath, String entryName) throws IOException {
        // Get a data input stream for the named entry of the zip file.
        ZipFile zf = new ZipFile(zipFilePath);
        InputStream is = zf.getInputStream(zf.getEntry(entryName));
        return new DataInputStream(is);
    }

    public static void main(String[] args) throws IOException {
        // In this test the DataInputStream will derive from a zipped file entry.
        checkBytes(true);
        // In this test the DataInputStream will derive from an uncompressed file.
        checkBytes(false);
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Use DataInputStream.read() in a loop instead of DataInputStream.read(byte[]).
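As a possible alternative to a byte-at-a-time loop, the sketch below uses DataInputStream.readFully(byte[]), or an explicit loop over read(byte[], int, int) that checks each returned count. The Java documentation describes read(byte[]) as reading "some number of bytes" and returning the number actually read, so the returned count may be smaller than the array length. The class name, the method names, and the ChunkedInputStream helper (which simulates a stream that hands back only a few bytes per call) are hypothetical and not part of the original test program; this has not been run against the zipped Census entry.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyWorkaround {

    /** Hypothetical test helper: returns at most maxChunk bytes per bulk read call. */
    public static class ChunkedInputStream extends FilterInputStream {
        private final int maxChunk;

        public ChunkedInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Reads exactly size bytes using DataInputStream.readFully. */
    public static byte[] readAllWithReadFully(InputStream in, int size) throws IOException {
        byte[] buffer = new byte[size];
        new DataInputStream(in).readFully(buffer); // throws EOFException if the stream ends early
        return buffer;
    }

    /** Reads exactly size bytes by looping until the buffer is full. */
    public static byte[] readAllWithLoop(InputStream in, int size) throws IOException {
        byte[] buffer = new byte[size];
        int offset = 0;
        while (offset < size) {
            int n = in.read(buffer, offset, size - offset);
            if (n < 0) {
                throw new EOFException("stream ended after " + offset + " bytes");
            }
            offset += n;
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // A single read(byte[]) on the capped stream returns only part of the data.
        InputStream capped = new ChunkedInputStream(new ByteArrayInputStream(data), 7);
        int n = new DataInputStream(capped).read(new byte[data.length]);
        System.out.println("single read(byte[]) returned: " + n);

        byte[] viaReadFully = readAllWithReadFully(
                new ChunkedInputStream(new ByteArrayInputStream(data), 7), data.length);
        System.out.println("readFully matches: " + Arrays.equals(data, viaReadFully));
    }
}
```

Both helpers need the expected size up front, just as the original test does with "fileSize"; for a zip entry that size could presumably be taken from ZipEntry.getSize() when it is known.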
FREQUENCY : often