Name: boT120536 Date: 01/21/2001
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)
The sun.net.www.protocol.http.HttpURLConnection class fails to allow access to
some valid URLs. The failure occurs under Solaris 2.6, in both JDK 1.3.0 and
JDK 1.2.2. It also occurs under MacOS X beta (which I realize is not supported
by Sun). It does NOT fail under Linux.
The URLs in question are not at our site, but I include a test program that
should illustrate the failure. The test code below will successfully access
the first 3 of the following URLs:
http://www.isi.edu
http://citeseer.nj.nec.com/
http://citeseer.nj.nec.com/correct/
http://citeseer.nj.nec.com/correct/163004
http://citeseer.nj.nec.com/correct/163004/
and then fail on the fourth and fifth.
All five URLs can be successfully read via the Netscape browser.
All five URLs can be successfully read, and return response code 200, when
a telnet connection is made to port 80 on the host and the request is sent
manually in either HTTP/1.0 or HTTP/1.1 format.
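The manual telnet check described above can be approximated in code. The following is a minimal sketch and not part of the original report: the class ManualHttpCheck and its methods are hypothetical names, it assumes a current JDK (Java 7 or later) rather than the 1.3.0 runtime listed above, and it sends the path in the request line together with a Host header, which may differ from exactly what was typed in the telnet session.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends a hand-built GET request over a plain socket,
// bypassing HttpURLConnection entirely.
class ManualHttpCheck {

    // Build the request text that would otherwise be typed into telnet.
    // httpVersion is "1.0" or "1.1".
    static String buildRequest(String host, String path, String httpVersion) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/").append(httpVersion).append("\r\n");
        // HTTP/1.1 requires a Host header; sending it with HTTP/1.0 is harmless.
        sb.append("Host: ").append(host).append("\r\n");
        // Ask the server to close the connection after one response.
        sb.append("Connection: close\r\n");
        sb.append("\r\n");
        return sb.toString();
    }

    // Send the request and return only the status line of the response,
    // for example "HTTP/1.1 200 OK". Returns null if the server sends nothing.
    static String fetchStatusLine(String host, int port, String path, String httpVersion)
            throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path, httpVersion).getBytes(StandardCharsets.ISO_8859_1));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();
        }
    }
}
```

Calling fetchStatusLine("citeseer.nj.nec.com", 80, "/correct/163004", "1.0") would show the status line the server returns to a request that does not go through HttpURLConnection, which can then be compared with the trace from the test program.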
Source code to demonstrate the problem:
import java.net.*;
import java.util.*;
import java.io.*;

public class HttpTest {

    public static void getPage(String urlString) throws IOException {
        // Retrieve the page at `urlString' and print the first 500 bytes.
        URL url = new URL(urlString);
        InputStream pageStream;
        int ch;
        int count = 0;
        HttpURLConnection connection = null;
        if (url.getProtocol().equalsIgnoreCase("http")) {
            try {
                System.out.println("==== RETRIEVING " + url + " ====");
                System.out.println();
                connection = (HttpURLConnection) url.openConnection();
                System.err.println("HttpURLConnection opened: " + connection);
                pageStream = connection.getInputStream();
                System.err.println("HttpURLConnection input Stream: " + pageStream);
                System.err.print("Response: ");
                System.err.print(connection.getResponseCode());
                System.err.println(" " + connection.getResponseMessage());
                System.out.println();
                System.out.println("==== CONTENT ====");
                System.out.println();
                for (ch = pageStream.read(); ch != -1; ch = pageStream.read()) {
                    if (++count < 500) {
                        System.out.write(ch);
                    } else if (count == 500) {
                        System.out.println();
                        System.out.println("<More...>");
                    }
                }
                System.out.println();
                System.out.println("==== DONE " + count + " bytes ====");
                System.out.println();
            } catch (Exception e) {
                System.err.println();
                System.err.println("*** ERROR: " + e);
                e.printStackTrace();
                System.err.println();
            }
        }
    }

    public static void main(String args[]) {
        // Run a loop with test URLs via the Java http URL support.
        // All of these urls work from a browser.
        // All of them work from Linux
        // All of them work manually telnetting to port 80
        // and issuing a "GET <url> HTTP/1.0" command.
        //
        // Two different "failures" occur in other systems:
        // The latter two fail on Solaris, jdk 1.3.0 and jdk 1.2.2
        // The latter two fail on MacOS X jdk 1.2.2
        String[] urls
            = new String [] {"http://www.isi.edu",
                             "http://citeseer.nj.nec.com/",
                             "http://citeseer.nj.nec.com/correct/",
                             "http://citeseer.nj.nec.com/correct/163004", // Fails with error
                             "http://citeseer.nj.nec.com/correct/163004/" // Fails with busy
                            };
        for (int i = 0; i < urls.length; i++) {
            try {
                getPage(urls[i]);
            } catch (Exception e) {
                System.err.println();
                System.err.println("**** Error: " + e);
                e.printStackTrace();
                System.err.println();
            }
        }
    }
}
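One way to narrow down the two failures is to read the status code before asking for the body, and to fall back to getErrorStream() when the server reports an error. The sketch below is an illustration only and not part of the original test program: ResponseProbe, isErrorStatus and openBody are hypothetical names, and it assumes a current JDK, where getResponseCode() returns the status even for error responses. The report does not establish whether JDK 1.3.0 behaves the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper: inspects the HTTP status before choosing which
// stream to read, so that an error status is reported as such instead of
// surfacing only as an exception from getInputStream().
class ResponseProbe {

    // Treat every 4xx and 5xx status as an error response.
    static boolean isErrorStatus(int code) {
        return code >= 400;
    }

    // Return a stream for the response body. For error responses this is
    // the error stream, which may be null if the server sent no body.
    static InputStream openBody(HttpURLConnection connection) throws IOException {
        int code = connection.getResponseCode();
        System.err.println("Response: " + code + " " + connection.getResponseMessage());
        if (isErrorStatus(code)) {
            return connection.getErrorStream();
        }
        return connection.getInputStream();
    }
}
```

In getPage, a call to ResponseProbe.openBody(connection) would stand in for the direct call to connection.getInputStream(); the caller would need to handle a null return before reading.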
Sample Trace of the program running on our system:
==== RETRIEVING http://www.isi.edu ====
HttpURLConnection opened:
sun.net.www.protocol.http.HttpURLConnection:http://www.isi.edu
HttpURLConnection input Stream: sun.net.www.http.KeepAliveStream@4b222f
Response: 200 OK
==== CONTENT ====
<HTML>
<HEAD><TITLE>USC Information Sciences
Institute</TITLE></HEAD>
<BODY BACKGROUND="images/bg-nologo.jpg"
TEXT="#000000" LINK="#AA0000" VLINK="#111111">
<MAP NAME="ISI">
<AREA SHAPE=rect HREF="http://www.isi.edu/about.html"
COORDS="18,112,122,151">
<AREA SHAPE=rect
HREF="http://www.isi.edu/publications.html" COORDS="16,154,121,192">
<AREA SHAPE=rect HREF="http://www.isi.edu/servicelist.html"
COORDS="17,195,120,233">
<AREA SHAPE=rect
HREF="http://www.isi.edu/divisions/main/index.
<More...>
==== DONE 3120 bytes ====
==== RETRIEVING http://citeseer.nj.nec.com/ ====
HttpURLConnection opened:
sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/
HttpURLConnection input Stream: sun.net.www.http.KeepAliveStream@2125f0
Response: 200 OK
==== CONTENT ====
<html><head><TITLE>ResearchIndex: The NECI Scientific Literature Digital Library
[Steve Lawrence, Kurt Bollacker, Lee Giles, NEC Research Institute]</TITLE>
<!70>
<META name="description" content="ResearchIndex (formerly CiteSeer): The NECI
Scientific Literature Digital Library. Autonomously creates citation indexes of
scientific literature. Advantages in terms of availability, coverage,
timeliness, and efficiency. Generates citation statistics and allows easy
browsing of the context of citati
<More...>
==== DONE 11077 bytes ====
==== RETRIEVING http://citeseer.nj.nec.com/correct/ ====
HttpURLConnection opened:
sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/correct/
HttpURLConnection input Stream: sun.net.www.http.KeepAliveStream@41cd1f
Response: 200 OK
==== CONTENT ====
<html><head><TITLE>ResearchIndex: The NECI Scientific Literature Digital Library
[Steve Lawrence, Kurt Bollacker, Lee Giles, NEC Research Institute]</TITLE>
<!9>
<META name="description" content="ResearchIndex (formerly CiteSeer): The NECI
Scientific Literature Digital Library. Autonomously creates citation indexes of
scientific literature. Advantages in terms of availability, coverage,
timeliness, and efficiency. Generates citation statistics and allows easy
browsing of the context of citation
<More...>
==== DONE 11097 bytes ====
==== RETRIEVING http://citeseer.nj.nec.com/correct/163004 ====
HttpURLConnection opened:
sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/correct/163004
*** ERROR: java.io.FileNotFoundException:
http://citeseer.nj.nec.com/correct/163004
java.io.FileNotFoundException: http://citeseer.nj.nec.com/correct/163004
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:545)
at ir_tools.HttpTest2.getPage(HttpTest2.java:24)
at ir_tools.HttpTest2.main(HttpTest2.java:62)
==== RETRIEVING http://citeseer.nj.nec.com/correct/163004/ ====
HttpURLConnection opened:
sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/correct/163004/
HttpURLConnection input Stream: sun.net.www.MeteredStream@31f71a
Response: 503 System busy
==== CONTENT ====
<!DOCTYPE HTML
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML LANG="en-US"><HEAD><TITLE>ResearchIndex [NEC Research Institute; Steve
Lawrence, Kurt Bollacker, Lee Giles; Computer Science]</TITLE>
<LINK REV=MADE HREF="mailto:lawrence%40research.nj.nec.com">
<BASE HREF="http://citeseer.nj.nec.com/correct/163004/">
<META NAME="description" CONTENT="ResearchIndex (CiteSeer): Scientific
Literature Digital Library incorporating autonomous citation inde
<More...>
==== DONE 1530 bytes ====
(Review ID: 115411)
======================================================================
Duplicates: JDK-4160499 sun.net.www.protocol.http.HttpURLConnection error handling (Resolved)