-
Bug
-
Resolution: Fixed
-
P4
-
6
-
b96
-
generic
-
linux
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2140929 | 5.0-pool | Chris Phillips | P4 | Closed | Won't Fix | |
A customer site reported that HotSpot crashed while running in a chroot environment on Linux. The underlying Linux bugs are known and likely to be fixed, but it was agreed at the HotSpot runtime meeting on 5/3/06 that we should at least warn the user when we detect this situation. The email thread from 4/14/06 extracted below describes the problem and possible solutions in detail.
--- ###@###.### writes:
Hello,
I only caught some of the presentation of the G-S guy at Laurie's all-hands before my webfeed cut out, but one thing he mentioned was a problem with the VM when running in a chroot environment without a proc filesystem on linux. Apparently, the mechanism we use to get the number of processors returns '1' in this case, which means the VM can optimize for one processor and not worry about some SMP issues. In this case, we still are running SMP and don't know it so problems arise.
Does anyone know if there is an open CR about this, or is anyone looking into it?
I did a quick test on my Redhat desktop machine (with just a simple C program) and see similar results: The sysconf(_SC_NPROCESSORS_CONF) call returns 1 when there is no /proc filesystem, even when running on a 4-way machine. (it returns the correct value when there is a /proc).
This looks to me to be a linux bug, though I suppose it might be in our interest to look into a workaround. Perhaps sysconf(_SC_NPROCESSORS_CONF) is not the best way to get the CPU count?
Then again, I suppose we could just make sure to tell people that if they run on linux in a chroot environment, they better have a /proc filesystem. Maybe we could just release note it or something.
Any ideas?
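The quick test described in the message above can be approximated with a few lines of C. This is only a sketch, not the original test program: the function name `report_cpu_counts` is hypothetical, and printing `_SC_NPROCESSORS_ONLN` as well is an addition for comparison. Both constants are glibc extensions rather than strict POSIX.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print the processor counts that sysconf() reports
 * and return the configured count. Running it once normally and once
 * inside a chroot without /proc shows whether the two results differ. */
long report_cpu_counts(void)
{
    long configured = sysconf(_SC_NPROCESSORS_CONF); /* processors configured */
    long online = sysconf(_SC_NPROCESSORS_ONLN);     /* processors currently online */

    printf("configured: %ld, online: %ld\n", configured, online);
    return configured;
}
```

Calling `report_cpu_counts()` from `main` inside and outside the chroot should be enough to see whether the affected library version falls back to 1.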
--- ###@###.### writes:
Hi Keith,
Toward the end of the G-S talk the presenter seemed to admit the problem
was actually a linux bug. Unfortunately the linux libraries still
extract information from /proc.
One thought would be to probe for /proc. If we can't access the usual
files we pessimally assume ncpus > 1.
Dave
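The probe suggested in the message above might look roughly like the following sketch. It is not the fix that went into HotSpot: the helper name `assume_multiprocessor` is hypothetical, and `/proc/stat` is an assumed stand-in for "the usual files", which the thread does not name.

```c
#include <unistd.h>

/* Hypothetical helper: returns 1 if the caller should assume a
 * multiprocessor machine, 0 if a single CPU can be trusted. */
int assume_multiprocessor(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_CONF);

    /* Assumption: /proc/stat represents "the usual files". If it cannot
     * be read, the reported count is unreliable, so be pessimistic. */
    if (access("/proc/stat", R_OK) != 0) {
        return 1;
    }

    /* /proc is visible: trust the count. An error result (-1) is also
     * treated as multiprocessor, which is the safe direction here. */
    return ncpus != 1;
}
```

The design choice is to err toward SMP: wrongly assuming several CPUs costs some performance, whereas wrongly assuming one CPU can drop the synchronization that an SMP machine needs.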
--- ###@###.### writes:
I found this issue on Redhat's bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151852
Looks like it might be fixed in a newer version of chroot (9.3.1). Not sure if/when that's going into Redhat (I found it in a fedora rpm).
Guess I won't worry about it, then.
--- ###@###.### writes:
... at least for the purposes of LOCK: prefixes and membar/fence
insertion. For GC we might want to assume something different. If
/proc isn't visible we could also try to probe the number of CPUs with
sched_getaffinity() or sched_setaffinity(), where available. That's
sleazy and I'd advise against it, however.
This is a linux bug. I don't think we should try work-arounds, although
it's probably worth adding a release note.
Dave
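For reference, the affinity-based probe mentioned (and advised against) in the message above could be sketched as follows with glibc. The helper name `cpus_from_affinity` is hypothetical. Note that the affinity mask only describes the CPUs this process is allowed to run on, so the result can be lower than the number of installed CPUs, which is one reason the approach is questionable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes sched_getaffinity() and the CPU_* macros */
#endif
#include <sched.h>

/* Hypothetical helper: number of CPUs in the calling process's affinity
 * mask, or -1 if the mask cannot be read. */
int cpus_from_affinity(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;
    }
    return CPU_COUNT(&mask);
}
```

A result greater than 1 would be enough to conclude that the machine is SMP; a result of 1 proves nothing, because the process may simply have been pinned to one CPU.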
--- nikolay@###@###.### writes:
We can be cool, and on x86 compute the number of CPUs like my program from the attachment, although I absolutely agree with Dave that it's a Linux bug and should be fixed by them.
Nikolay.
--- ###@###.### writes:
I think that Paul's point on this issue is, gee, why is it that GS is finding this, and not Sun?
A release note is definitely warranted, but I don't have an issue with some mechanism (hack) to correct the incorrect data from the OS. Dave is right, this is definitely an OS bug, but that doesn't relieve us of the responsibility to manage it.
To a certain extent, this relates a bit to the relative success of Netscape in the early days of browser development: their policy was simple: "There is no such thing as bad HTML". And as a result their browser *always* rendered something reasonably well regardless of what was stuffed into the document. While it would be nice to just say, "Oh, it's an OS problem", that's not really realistic if that is what customers are using.
(Later on though, Netscape did pay a price; it was almost impossible to migrate to a new, more rational code base. We need to balance our decisions accordingly, and make sure we carefully control how we build in hacks for problems like this.)
And, FWIW, *in this case* falsely getting the processor count as non '1', is better than the reverse. (But I suspect there are other cases where the reverse is true, so we want to be careful about how we address this.)
-John
- backported by
  - JDK-2140929 Warn about chroot instability on Linux (Closed)
- relates to
  - JDK-6506476 Internal Error 4E4D4554484F440E435050071F Java HotSpot Client VM crash in ant (Closed)
  - JDK-6593497 Linux: Hotspot JVM assumes that the process owner has read access to '/proc/self/maps'. (Closed)