A customer has found an abnormal termination involving the string table in symbolTable.cpp.
Their report follows. They request that some form of mutual exclusion be applied
in lookup().
=== Report Start ===>
1. Symptom and what we understand
--- ./share/vm/memory/symbolTable.cpp ----
......
oop stringTableBucket::lookup(jchar* name, int len) {
for (stringTableEntry* l = entry(); l; l = l->next()) {
if (java_lang_String::equals(l->literal_string(), name, len)) {
return l->literal_string();
}
}
return NULL;
}
We found an abnormal termination in java_lang_String::equals
(about once in 100 - 150 trials on our test system).
We inserted print() calls to investigate further.
It turned out that the crash occurs because l->literal_string(), the 1st argument of equals,
is zero.
We also changed the above code as follows.
--- Modified Code Start -----
......
for (stringTableEntry* l = entry(); l; l = l->next()) {
oop string = l->literal_string(); <============(1)
#if 1
if (!string) {
tty->print_cr("--- lookup: _literal_string is NULL! ---");
tty->print_cr("stringTableEntry=0x%16x",l);
tty->print_cr("name=%s len=%d", name, len );
StringTable::print(); <===========(2)
tty->print_cr("START VERIFY");
StringTable::verify();
tty->print_cr("FINISH VERIFY");
string->print();
tty->print_cr("FINISH PRINT");
}
#endif
if (java_lang_String::equals(string, name, len)) {
..............
StringTable::print(), called at (2) above, is as follows.
void StringTable::print() {
ResourceMark rm;
for (int i = 0; i < string_table_size; i++) {
stringTableEntry* entry = buckets[i].entry();
while(entry != NULL) {
tty->print("%d : ", i);
#if 0
entry->literal_string()->print();
#else
tty->print(" stringTableEntry=0x%16x ", entry);
oop o = entry->literal_string();
if (o) {
o->print(); <======= (3) Some value has been set.
}
else
tty->print_cr("=== literal_string is NULL ===");
#endif
tty->cr();
entry = entry->next();
}
}
}
--- Modified Code end -----
We checked the value of l->literal_string() before the program calls equals().
We confirmed that l->literal_string() holds a value by the time print() runs ((3) in the above code).
That means l->literal_string() is zero at (1) and a value has been set by (3).
2. Investigation
A stringTableEntry is filled in and linked into its bucket by the following procedure.
---- Extracted Code Start -----
.....
oop StringTable::basic_add(Handle string_or_null, jchar* name, int len, int hash...
...
MutexLocker ml(StringTable_lock, THREAD);
assert(java_lang_String::equals(string(), name, len), "string must be properly initialized");
// Since look-up was done lock-free, we need to check if another thread beat us in the race to insert the symbol.
stringTableBucket* bucket = bucketFor(hashValue);
oop test = bucket->lookup(name, len); // calls lookup(u1*, int)
if (test != NULL) {
// Entry already added
return test;
}
stringTableEntry* entry;
if (free_list) {
entry = free_list;
free_list = free_list->next();
} else {
const int block_size = 500;
if (first_free_entry == end_block) {
first_free_entry = NEW_C_HEAP_ARRAY(stringTableEntry, block_size);
end_block = first_free_entry + block_size;
}
entry = first_free_entry++;
}
(A) entry->set_literal_string(string()); // clears literal string field
(B) entry->set_next(bucket->entry());
(C) bucket->set_entry(entry);
return string();
}
---- Extracted Code End -----
The code above performs its writes (updates) under a lock, as follows.
- literal_string is set by a store at (A) above
- next is set by a store at (B) above
- The entry is linked into the bucket at (C) above
lookup() runs lock-free because it only reads.
If the stores become visible in the order (A), (B), (C), a reader never sees a zero
literal_string in a linked entry, because literal_string is set before the entry is reachable.
However, the IPF memory model does not guarantee the order in which (A), (B), and (C)
become visible.
Assume the order is (C),(A),(B) or (C),(B),(A).
Once (C) is visible, another thread that walks the bucket can already reach the new entry.
But because (A) has not completed, l->literal_string() reads as zero (it has not been set yet).
A little later, the same thread reads l->literal_string() again. Because (A) has completed by then,
l->literal_string() now holds a value.
We think this scenario occurs and causes the crash.
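The scenario above can be checked with a small single-threaded model. This is only a sketch under stated assumptions, not HotSpot code: Entry, Bucket, exposes_null_literal and count_exposing_orders are hypothetical stand-ins, and each store is assumed to become visible atomically in the given order. After each store, the model walks the bucket as a lock-free reader would and reports whether an entry with a null literal is reachable.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for stringTableEntry and stringTableBucket.
struct Entry { const char* literal; Entry* next; };
struct Bucket { Entry* head; };

enum Store { A, B, C };  // same labels as the stores in basic_add

// Makes the three stores visible in the given order. After each one, walks
// the bucket like a lock-free reader and records whether an entry with a
// null literal was reachable.
bool exposes_null_literal(const std::array<Store, 3>& order) {
    Entry old_head{"old", nullptr};  // entry already in the bucket
    Entry fresh{nullptr, nullptr};   // entry being added
    Bucket bucket{&old_head};
    bool exposed = false;
    for (Store s : order) {
        switch (s) {
            case A: fresh.literal = "new"; break;    // set_literal_string
            case B: fresh.next = &old_head; break;   // set_next
            case C: bucket.head = &fresh; break;     // set_entry
        }
        for (Entry* e = bucket.head; e != nullptr; e = e->next) {
            if (e->literal == nullptr) exposed = true;
        }
    }
    // Whatever the order, the final state is the fully linked entry.
    assert(bucket.head == &fresh && fresh.literal != nullptr &&
           fresh.next == &old_head);
    return exposed;
}

// Tries all six orders, prints the result for each, and returns how many
// of them let a reader reach a null literal.
int count_exposing_orders() {
    std::array<Store, 3> order{{A, B, C}};
    int count = 0;
    do {
        bool bad = exposes_null_literal(order);
        std::printf("(%c),(%c),(%c): %s\n", "ABC"[order[0]], "ABC"[order[1]],
                    "ABC"[order[2]], bad ? "null literal reachable" : "ok");
        if (bad) ++count;
    } while (std::next_permutation(order.begin(), order.end()));
    return count;
}
```

Under these assumptions the model flags the two orders named above and also (B),(C),(A): every order in which (C) becomes visible before (A).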
3. Suggested Fix
Because (A) and (B) must complete before (C), we suggest inserting a memory barrier between them:
---- Suggested Fix Start---
(A) entry->set_literal_string(string()); // clears literal string field
(B) entry->set_next(bucket->entry());
<-----
#if IA64
atomic::membar();
#endif
<------
(C) bucket->set_entry(entry);
---- Suggested Fix End ---
SymbolTable has a similar operation; we applied the same fix there as well.
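For comparison, the same ordering requirement can be written in portable C++11 atomics. This is a minimal sketch, not the HotSpot implementation and not the customer's patch: Entry and Bucket are hypothetical stand-ins, and std::atomic is assumed here in place of the atomic::membar() call in the suggested fix. The release store at (C) keeps (A) and (B) ahead of publication, and the acquire load in lookup() pairs with it on the reader side.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>

// Hypothetical stand-in for stringTableEntry.
struct Entry {
    const char* literal;
    Entry* next;
};

// Hypothetical stand-in for stringTableBucket.
struct Bucket {
    std::atomic<Entry*> head{nullptr};

    // Writer side. Callers are assumed to hold the table lock, as basic_add
    // does, so only one thread adds at a time.
    void add(Entry* e, const char* literal) {
        e->literal = literal;                            // (A)
        e->next = head.load(std::memory_order_relaxed);  // (B)
        // (C): release store, so (A) and (B) are visible to any thread that
        // observes the new head through an acquire load.
        head.store(e, std::memory_order_release);
    }

    // Reader side, lock-free. The acquire load pairs with the release store
    // in add(), so every entry reached here is fully initialized.
    const char* lookup(const char* name) const {
        for (Entry* e = head.load(std::memory_order_acquire); e != nullptr;
             e = e->next) {
            assert(e->literal != nullptr);
            if (std::strcmp(e->literal, name) == 0) return e->literal;
        }
        return nullptr;
    }
};
```

In this sketch entries are never modified after they are linked in, which is what lets lookup() follow next pointers without further synchronization.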
4. Request
1) Check whether the scenario described above is possible.
2) Check whether the suggested fix above is reasonable.
3) Check the hotspot source for similar operations
and resolve them.
Example:
- a portion of the source code that expects data to be set by
a specific order of "store" operations,
where the data may be read by another thread lock-free.
<=== Report End ====