As of JDK8, HotSpot uses the SymbolTable to force every C++ Symbol object to contain a unique UTF8 string. As a result, Symbol equivalence tests are done using the == operator, like
bool InstanceKlass::find_local_field(Symbol* name, Symbol* sig, fieldDescriptor* fd) const {
for (JavaFieldStream fs(this); !fs.done(); fs.next()) {
Symbol* f_name = fs.name();
Symbol* f_sig = fs.signature();
if (f_name == name && f_sig == sig) {
The advantage of this is the comparison is very fast. The limitations is:
+ a Symbol can be shared by classes loaded by different ClassLoaderData's, so we cannot allocate the Symbol using the CLD metaspace. Consequently, they are allocated using malloc, which has more overhead than CLD.
+ Symbols need to be ref-counted in order to deallocate them
If we override the == operator of Symbol*, we can allow multiple Symbols to be considered "equivalent". Thus we can overcome the above limitations. Ideally, we want to define something like
int operator==(const Symbol* a, const Symbol* b) {
#if USE_MULTIPLE_EQUIVALENT_SYMBOLS
return a->equals(b); // new behavior
#else
return a == b; // old behavior as in JDK8
#endif
}
However, C++ doesn't allow the == operator to be overloaded for pointer types (http://stackoverflow.com/questions/6474675/why-cant-i-use-two-ptrs-in-operator-overload)
An operator function shall either be a non-static member function or be a non-member function
and have at least one parameter whose type is a class, a reference to a class, an enumeration,
or a reference to an enumeration.
Coleen's idea is to wrap the Symbol* into another class, like this:
// test.cpp -------------------------------------
class Symbol;
class SymbolRef {
public:
Symbol *sym;
const int operator==(SymbolRef& other) {
return sym == other.sym;
}
};
class OldInstanceKlass {
public:
int _foobar;
Symbol* _name;
};
class NewInstanceKlass {
public:
int _foobar;
SymbolRef _name;
};
int func(SymbolRef a, SymbolRef b) {
return a == b;
}
int compare_klasss_old(OldInstanceKlass *klass, Symbol* name) {
return klass->_name == name;
}
int compare_klasss_new(NewInstanceKlass *klass, SymbolRef name) {
return klass->_name == name;
}
---------------------
I tested GCC and it produces the same code for the SymbolRef as for the original code. I.e., even we are nominally passing a structure, GCC is smart enough to treat the structure just as a simple value that can be stored in a hardware register:
$ g++ -O3 -S -o - test.cpp | c++filt
compare_klasss_old(OldInstanceKlass*, Symbol*):
xorl %eax, %eax
cmpq %rsi, 8(%rdi)
sete %al
ret
compare_klasss_new(NewInstanceKlass*, SymbolRef):
xorl %eax, %eax
cmpq %rsi, 8(%rdi)
sete %al
ret
So I think we can modify HotSpot now to convert all Symbol* usage to SymbolRef:
class Klass {
- Symbol* _name;
+ SymbolRef _name;
- Symbol* name() const { return _name; }
+ SymbolRef name() const { return _name; }
- bool InstanceKlass::find_local_field(Symbol* name, Symbol* sig, fieldDescriptor* fd) const {
+ bool InstanceKlass::find_local_field(SymbolRef name, SymbolRef sig, fieldDescriptor* fd) const {
for (JavaFieldStream fs(this); !fs.done(); fs.next()) {
- Symbol* f_name = fs.name();
+ SymbolRef f_name = fs.name();
- Symbol* f_sig = fs.signature();
+ SymbolRef f_sig = fs.signature();
if (f_name == name && f_sig == sig) {
fd->reinitialize(const_cast<InstanceKlass*>(this), fs.index());
return true;
}
}
return false;
}
- Symbol* SymbolTable::lookup(int index, const char* name, int len, unsigned int hash);
+ SymbolRef SymbolTable::lookup(int index, const char* name, int len, unsigned int hash);
The Symbol class will become more of an internal type, only used by a few places like SymbolTable, ClassLoaderData etc.
We can probably make it a little difficult to get to a Symbol*, like:
class SymbolRef {
private:
Symbol* _symbol;
Symbol* symbol() {return _symbol;}
friend class SymbolTable;
friend class ClassLoaderData;
};
so people won't be able to do (Symbol* == Symbol*) by mistake.
PERFORMANCE IMPACT -
Preliminary testing/guesstimates show that the performance impact is negligible, regardless of the value of USE_MULTIPLE_EQUIVALENT_SYMBOLS. More data will be provided.
PROPOSAL -
The proposal of this RFE is to add the SymbolRef class and change all the applicable HotSpot code to switch from Symbol* to SymbolRef. For now, we keep the SymbolRef:operator==() to do simple pointer comparison of Symbol*.
In an follow-on RFE (to be filed), we will implement USE_MULTIPLE_EQUIVALENT_SYMBOLS
bool InstanceKlass::find_local_field(Symbol* name, Symbol* sig, fieldDescriptor* fd) const {
for (JavaFieldStream fs(this); !fs.done(); fs.next()) {
Symbol* f_name = fs.name();
Symbol* f_sig = fs.signature();
if (f_name == name && f_sig == sig) {
The advantage of this is the comparison is very fast. The limitations is:
+ a Symbol can be shared by classes loaded by different ClassLoaderData's, so we cannot allocate the Symbol using the CLD metaspace. Consequently, they are allocated using malloc, which has more overhead than CLD.
+ Symbols need to be ref-counted in order to deallocate them
If we override the == operator of Symbol*, we can allow multiple Symbols to be considered "equivalent". Thus we can overcome the above limitations. Ideally, we want to define something like
int operator==(const Symbol* a, const Symbol* b) {
#if USE_MULTIPLE_EQUIVALENT_SYMBOLS
return a->equals(b); // new behavior
#else
return a == b; // old behavior as in JDK8
#endif
}
However, C++ doesn't allow the == operator to be overloaded for pointer types (http://stackoverflow.com/questions/6474675/why-cant-i-use-two-ptrs-in-operator-overload)
An operator function shall either be a non-static member function or be a non-member function
and have at least one parameter whose type is a class, a reference to a class, an enumeration,
or a reference to an enumeration.
Coleen's idea is to wrap the Symbol* into another class, like this:
// test.cpp -------------------------------------
class Symbol;
class SymbolRef {
public:
Symbol *sym;
const int operator==(SymbolRef& other) {
return sym == other.sym;
}
};
class OldInstanceKlass {
public:
int _foobar;
Symbol* _name;
};
class NewInstanceKlass {
public:
int _foobar;
SymbolRef _name;
};
int func(SymbolRef a, SymbolRef b) {
return a == b;
}
int compare_klasss_old(OldInstanceKlass *klass, Symbol* name) {
return klass->_name == name;
}
int compare_klasss_new(NewInstanceKlass *klass, SymbolRef name) {
return klass->_name == name;
}
---------------------
I tested GCC and it produces the same code for the SymbolRef as for the original code. I.e., even we are nominally passing a structure, GCC is smart enough to treat the structure just as a simple value that can be stored in a hardware register:
$ g++ -O3 -S -o - test.cpp | c++filt
compare_klasss_old(OldInstanceKlass*, Symbol*):
xorl %eax, %eax
cmpq %rsi, 8(%rdi)
sete %al
ret
compare_klasss_new(NewInstanceKlass*, SymbolRef):
xorl %eax, %eax
cmpq %rsi, 8(%rdi)
sete %al
ret
So I think we can modify HotSpot now to convert all Symbol* usage to SymbolRef:
class Klass {
- Symbol* _name;
+ SymbolRef _name;
- Symbol* name() const { return _name; }
+ SymbolRef name() const { return _name; }
- bool InstanceKlass::find_local_field(Symbol* name, Symbol* sig, fieldDescriptor* fd) const {
+ bool InstanceKlass::find_local_field(SymbolRef name, SymbolRef sig, fieldDescriptor* fd) const {
for (JavaFieldStream fs(this); !fs.done(); fs.next()) {
- Symbol* f_name = fs.name();
+ SymbolRef f_name = fs.name();
- Symbol* f_sig = fs.signature();
+ SymbolRef f_sig = fs.signature();
if (f_name == name && f_sig == sig) {
fd->reinitialize(const_cast<InstanceKlass*>(this), fs.index());
return true;
}
}
return false;
}
- Symbol* SymbolTable::lookup(int index, const char* name, int len, unsigned int hash);
+ SymbolRef SymbolTable::lookup(int index, const char* name, int len, unsigned int hash);
The Symbol class will become more of an internal type, only used by a few places like SymbolTable, ClassLoaderData etc.
We can probably make it a little difficult to get to a Symbol*, like:
class SymbolRef {
private:
Symbol* _symbol;
Symbol* symbol() {return _symbol;}
friend class SymbolTable;
friend class ClassLoaderData;
};
so people won't be able to do (Symbol* == Symbol*) by mistake.
PERFORMANCE IMPACT -
Preliminary testing/guesstimates show that the performance impact is negligible, regardless of the value of USE_MULTIPLE_EQUIVALENT_SYMBOLS. More data will be provided.
PROPOSAL -
The proposal of this RFE is to add the SymbolRef class and change all the applicable HotSpot code to switch from Symbol* to SymbolRef. For now, we keep the SymbolRef:operator==() to do simple pointer comparison of Symbol*.
In an follow-on RFE (to be filed), we will implement USE_MULTIPLE_EQUIVALENT_SYMBOLS