A DESCRIPTION OF THE PROBLEM :
The getExtraNameCharacters() method of DatabaseMetaData is currently defined as:
> Retrieves all the "extra" characters that can be used in unquoted identifier names (those beyond a-z, A-Z, 0-9 and _).
That is not exactly how identifiers work, at least not for any system that follows the Unicode specification for identifiers. According to "Unicodeî Standard Annex #31: UNICODE IDENTIFIER AND PATTERN SYNTAX" ( https://www.unicode.org/reports/tr31/#Default_Identifier_Syntax ) there are two categories of characters: Start and Continue.
* "Start" characters (having the Unicode properties of id_start / xid_start) are valid in any position.
* "Continue" characters (having the Unicode properties of id_continue / xid_continue) are valid in any position _except_ the first character.
So, all "start" characters are also "continue" characters, but _not_ all "continue" characters are "start" characters. For example, in Microsoft SQL Server (and likely some others), it is valid to have decimal digits 0-9 in the name, _but_ the name cannot _start_ with a digit. In the following example, the first statement succeeds but the second statement fails:
USE [tempdb];
CREATE TABLE dbo.A1 (col INT);
GO
CREATE TABLE dbo.1A (col INT);
GO
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '.1'.
*/
This is an important distinction because in its current form, getExtraNameCharacters() returns potentially incorrect information. For example, in SQL Server this method ( https://github.com/microsoft/mssql-jdbc/blob/dev/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerDatabaseMetaData.java ) returns:
$#@
If someone is relying on this method to get those "extra" characters, it certainly appears to be saying that "$" is valid. Well, the following example works like the first example in that the first statement succeeds while the second statement fails:
USE [tempdb];
CREATE TABLE dbo.Bob$ (col INT);
GO
CREATE TABLE dbo.$Bob (col INT);
GO
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '$Bob'.
*/
Hence, the getExtraNameCharacters method needs to be replaced with two methods:
* getExtraNameStartCharacters
* getExtraNameContinueCharacters
To see the full list of characters that SQL Server should be returning for each of those two methods, please see:
https://sqlquantumleap.com/reference/completely-complete-list-of-valid-t-sql-identifier-characters/
For systems that do not follow the Unicode specification and only have a single set of characters, those two functions could simply return the same value.
Take care,
Solomon...
https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/
FREQUENCY : always
The getExtraNameCharacters() method of DatabaseMetaData is currently defined as:
> Retrieves all the "extra" characters that can be used in unquoted identifier names (those beyond a-z, A-Z, 0-9 and _).
That is not exactly how identifiers work, at least not for any system that follows the Unicode specification for identifiers. According to "Unicodeî Standard Annex #31: UNICODE IDENTIFIER AND PATTERN SYNTAX" ( https://www.unicode.org/reports/tr31/#Default_Identifier_Syntax ) there are two categories of characters: Start and Continue.
* "Start" characters (having the Unicode properties of id_start / xid_start) are valid in any position.
* "Continue" characters (having the Unicode properties of id_continue / xid_continue) are valid in any position _except_ the first character.
So, all "start" characters are also "continue" characters, but _not_ all "continue" characters are "start" characters. For example, in Microsoft SQL Server (and likely some others), it is valid to have decimal digits 0-9 in the name, _but_ the name cannot _start_ with a digit. In the following example, the first statement succeeds but the second statement fails:
USE [tempdb];
CREATE TABLE dbo.A1 (col INT);
GO
CREATE TABLE dbo.1A (col INT);
GO
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '.1'.
*/
This is an important distinction because in its current form, getExtraNameCharacters() returns potentially incorrect information. For example, in SQL Server this method ( https://github.com/microsoft/mssql-jdbc/blob/dev/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerDatabaseMetaData.java ) returns:
$#@
If someone is relying on this method to get those "extra" characters, it certainly appears to be saying that "$" is valid. Well, the following example works like the first example in that the first statement succeeds while the second statement fails:
USE [tempdb];
CREATE TABLE dbo.Bob$ (col INT);
GO
CREATE TABLE dbo.$Bob (col INT);
GO
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '$Bob'.
*/
Hence, the getExtraNameCharacters method needs to be replaced with two methods:
* getExtraNameStartCharacters
* getExtraNameContinueCharacters
To see the full list of characters that SQL Server should be returning for each of those two methods, please see:
https://sqlquantumleap.com/reference/completely-complete-list-of-valid-t-sql-identifier-characters/
For systems that do not follow the Unicode specification and only have a single set of characters, those two functions could simply return the same value.
Take care,
Solomon...
https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/
FREQUENCY : always