oracle通常有两种排序方法,binary sort和linguistic sort。binary根据字符集的二进制编码去排序,比如英文字母根据ASCII的编码标准去排序;多字节字符会根据其字符的内部编码去排序,像汉字根据汉字内码来排序;而linguistic排序分两种情况,a) 单字符环境,也就是排序字段只包含一种字符集的数据,这个时候会根据字符集的默认排序标准排列数据,比如汉字默认采用拼音码来排序;b)多字符环境, 也就是排序字段包含了两种或以上字符集的数据,这个时候oracle会采用一种叫做分级的排序方法,什么意思呢?就是Oracle会根据第一层的方法去排序,如果第一层方法排序之后的顺序相同会采用第二种方法,一次类推,最多只能到第三层。各语言都提供了自己的排序方法 。
各字符集排序方法http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#i637232
binary sort、linguistic sort http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch5lingsort.htm#i1008187
现在我们只单独考虑汉字排序问题,影响汉字排序有以下两个因素:
字符集
-
NLS_SORT
如果数据库字符集采用了中文字符集,比如(ZHS16GBK,ZHS16CGB231201等),默认情况下中文排序会按各汉字的拼音字母顺序排序,但同时还受NLS_SORT参数的限制。NLS_SORT可以设置排序的类别和具体的排序方法,默认为BINARY。我们常用的简体中文如果根据linguistic来排序,可用的方法有(9i后新增):
SCHINESE_RADICAL_M Simplified Chinese sort based on radical as primary order and number of strokes order as secondary order SCHINESE_STROKE_M Simplified Chinese sort uses number of strokes as primary order and radical as secondary order SCHINESE_PINYIN_M Simplified Chinese PinYin sorting order TCHINESE_RADICAL_M Traditional Chinese sort based on radical as primary order and number of strokes order as secondary order TCHINESE_STROKE_M Traditional Chinese sort uses number of strokes as primary order and radical as secondary order. It supports supplementary characters
有以下测试数据。
create table chars(id number,region varchar2(20)); insert into chars values(1,'陕西'); insert into chars values(2,'北京'); insert into chars values(3,'北京西'); insert into chars values(4,'上海'); commit;
我们希望region列按拼音码的顺序来排序 :
SQL> select * from chars order by region;
ID REGION
---------- ----------------------------------------
4 上海
2 北京
3 北京西
1 陕西
这个排序结果是不正确的。当前session的nls参数:
SQL> col parameter format a30
SQL> col value format a20
SQL> select * from nls_session_parameters;
PARAMETER VALUE
------------------------------ --------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT yyyy-mm-dd hh24:mi:s
s
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
PARAMETER VALUE
------------------------------ --------------------
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXF
F AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXF
F AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
PARAMETER VALUE
------------------------------ --------------------
NLS_NCHAR_CONV_EXCP FALSE
17 rows selected.
这里默认的排序方法为BINARY。LANGUAGE为AMERICAN,数据库字符集为AL32UTF8:
SQL> select * from nls_database_parameters; PARAMETER VALUE ------------------------------ -------------------- NLS_LANGUAGE AMERICAN NLS_TERRITORY AMERICA NLS_CURRENCY $ NLS_ISO_CURRENCY AMERICA NLS_NUMERIC_CHARACTERS ., NLS_CHARACTERSET AL32UTF8
修改当前session的NLS_SORT参数为拼音排序发:
SQL> alter session set nls_sort=SCHINESE_PINYIN_M;
Session altered.
SQL> select * from chars order by region;
ID REGION
---------- ----------------------------------------
2 北京
3 北京西
1 陕西
4 上海
SQL>
SQL> insert into chars values(5,'陕北');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from chars order by region;
ID REGION
---------- ----------------------------------------
2 北京
3 北京西
5 陕北
1 陕西
4 上海
alter session修改了整个session的排序方法,这样可能会影响其他排序,通过NLSSORT函数指定只对当前SQL排序的方法 :
SQL> select * from chars order by nlssort(region,'NLS_SORT=SCHINESE_PINYIN_M');
ID REGION
---------- ----------------------------------------
2 北京
3 北京西
5 陕北
1 陕西
4 上海
在接着看下面的的排序
SQL> insert into chars values(6,'How Are You');
1 row created.
SQL> insert into chars values(7,null);
1 row created.
SQL> insert into chars values(8,'Are you ok');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from chars order by region;
ID REGION
---------- ----------------------------------------
8 Are you ok
6 How Are You
2 北京
3 北京西
5 陕北
1 陕西
4 上海
7
8 rows selected.
混合数据中,首先按单字节的字符排序;然后排序多字节字符,多字节字符根据设定的排序方法来排序。