Collation in Oracle using NLS_SORT

Yesterday I attended a technical presentation on Character Sets. As I told the presenter afterwards, to me attending this meeting was like reading the Kitab-i-Iqan (book of certitude) and then looking at religious history – it was the key that suddenly made everything else clear. I had come across the terms Character Set, Character Encoding, collation, UTF8, ASCII, ANSI, etc many times and had some idea what they referred to, but was really confused about how they all relate to each other. Now, it was all much more clear. We’ll have a second part of the presentation later, and I may post a summary of the concepts after that.

Sorting ShellsBut already puting theory to practice – this morning I had a request to remove diacritics from a listing of countries, so they sort properly. I thought “Aha!” This is a collation issue. Collation refers to setting items in order, and in the case of databases the collation sequence is the set of rules by which characters are ordered – for example a, b, c, ä or a, ä, b, c. So instead of simply removing the diacritics I googled “oracle collation” and ended up finding a better solution:

 

Oracle uses a setting called NLS_LANG to determine a number of things – language, territory and character set. This is an important setting, as any Oracle DBA would tell you. The default collation sequence is derived from the language portion of that setting, as there is a corresponding NLS_SORT setting for each NLS_LANGUAGE, but then there are also more custom values that you can use. Once you set a value for NLS_SORT, that collating sequence will then be used for sorting when running a query with the ORDER BY clause. 

First let’s check what our current collation sequence is set to:

SELECT SYS_CONTEXT ('USERENV', 'NLS_SORT')
  FROM DUAL;

Then, get a list of the possible values you can set NLS_SORT to:

SELECT *
  FROM v$nls_valid_values
 WHERE parameter = 'SORT';

Choose the collation sequence you want to use from this list. In my case, I wasn’t quite sure which one I wanted, so I searched some more and found one called BINARY_AI which did what I was looking for. Turns out each value in the above table of possible values can have a hybrid definition, by adding a suffix of _CI (case insensitivity) or _AI (accent-insensitive and case-insensitive).

Finally, to set the collation sequence for the current session:

ALTER SESSION SET nls_sort='BINARY_AI';

 

Example:

My query for a list of countries originally gave these results (NLS_SORT was set to BINARY):

 Romania
 Russian Federation
 Rwanda
 Réunion

After the NLS_SORT was set to BINARY_AI:

 Réunion
 Romania
 Russian Federation
 Rwanda
This entry was posted in Spirituality, Technology and tagged , , , , . Bookmark the permalink. Trackbacks are closed, but you can post a comment.

10 Comments

  1. Posted 2009/09/18 at 7:30 pm | Permalink

    I am in the same position you were. I know a little about character sets, but not as much as you have discussed here. Thanks for sharing this, I find it very helpful.

  2. Juan Diego
    Posted 2010/03/11 at 7:19 pm | Permalink

    I found another way to use the script, avoiding the alter session command:

    SELECT * FROM table SORT by NLSSORT(FIELD,’NLS_SORT=BINARY_AI’)

    Thanks for the article. Very useful.

  3. Posted 2010/03/12 at 7:23 am | Permalink

    Wow, thanks Juan that’s brilliant!

    I’ll definitely use this at some point, as I also prefer not having to alter session.

  4. anonymous
    Posted 2011/03/09 at 3:20 am | Permalink

    Yes, BUT keep in mind that your INDEXES aren’t built that way.
    If you were smart, you’d do this experiment: Put a standard B-Tree index on the column you use in your WHERE clause, and gather stats on that table/index/column, and then run an EXPLAIN PLAN for your query.
    Then change the NLS_SORT and run the query again.

    …I think you’ll very likely note that changing the NLS_SORT will result in more full table scans- because the indexes would be built according to a different NLS_SORT than what you are setting it to.
    So- what does Oracle do? …It doesn’t use the index. It full scans.

    Also- it would probably help if you had a fairly decent size data set to test against (if your table only has 3 rows, Oracle will probably full-scan, no matter what).

  5. Frosty
    Posted 2011/04/19 at 10:25 am | Permalink

    Thank you for the piece of advice!

  6. Bob Miller
    Posted 2011/09/02 at 6:20 pm | Permalink

    Do what Juan said! it works brilliantly, man.
    -Bob from BMW Specials

  7. evensecet
    Posted 2012/05/30 at 6:25 am | Permalink

    Who and where to condense this summer on festival, share your information.

  8. Bob
    Posted 2012/08/15 at 8:51 pm | Permalink

    There are two important lessons here:
    (1) Using ALTER SESSION changes the behaviour of all SORT BY and comparisons during the session, whereas SORT BY NLSSORT(FIELD,’NLS_SORT=BINARY_AI’) affects *only* the one sort it is specified for.

    (2) INDEXES are subject to the NLS order when they are of character strings (otherwise, how would you know whether an unaccented A was less than or greater than an accented A?). For that reason, the appropriate collating sequence should be established (by the DB admin) before the indexes are built.

  9. Posted 2013/05/25 at 1:33 pm | Permalink

    Need to change collation in table to UTF-8_general_ci or when creating it.
    Can not find any solution of it for oracle. How can it possible.

  10. Posted 2014/01/08 at 2:45 pm | Permalink

    Your post was really helpful and easy to understand. Thanks

Post a Comment

Your email is never published nor shared. Required fields are marked *

You may use these HTML tags and attributes <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*
*

Time limit is exhausted. Please reload the CAPTCHA.