nchar, nvarchar & Unicode data types in SQL Server

nchar & nvarchar (nvarchar(max)) are the Unicode string data types in SQL Server. They are similar to char & varchar but stores the strings in Unicode encoding. nchar is a fixed width data type while nvarchar is a variable-length dataType. In this tutorial, we will learn what is nchar & nvarchar data types, and how to use them. Also, learn similarities and differences between them.

What are Unicode data types

We need to convert the characters to binary so that they can be stored on a computer. To do that we need to give a binary code to all the possible characters. The ASCII (American Standard Code for Information Interchange) is the standard that defines such binary codes to represent characters.

The ASCII code uses the 7 bits for the characters and the last 8th bit is the Parity bit. The 7 bits give us 2^7 (= 128) distinct combinations. The numbers 0 to 9 were represented using the code 48 to 57 binary numbers. Similarly, A to Z (65 to 90), and a to z (97 to 122) all have a corresponding code in the ASCII table.

When it was found out that 128 characters were also insufficient, the parity bit also included raising the number of distinct combinations to 255 ( 2^8 ). The new table was called Extended ASCII code.

But, the ASCII with only 255 distinct combinations was insufficient to represent the characters from other languages. This problem was fixed using the code pages, but it was created new problems. This required a new way to encode the data. This is where Unicode comes into the picture.

The Unicode system has binary code for all the characters of all the languages of the world. Unicode is a variable bit encoding system unlike ASCII, which uses a fixed 8 bit. In Unicode, you can choose between 32, 16, or 8-bit encodings. The more bits you choose, the more languages you can support. The lower bit encodings will support a smaller subset of languages but save a lot of space.

Unicode Data Types in SQL Server

The SQL Server has three Unicode data types. They are nchar, nvarchar / nvarchar(max), and ntext. It uses 16-bit encoding known as UCS-2 or UTF-16.

The SQL Server will remove the ntext data types from the future version of SQL Server. Instead, use the nvarchar(max)

Starting with SQL Server 2019, if you are using the UTF-8 encoding with char and varchar, the SQL Server will convert them to Unicode data types using UTF-8 encoding. 

nchar

nChar is a fixed width Unicode data type.

The syntax for declaring nchar variable is nchar(n), where n defines the string size in byte-pairs. The value of n must be from 1 through 4,000.

Note that n is string size in byte-pairs and not the number of characters or number of bytes. A byte-pair equivalent to 2 bytes.

The storage taken by the nchar is always (2 * n) bytes.

The number of characters that you can store is n if the UCS-2 encoding is used. But if UTF-16 encoding is used, then the number of characters that you can store is less than n. This is because the Supplementary Characters in UTF-16 can take up to 4 bytes (i.e two byte-pairs).

The following example, creates the table with two nchar columns.

nvarchar

nvarchar (for variable character) is a flexible width Unicode data type.

The syntax for declaring the nvarchar variable is nvarchar(n), where n defines the string size in byte-pairs. The value of n must be from 1 through 4000.

The storage taken by the nvarchar is always (2 * n) bytes + 2 bytes. The extra 2 bytes to store the length information of the string

For Example, when we store a string of length 10, The string will occupy (10*2) + 2 (=22 bytes) bytes of storage.

Unlike the nchar data type, nvarchar does not pad the string with blank spaces. Hence the data stored in the nvarchar column always occupies only the number of bytes that are needed to store the string. Hence nvarchar is more storage efficient than the nchar strings.

The number of characters that you can store in varchar(n) is n characters, if single-byte encoding character sets such as Latin (SQL_Latin1_General_CP1_CI_AS) is used, which is the default for SQL Server.

For Multibyte character sets such as Japanese or Chinese, the number of characters is less than n characters.

The following example, creates the table with two nvarchar columns.

nvarchar(max)

nvarchar(max) is similar to nvarchar except that you can store a string of size up to 2^31-1 bytes (2 GB). The actual space occupied in the disk is the same as the size of the string in bytes + 2 bytes for length information.

nchar & nvarchar are limited to 4000 byte-pairs ( 8000 bytes ) of storage space.

The SQL Server stores data in units of pages. Page size is 8 KB or 8192 bytes. Out of which 192 bytes the data is used to metadata information related to the page header, row offset, etc. That leaves us 8000 bytes to store the data. Since column data cannot exceed a page (otherwise it will be part of two pages), the maximum size you can use for a column is 8000 bytes.

But in the case of nvarchar(max), the SQL server checks if the size of the string. If it is less than 8000 bytes then it stores it in the page itself. But if it is more than 8000 bytes then it moves it to a separate page known as LOB_DATA and stores the pointer in the page. In this way, nvarchar(max) columns can store data up to 2 GB.

ntext

ntext is Variable-length Unicode data with a maximum string length of 2^30 – 1 (1,073,741,823) bytes. This data type is similar to nvarchar(max) and will be removed from the future version of SQL Server. Hence use nvarchar(max) instead.

Examples

The following example show how to create nchar, nvarchar & nvarchar(max) columns.

Inserting Values

To insert a Unicode value, you need to prefix the value with N.

Without N prefix, the the string is interpreted as non Unicode. As you can see from the result both Hindi & Chinese strings are replaced with ?.

Character count & size in bytes

In the following example, we use the Len and DATALENGTH function. The Len function returns the number of characters in a string (excluding the trailing spaces), while DATALENGTH shows the number of bytes used by that string.

As you can see from the result, although the number of characters is 7, the size differs. In the nchar(10) column size is 20 bytes ( 2 *10 ). While for nvarchar(10) column it is 14 (7*2).

Note that DATALENGTH does not include the additional 2 bytes, that nvarchar uses to store the data. So in disk nvarchar(10) column actually uses the 16 bytes ( i.e 7*2 + 2 bytes) to store the data.

nchar Vs nvarchar

CharVarchar
Used to store a unicide string of fixed lengthUsed to store a unicide string of variable length
The syntax is nchar(n), where n is the string size in byte-pairs (2 bytes) to use.The syntax is nvarchar(n) where n is the maximum number of string size in byte-pairs (2 bytes) to use.
If the string length is less than the size of the column then the char adds trailing spaces to the dataThe string is stored as it is. Does not add trailing spaces to the data
Storage size is n*2 bytes. i.e the same as the size of the column and not that of the stringThe Storage size is the actual size of the string + 2 bytes. The additional 2 bytes to store length information
The performance of the nchar is better than varchar.Performance is slower compared to the nchar as the SQL Server needs to calculate the size of the string
The maximum data that you can store is 4000 byte-pairs (i.e 8000 bytes)The maximum data that you can store is 4000 byte-pairs (i.e. 8000 bytes)
The number of characters you can store in a nchar(n) is n characters if a single-byte character set is used, which is the default in SQL ServerThe number of characters you can store in a nvarchar(n) is n characters if a single-byte character set is used, which is the default in SQL Server
A Multibyte Character Sets such as Japanese Kanji or Chinese can occupy more than 2 bytes.

In such cases, the maximum storage size in bytes will remain as n. But the number of characters that can be stored may be smaller than n
A Multibyte Character Sets such as Japanese Kanji or Chinese can occupy more than 2 bytes.

In such cases, the maximum storage size in bytes will remain as (n*2)+2 bytes. But the number of characters that can be stored may be smaller than n

Choosing the Right data type

If all your apps need to store the characters found in this ASCII Table, then you can use the char/varchar. This table characters mainly from English and also from a few other european languages like Spanish, Italian, etc.

If the column needs to store characters beyond the ASCII Table then you must use the Unicode data type like nchar/nvarchar.

But in SQL Server 2019, you can use the char/varchar to store the Unicode just be enabling the UTF-8 encoding

varchar vs nvarchar

varchar / charnvarchar / varchar
Used to store non Unicode string data.Used to store Unicode string or multilingual data
varchar uses 1-byte memory storage per character.

*varchar can take more than 1 byte in some collations. But in such situations, it is better to use nvarchar
nvarchar uses 2 bytes min per character. Maybe more depending on the Unicode encoding used
No Need for N prefixYou need to use N prefix when inserts values to database
varchar can hold up to 8000 charactersnvarchar can store up to 4000 characters

Reference

Read More

Leave a Comment

Your email address will not be published.

Scroll to Top