What is the difference between varchar and nvarchar?
VARCHAR vs. NVARCHAR: Understanding String Data Types in SQL Server

Explore the fundamental differences between VARCHAR and NVARCHAR data types in SQL Server, focusing on storage, character sets, and performance implications for effective database design.
When designing a database in SQL Server, choosing the correct data type for string storage is crucial. Two of the most commonly used string data types are VARCHAR and NVARCHAR. While they both store variable-length character data, their underlying storage mechanisms, character set support, and performance characteristics differ significantly. Understanding these distinctions is key to optimizing your database for internationalization, storage efficiency, and query performance.
VARCHAR: Non-Unicode, Code-Page-Based Character Storage
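VARCHAR (variable-length character string) is designed to store non-Unicode character data. The characters it can hold are determined by the code page of the column's collation. With the common Western code pages (such as Windows-1252, used by SQL_Latin1_General_CP1_CI_AS), each character occupies 1 byte; the double-byte code pages used for East Asian languages need up to 2 bytes per character. The n in VARCHAR(n) is therefore a length in bytes, not characters, and can be at most 8,000. If you need to store more, you can use VARCHAR(MAX), which holds up to 2^31-1 bytes (about 2 GB) and replaces the deprecated TEXT data type.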
```sql
CREATE TABLE Products (
    ProductID   INT PRIMARY KEY,
    ProductName VARCHAR(255) -- Up to 255 bytes: 255 characters in a single-byte code page
);

INSERT INTO Products (ProductID, ProductName)
VALUES (1, 'Laptop'); -- 'Laptop' occupies 6 bytes of character data
```
Example of VARCHAR usage in a SQL Server table.
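A quick way to confirm that VARCHAR lengths are measured in bytes is to compare LEN, which counts characters, with DATALENGTH, which counts bytes. The sketch below is illustrative only: the variable names are hypothetical, and it forces the Latin-1 collation SQL_Latin1_General_CP1_CI_AS (code page 1252) with COLLATE so that the conversion result does not depend on your database's default collation.

```sql
-- Hypothetical variable; 'Laptop' is plain ASCII, so code page 1252 needs 1 byte per character.
DECLARE @Name VARCHAR(255) = 'Laptop';

SELECT LEN(@Name)        AS CharCount,  -- 6
       DATALENGTH(@Name) AS ByteCount;  -- 6

-- Converting Unicode text to VARCHAR uses the code page of the source collation.
-- Code page 1252 has no Japanese characters, so they cannot survive the
-- conversion and are replaced (typically with '?').
DECLARE @Converted VARCHAR(20) =
    CONVERT(VARCHAR(20), N'ノートパソコン' COLLATE SQL_Latin1_General_CP1_CI_AS);

SELECT @Converted AS StoredValue;
```

The loss happens silently during the conversion, which is why the choice of type (or collation) matters before any data is written.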
Use VARCHAR when you are certain that your data will only contain characters covered by the column's code page (for example, English or Western European text under a Latin-1 collation) and internationalization is not a concern. This can save significant storage space compared with NVARCHAR.
NVARCHAR: Unicode and Multi-Byte Character Storage
NVARCHAR (national character varying) is designed to store Unicode character data. This is its primary distinction from VARCHAR. Unicode is a universal character encoding standard that supports characters from virtually all writing systems of the world. SQL Server stores NVARCHAR data as UTF-16 code units: most characters take 2 bytes, while supplementary characters (such as many emoji) take 4 bytes as a surrogate pair. The n in NVARCHAR(n) counts these 2-byte units and can be at most 4,000. Similar to VARCHAR(MAX), NVARCHAR(MAX) can store up to 2^31-1 bytes (about 2 GB), or roughly one billion 2-byte characters.
```sql
CREATE TABLE InternationalProducts (
    ProductID   INT PRIMARY KEY,
    ProductName NVARCHAR(255) -- Up to 255 two-byte units (255 characters outside the supplementary range)
);

INSERT INTO InternationalProducts (ProductID, ProductName)
VALUES (1, N'Laptop'),
       (2, N'ノートパソコン'); -- Japanese for "laptop"; not representable in a Latin-1 VARCHAR column
```
Example of NVARCHAR usage, including Unicode characters.
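The same LEN/DATALENGTH comparison shows the 2-byte cost of NVARCHAR. The sketch below uses a hypothetical table variable instead of the InternationalProducts table, and adds one emoji to illustrate a supplementary character stored as a surrogate pair.

```sql
-- Hypothetical table variable holding sample Unicode strings.
DECLARE @Samples TABLE (
    Label VARCHAR(20)   NOT NULL,
    Value NVARCHAR(255) NOT NULL
);

INSERT INTO @Samples (Label, Value)
VALUES ('English',  N'Laptop'),          -- 6 characters
       ('Japanese', N'ノートパソコン'),  -- 7 characters
       ('Emoji',    N'😀');              -- 1 supplementary character (surrogate pair)

-- DATALENGTH is 12, 14 and 4 bytes respectively.
-- LEN returns 6 and 7 for the first two rows; for the emoji it returns 2,
-- or 1 under a supplementary-character-aware (_SC) collation.
SELECT Label,
       LEN(Value)        AS CharCount,
       DATALENGTH(Value) AS ByteCount
FROM @Samples;
```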
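Always prefix Unicode string literals with N (e.g., N'your string') when inserting or comparing data with NVARCHAR columns. Without the prefix, SQL Server treats the literal as VARCHAR in the code page of the database's default collation, so any characters that code page cannot represent are replaced (typically with '?') before the value reaches the NVARCHAR column, leading to data loss or incorrect comparisons.
Key Differences and Performance Implications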
The choice between VARCHAR and NVARCHAR boils down to character set support and storage efficiency. Here's a summary of their core differences and how they impact performance:
```mermaid
flowchart TD
    A[Data Type Choice] --> B{Character Set Required?}
    B -->|Non-Unicode, e.g. English only| C[VARCHAR]
    B -->|Unicode, e.g. multiple languages| D[NVARCHAR]
    C --> C1[Storage: 1 byte/char in single-byte code pages]
    C --> C2[Max Length: 8,000 bytes, or MAX]
    C --> C3[Performance: Potentially faster for single-byte data]
    D --> D1[Storage: 2 bytes/char, 4 for supplementary chars]
    D --> D2[Max Length: 4,000 byte-pairs, or MAX]
    D --> D3[Performance: More I/O for the same character count]
```
Decision flow for choosing between VARCHAR and NVARCHAR.
Storage and Performance
- Storage: For characters that VARCHAR stores in a single byte, NVARCHAR uses twice the space, because it stores each of those characters in 2 bytes. This directly affects disk space, I/O operations, and memory usage.
- Indexing: Larger data types mean larger indexes. An index on an NVARCHAR column holding the same single-byte-representable data will be roughly twice the size of an index on a VARCHAR column, potentially leading to more disk I/O during index scans and seeks.
- Query Performance: While modern SQL Server versions are highly optimized, operations like string comparisons, sorting, and pattern matching on NVARCHAR columns can be slightly slower, mainly because more bytes must be read and processed. For most applications the difference is negligible unless you are dealing with very large datasets or intensive string manipulation. A more common problem is a type mismatch: comparing a VARCHAR column with an NVARCHAR value forces an implicit conversion of the column, which can prevent an efficient index seek.
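To check the storage claim on your own server, you can load identical ASCII-only strings into a VARCHAR and an NVARCHAR column and total the bytes with DATALENGTH. This is a sketch with hypothetical temporary tables (#NamesVarchar, #NamesNvarchar). It measures only the character data, so page-level sizes reported by tools such as sp_spaceused will grow by less than 2x, because per-row overhead does not double.

```sql
-- Hypothetical temporary tables holding the same data in each type.
CREATE TABLE #NamesVarchar  (Name VARCHAR(100)  NOT NULL);
CREATE TABLE #NamesNvarchar (Name NVARCHAR(100) NOT NULL);

-- Generate 10,000 ASCII names ('Product 1', 'Product 2', ...) by numbering
-- rows from a cross join of a system catalog view.
INSERT INTO #NamesVarchar (Name)
SELECT TOP (10000)
       'Product ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Copy the same strings; they are implicitly converted to Unicode.
INSERT INTO #NamesNvarchar (Name)
SELECT Name FROM #NamesVarchar;

-- Total bytes of character data: NvarcharBytes is exactly twice VarcharBytes.
SELECT
    (SELECT SUM(CAST(DATALENGTH(Name) AS BIGINT)) FROM #NamesVarchar)  AS VarcharBytes,
    (SELECT SUM(CAST(DATALENGTH(Name) AS BIGINT)) FROM #NamesNvarchar) AS NvarcharBytes;
```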
Collation
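Collation defines the rules for how character data is sorted and compared. Both VARCHAR and NVARCHAR columns use the column's collation, which defaults to the database's default collation unless you specify one with COLLATE. For VARCHAR, the collation also determines the code page, and therefore which characters the column can store at all. NVARCHAR always stores UTF-16, so its collation affects only sorting and comparison. Windows collations apply the same linguistic rules to both types, while SQL collations (names starting with SQL_) use simpler non-Unicode rules for VARCHAR data. Starting with SQL Server 2019, UTF-8 collations (names ending in _UTF8) let VARCHAR store any Unicode character, using 1 to 4 bytes per character.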
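The link between collation and code page can be inspected with the built-in COLLATIONPROPERTY function. The sketch below uses two collations that ship with SQL Server to show that VARCHAR under a double-byte code page stores Japanese text at 2 bytes per character, and that the collation decides whether comparisons are case-sensitive.

```sql
-- Code page used for VARCHAR data under each collation.
SELECT
    COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS', 'CodePage') AS Latin1CodePage,   -- 1252
    COLLATIONPROPERTY('Japanese_CI_AS', 'CodePage')               AS JapaneseCodePage; -- 932

-- Under code page 932 the Japanese text fits in VARCHAR, at 2 bytes per character.
SELECT DATALENGTH(CONVERT(VARCHAR(20), N'ノートパソコン' COLLATE Japanese_CI_AS)) AS Cp932Bytes; -- 14

-- The collation, not the data type, decides whether case matters in comparisons.
SELECT
    CASE WHEN N'abc' = N'ABC' COLLATE Latin1_General_CI_AS THEN 'equal' ELSE 'different' END AS CaseInsensitive, -- equal
    CASE WHEN N'abc' = N'ABC' COLLATE Latin1_General_CS_AS THEN 'equal' ELSE 'different' END AS CaseSensitive;   -- different
```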
If there is any chance your application will need to store text in other languages, NVARCHAR is generally the safer choice to avoid future data conversion issues or character corruption. The cost of extra storage is often less critical than the cost of data integrity problems and application refactoring.