How to make MySQL handle UTF-8 properly
Mastering UTF-8 in MySQL: A Comprehensive Guide

Learn how to configure MySQL databases, tables, and connections to correctly handle UTF-8 characters, preventing common encoding issues and ensuring data integrity.
UTF-8 is the most common character encoding for the web, supporting a vast range of characters from different languages. However, getting MySQL to handle UTF-8 correctly can sometimes be a challenge, leading to garbled text (mojibake) or data loss. This article provides a comprehensive guide to properly configure your MySQL environment for robust UTF-8 support, from server settings to client connections.
Understanding MySQL Character Sets and Collations
Before diving into configuration, it's crucial to understand the difference between character sets and collations in MySQL:
- Character Set: Defines the set of characters that can be stored and how they are encoded (e.g., utf8, utf8mb4). utf8mb4 is the recommended character set for full UTF-8 support, including 4-byte characters like emojis.
- Collation: Defines the rules for comparing and sorting characters within a character set (e.g., utf8mb4_unicode_ci, utf8mb4_general_ci). The suffix _ci typically means case-insensitive, while _cs means case-sensitive. _bin means binary comparison.
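To make the "4-byte characters" point concrete, the following minimal Python sketch measures how many bytes individual characters occupy in UTF-8. The helper names utf8_byte_length and needs_utf8mb4 are made up for this illustration and are not part of MySQL or any driver.

```python
def utf8_byte_length(char: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character needs 4 bytes in UTF-8.

    Such characters do not fit into MySQL's 3-byte utf8 character set.
    """
    return any(utf8_byte_length(ch) == 4 for ch in text)


# Latin letter, accented letter, euro sign, emoji (written as escapes
# so the example does not depend on the encoding of this file).
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_byte_length(ch)} byte(s)")

print(needs_utf8mb4("caf\u00e9"))        # accented text fits in 3 bytes
print(needs_utf8mb4("hi \U0001F600"))    # the emoji needs 4 bytes
```

A check like needs_utf8mb4 could be run over sample application data to estimate whether 4-byte characters are already in use.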
MySQL applies character sets and collations at several levels: server, database, table, and column. For proper UTF-8 handling, consistency across these levels is key.
flowchart TD
    A[Client Application] --> B[MySQL Server Connection]
    B --> C[MySQL Database]
    C --> D[MySQL Table]
    D --> E[MySQL Column]
    subgraph Character Set & Collation Hierarchy
        E -- "Inherits from" --> D
        D -- "Inherits from" --> C
        C -- "Inherits from" --> B
        B -- "Inherits from" --> F[MySQL Server]
    end
    F["Server Default (my.cnf)"]
    B["Connection (SET NAMES)"]
    C["Database (CREATE DATABASE)"]
    D["Table (CREATE TABLE)"]
    E["Column (CREATE COLUMN)"]
    style F fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#bfb,stroke:#333,stroke-width:2px
    style D fill:#ffb,stroke:#333,stroke-width:2px
    style E fill:#fbb,stroke:#333,stroke-width:2px
MySQL Character Set and Collation Hierarchy
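The fallback between levels can be pictured with a small Python sketch. It is only a mental model under stated assumptions: effective_charset is a hypothetical helper, and MySQL applies these defaults when an object is created, so changing a higher level later does not rewrite objects that already exist.

```python
from typing import Optional


def effective_charset(server: str,
                      database: Optional[str] = None,
                      table: Optional[str] = None,
                      column: Optional[str] = None) -> str:
    """Pick the most specific character set that was set explicitly.

    None means "not specified at this level, fall back to the next one".
    """
    for level in (column, table, database):
        if level is not None:
            return level
    return server


# Nothing set below the server: the server default applies.
print(effective_charset("latin1"))
# The database overrides the server default.
print(effective_charset("latin1", database="utf8mb4"))
# A table-level setting wins over the database default.
print(effective_charset("utf8mb4", database="utf8mb4", table="latin1"))
```

The last call shows why consistency matters: one level set differently is enough to change what a column ends up with.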
Configuring MySQL for UTF-8 (utf8mb4)
The most robust way to ensure full UTF-8 support is to use the utf8mb4 character set. This involves configuring your MySQL server, databases, tables, and columns. It's also crucial to ensure your client connection is properly set.
Always use utf8mb4 instead of utf8. The utf8 character set in MySQL (an alias for utf8mb3) supports a maximum of 3 bytes per character, which is insufficient for Unicode characters such as emojis that require 4 bytes. utf8mb4 provides full Unicode support.
1. Server Configuration (my.cnf or my.ini)
Edit your MySQL configuration file (my.cnf on Linux/macOS or my.ini on Windows) to set the default character set for the server. This is a crucial first step, though it only supplies the default for databases and tables created afterwards without an explicit character set; existing data is not converted.
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
Example my.cnf configuration for UTF-8
After modifying my.cnf or my.ini, restart your MySQL server for the changes to take effect.
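Before restarting, the edited file can be sanity-checked. The sketch below parses option-file text with Python's standard configparser and reports settings that differ from the example above. It makes several assumptions: find_charset_problems and EXPECTED are illustrative names, the file is passed in as a string, and include directives such as !includedir are not handled.

```python
import configparser

# The settings from the example configuration, keyed by (section, option).
EXPECTED = {
    ("mysqld", "character-set-server"): "utf8mb4",
    ("mysqld", "collation-server"): "utf8mb4_unicode_ci",
    ("client", "default-character-set"): "utf8mb4",
    ("mysql", "default-character-set"): "utf8mb4",
}


def find_charset_problems(config_text: str) -> list:
    """Return one message per expected setting that is missing or different."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # option files may contain bare flags
        interpolation=None,    # treat '%' literally
        strict=False,          # tolerate repeated sections or options
    )
    # MySQL accepts dashes and underscores interchangeably in option names,
    # so normalize both spellings to dashes.
    parser.optionxform = lambda name: name.lower().replace("_", "-")
    parser.read_string(config_text)

    problems = []
    for (section, option), wanted in EXPECTED.items():
        actual = parser.get(section, option, fallback=None)
        if actual != wanted:
            problems.append(
                f"[{section}] {option}: expected {wanted}, found {actual}"
            )
    return problems


SAMPLE = """\
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci

[client]
default-character-set=utf8mb4

[mysql]
default-character-set=utf8mb4
"""

print(find_charset_problems(SAMPLE))
```

An empty list means every expected setting was found; each remaining entry names the section and option to revisit.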
2. Database Configuration
When creating a new database, explicitly specify utf8mb4 and a suitable collation. If you have an existing database, you can alter it; this changes the default for tables created afterwards, not the tables that already exist.
-- Create a new database with UTF-8 support
CREATE DATABASE my_database
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
-- Alter an existing database
ALTER DATABASE my_database
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
SQL commands for database character set configuration
3. Table and Column Configuration
For new tables, specify the character set and collation. For existing tables, you'll need to alter them. It's best practice to set character sets at the column level for string-based columns, as this provides the most granular control.
-- Create a new table with UTF-8 columns
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
    email VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
    bio TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
);
-- Alter an existing table and its columns
ALTER TABLE products CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- Or alter specific columns (more precise)
ALTER TABLE products
    MODIFY COLUMN product_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
    MODIFY COLUMN description TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
SQL commands for table and column character set configuration
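When many existing tables need converting, the ALTER TABLE ... CONVERT TO statement can be generated rather than typed by hand. The following Python sketch only builds the SQL text for review; it does not connect to a database. The helper names and the table names are hypothetical, and the output should be checked and the data backed up before anything is executed.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_convert_statements(tables,
                             charset: str = "utf8mb4",
                             collation: str = "utf8mb4_unicode_ci") -> list:
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    return [
        f"ALTER TABLE {quote_identifier(table)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for table in tables
    ]


# Hypothetical table names, including one that requires quoting.
for statement in build_convert_statements(["products", "order items"]):
    print(statement)
```

The table list could come from the information_schema.TABLES query used for verification, filtered to tables whose collation is not yet a utf8mb4 one.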
For VARCHAR columns, ensure you have enough space. Converting from utf8 to utf8mb4 raises the maximum storage per character from 3 to 4 bytes. For example, a VARCHAR(255) column can need up to 255 * 3 = 765 bytes in utf8, but up to 255 * 4 = 1020 bytes in utf8mb4. If your row size then exceeds the maximum, you might encounter errors or need to change the column type to TEXT or BLOB.
4. Client Connection Configuration
Even if your server, database, and tables are perfectly configured, data can still be corrupted if your client application doesn't communicate using the correct character set. This is often the source of 'mojibake'. You need to tell MySQL what character set your client is using.
-- Execute this immediately after connecting to the database
SET NAMES 'utf8mb4';
SQL command to set client connection character set
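The mojibake failure mode can be reproduced without a database. The Python sketch below simulates a client that sends UTF-8 bytes while the receiving side interprets them as a single-byte encoding. It assumes cp1252 as the wrong encoding (close to what MySQL calls latin1), and both function names are made up for this illustration.

```python
def simulate_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode as UTF-8 (what the client sends) and decode with the
    encoding the other side wrongly assumes."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair_mojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Reverse a single round of the misinterpretation above."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "caf\u00e9"  # "café", written with an escape for portability
garbled = simulate_mojibake(original)

print(garbled)                               # cafÃ©
print(repair_mojibake(garbled) == original)  # True
```

The two-character sequence in place of the accented letter is the typical symptom of a connection character set mismatch. The repair only works while no bytes were lost, which is why fixing the connection setting is preferable to cleaning up afterwards.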
Most programming languages and ORMs provide a way to set the character set when establishing a connection. Here are examples for common languages:
PHP (PDO)
$dsn = 'mysql:host=localhost;dbname=my_database;charset=utf8mb4';
$pdo = new PDO($dsn, 'username', 'password');
Python (mysql-connector-python)
import mysql.connector
cnx = mysql.connector.connect(
    host='localhost',
    database='my_database',
    user='username',
    password='password',
    charset='utf8mb4'
)
Java (JDBC)
String url = "jdbc:mysql://localhost:3306/my_database?useUnicode=true&characterEncoding=UTF-8&connectionCollation=utf8mb4_unicode_ci";
Connection conn = DriverManager.getConnection(url, "username", "password");
Node.js (mysql2)
const mysql = require('mysql2');
const connection = mysql.createConnection({
  host: 'localhost',
  user: 'username',
  password: 'password',
  database: 'my_database',
  charset: 'utf8mb4'
});
Verifying UTF-8 Configuration
After making changes, it's essential to verify that everything is correctly configured. You can use SQL queries to inspect the character sets and collations at various levels.
-- Check server character set variables
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
-- Check database character set and collation
SELECT default_character_set_name, default_collation_name
FROM information_schema.SCHEMATA WHERE schema_name = 'my_database';
-- Check table character set and collation
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'my_database';
-- Check column character set and collation
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_database' AND TABLE_NAME = 'users' AND CHARACTER_SET_NAME IS NOT NULL;
-- Check current connection character set
SHOW VARIABLES LIKE 'character_set_connection';
SHOW VARIABLES LIKE 'collation_connection';
SHOW VARIABLES LIKE 'character_set_results';
SHOW VARIABLES LIKE 'character_set_client';
SQL queries to verify UTF-8 configuration
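The output of SHOW VARIABLES LIKE 'character_set%' can also be checked programmatically. This Python sketch takes the result rows as (name, value) pairs, as most drivers return them, and lists the variables that are not utf8mb4. The function name find_mismatches and the sample rows are invented for illustration; no database connection is made.

```python
# The five variables that should agree for consistent UTF-8 handling.
REQUIRED_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_database",
    "character_set_results",
    "character_set_server",
)


def find_mismatches(rows, expected: str = "utf8mb4") -> dict:
    """Return {variable: value} for required variables that are missing
    (value None) or set to something other than `expected`."""
    values = dict(rows)
    return {
        name: values.get(name)
        for name in REQUIRED_VARIABLES
        if values.get(name) != expected
    }


# Made-up rows in the shape returned by SHOW VARIABLES.
sample_rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8mb4"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "latin1"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8mb3"),
]

print(find_mismatches(sample_rows))  # {'character_set_results': 'latin1'}
```

Variables outside the required list, such as character_set_filesystem and character_set_system, are deliberately ignored by this check.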
In the output of SHOW VARIABLES LIKE 'character_set%', ensure that character_set_database, character_set_server, character_set_client, character_set_connection, and character_set_results are all set to utf8mb4. The other two variables can be left alone: character_set_system is always a 3-byte utf8, and character_set_filesystem defaults to binary.
By following these steps, you can ensure that your MySQL environment is fully equipped to handle UTF-8 characters, including emojis and a wide range of international text, preventing common encoding pitfalls and maintaining data integrity.