2.3 KiB
2.3 KiB
title, description, tags
| title | description | tags |
|---|---|---|
| Character Sets and Collations | Charset config guide | mysql, character-sets, utf8mb4, collation, encoding |
Character Sets and Collations
Always Use utf8mb4
MySQL's utf8 = utf8mb3 (3-byte only, no emoji/many CJK). Always utf8mb4.
CREATE DATABASE myapp DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
Collation Quick Reference
| Collation | Behavior | Use for |
|---|---|---|
utf8mb4_0900_ai_ci |
Case-insensitive, accent-insensitive | Default |
utf8mb4_0900_as_cs |
Case/accent sensitive | Exact matching |
utf8mb4_bin |
Byte-by-byte comparison | Tokens, hashes |
_0900_ = Unicode 9.0 (preferred over older _unicode_ variants).
Collation Behavior
Collations affect string comparisons, sorting (ORDER BY), and pattern matching (LIKE):
- Case-insensitive (
_ci):'A' = 'a'evaluates to true,LIKE 'a%'matches 'Apple' - Case-sensitive (
_cs):'A' = 'a'evaluates to false,LIKE 'a%'matches only lowercase - Accent-insensitive (
_ai):'e' = 'é'evaluates to true - Accent-sensitive (
_as):'e' = 'é'evaluates to false - Binary (
_bin): strict byte-by-byte comparison (most restrictive)
You can override collation per query:
SELECT * FROM users
WHERE name COLLATE utf8mb4_0900_as_cs = 'José';
Migrating from utf8/utf8mb3
-- Find columns still using utf8
SELECT table_name, column_name FROM information_schema.columns
WHERE table_schema = 'mydb' AND character_set_name = 'utf8';
-- Convert
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
Warning: index key length limits depend on InnoDB row format:
- DYNAMIC/COMPRESSED: 3072 bytes max (≈768 chars with utf8mb4)
- REDUNDANT/COMPACT: 767 bytes max (≈191 chars with utf8mb4)
VARCHAR(255) with utf8mb4 = up to 1020 bytes (4×255). That's safe for DYNAMIC/COMPRESSED but exceeds REDUNDANT/COMPACT limits.
Connection
Ensure client uses utf8mb4: SET NAMES utf8mb4; (most modern drivers default to this).
SET NAMES utf8mb4 sets three session variables:
character_set_client(encoding for statements sent to server)character_set_connection(encoding for statement processing)character_set_results(encoding for results sent to client)
It also sets collation_connection to the default collation for utf8mb4.