add brain

This commit is contained in:
2026-03-12 15:17:52 +07:00
parent fd9f558fa1
commit e7821a7a9d
355 changed files with 93784 additions and 24 deletions

View File

@@ -0,0 +1,66 @@
---
title: Character Sets and Collations
description: Charset config guide
tags: mysql, character-sets, utf8mb4, collation, encoding
---
# Character Sets and Collations
## Always Use utf8mb4
MySQL's `utf8` = `utf8mb3` (3-byte only, no emoji/many CJK). Always `utf8mb4`.
```sql
CREATE DATABASE myapp DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```
## Collation Quick Reference
| Collation | Behavior | Use for |
|---|---|---|
| `utf8mb4_0900_ai_ci` | Case-insensitive, accent-insensitive | Default |
| `utf8mb4_0900_as_cs` | Case/accent sensitive | Exact matching |
| `utf8mb4_bin` | Byte-by-byte comparison | Tokens, hashes |
`_0900_` = Unicode 9.0 (preferred over older `_unicode_` variants).
## Collation Behavior
Collations affect string comparisons, sorting (`ORDER BY`), and pattern matching (`LIKE`):
- **Case-insensitive (`_ci`)**: `'A' = 'a'` evaluates to true, `LIKE 'a%'` matches 'Apple'
- **Case-sensitive (`_cs`)**: `'A' = 'a'` evaluates to false, `LIKE 'a%'` matches only lowercase
- **Accent-insensitive (`_ai`)**: `'e' = 'é'` evaluates to true
- **Accent-sensitive (`_as`)**: `'e' = 'é'` evaluates to false
- **Binary (`_bin`)**: strict byte-by-byte comparison (most restrictive)
You can override collation per query:
```sql
SELECT * FROM users
WHERE name COLLATE utf8mb4_0900_as_cs = 'José';
```
## Migrating from utf8/utf8mb3
```sql
-- Find columns still using utf8
SELECT table_name, column_name FROM information_schema.columns
WHERE table_schema = 'mydb' AND character_set_name = 'utf8';
-- Convert
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```
**Warning**: index key length limits depend on InnoDB row format:
- DYNAMIC/COMPRESSED: 3072 bytes max (≈768 chars with utf8mb4)
- REDUNDANT/COMPACT: 767 bytes max (≈191 chars with utf8mb4)
`VARCHAR(255)` with utf8mb4 = up to 1020 bytes (4×255). That's safe for DYNAMIC/COMPRESSED but exceeds REDUNDANT/COMPACT limits.
## Connection
Ensure client uses `utf8mb4`: `SET NAMES utf8mb4;` (most modern drivers default to this).
`SET NAMES utf8mb4` sets three session variables:
- `character_set_client` (encoding for statements sent to server)
- `character_set_connection` (encoding for statement processing)
- `character_set_results` (encoding for results sent to client)
It also sets `collation_connection` to the default collation for utf8mb4.