add brain
This commit is contained in:
@@ -0,0 +1,66 @@
|
||||
---
|
||||
title: Character Sets and Collations
|
||||
description: Charset config guide
|
||||
tags: mysql, character-sets, utf8mb4, collation, encoding
|
||||
---
|
||||
|
||||
# Character Sets and Collations
|
||||
|
||||
## Always Use utf8mb4
|
||||
MySQL's `utf8` = `utf8mb3` (3-byte only, no emoji/many CJK). Always `utf8mb4`.
|
||||
|
||||
```sql
|
||||
CREATE DATABASE myapp DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
|
||||
```
|
||||
|
||||
## Collation Quick Reference
|
||||
| Collation | Behavior | Use for |
|
||||
|---|---|---|
|
||||
| `utf8mb4_0900_ai_ci` | Case-insensitive, accent-insensitive | Default |
|
||||
| `utf8mb4_0900_as_cs` | Case/accent sensitive | Exact matching |
|
||||
| `utf8mb4_bin` | Byte-by-byte comparison | Tokens, hashes |
|
||||
|
||||
`_0900_` = Unicode 9.0 (preferred over older `_unicode_` variants).
|
||||
|
||||
## Collation Behavior
|
||||
|
||||
Collations affect string comparisons, sorting (`ORDER BY`), and pattern matching (`LIKE`):
|
||||
|
||||
- **Case-insensitive (`_ci`)**: `'A' = 'a'` evaluates to true, `LIKE 'a%'` matches 'Apple'
|
||||
- **Case-sensitive (`_cs`)**: `'A' = 'a'` evaluates to false, `LIKE 'a%'` matches only lowercase
|
||||
- **Accent-insensitive (`_ai`)**: `'e' = 'é'` evaluates to true
|
||||
- **Accent-sensitive (`_as`)**: `'e' = 'é'` evaluates to false
|
||||
- **Binary (`_bin`)**: strict byte-by-byte comparison (most restrictive)
|
||||
|
||||
You can override collation per query:
|
||||
|
||||
```sql
|
||||
SELECT * FROM users
|
||||
WHERE name COLLATE utf8mb4_0900_as_cs = 'José';
|
||||
```
|
||||
|
||||
## Migrating from utf8/utf8mb3
|
||||
|
||||
```sql
|
||||
-- Find columns still using utf8
|
||||
SELECT table_name, column_name FROM information_schema.columns
|
||||
WHERE table_schema = 'mydb' AND character_set_name = 'utf8';
|
||||
-- Convert
|
||||
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
|
||||
```
|
||||
|
||||
**Warning**: index key length limits depend on InnoDB row format:
|
||||
- DYNAMIC/COMPRESSED: 3072 bytes max (≈768 chars with utf8mb4)
|
||||
- REDUNDANT/COMPACT: 767 bytes max (≈191 chars with utf8mb4)
|
||||
|
||||
`VARCHAR(255)` with utf8mb4 = up to 1020 bytes (4×255). That's safe for DYNAMIC/COMPRESSED but exceeds REDUNDANT/COMPACT limits.
|
||||
|
||||
## Connection
|
||||
Ensure client uses `utf8mb4`: `SET NAMES utf8mb4;` (most modern drivers default to this).
|
||||
|
||||
`SET NAMES utf8mb4` sets three session variables:
|
||||
- `character_set_client` (encoding for statements sent to server)
|
||||
- `character_set_connection` (encoding for statement processing)
|
||||
- `character_set_results` (encoding for results sent to client)
|
||||
|
||||
It also sets `collation_connection` to the default collation for utf8mb4.
|
||||
Reference in New Issue
Block a user