Files
CleanArchitecture-template/.brain/.agent/skills/database-optimization/mysql/references/json-column-patterns.md
2026-03-12 15:17:52 +07:00

78 lines
3.1 KiB
Markdown

---
title: JSON Column Best Practices
description: When and how to use JSON columns safely
tags: mysql, json, generated-columns, indexes, data-modeling
---
# JSON Column Patterns
MySQL 5.7+ supports native JSON columns. Useful, but with important caveats.
## When JSON Is Appropriate
- Truly schema-less data (user preferences, metadata bags, webhook payloads).
- Rarely filtered/joined — if you query a JSON path frequently, extract it to a real column.
## Indexing JSON: Use Generated Columns
You **cannot** index a JSON column directly. Create a virtual generated column and index that:
```sql
ALTER TABLE events
ADD COLUMN event_type VARCHAR(50) GENERATED ALWAYS AS (data->>'$.type') VIRTUAL,
ADD INDEX idx_event_type (event_type);
```
## Extraction Operators
| Syntax | Returns | Use for |
|---|---|---|
| `JSON_EXTRACT(col, '$.key')` | JSON type value (e.g., `"foo"` for strings) | When you need JSON type semantics |
| `col->'$.key'` | Same as `JSON_EXTRACT(col, '$.key')` | Shorthand |
| `col->>'$.key'` | Unquoted scalar (equivalent to `JSON_UNQUOTE(JSON_EXTRACT(col, '$.key'))`) | WHERE comparisons, display |
Always use `->>` (unquote) in WHERE clauses, otherwise you compare against `"foo"` (with quotes).
Tip: the generated column example above can be written more concisely as:
```sql
ALTER TABLE events
ADD COLUMN event_type VARCHAR(50) GENERATED ALWAYS AS (data->>'$.type') VIRTUAL,
ADD INDEX idx_event_type (event_type);
```
## Multi-Valued Indexes (MySQL 8.0.17+)
If you store arrays in JSON (e.g., `tags: ["electronics","sale"]`), MySQL 8.0.17+ supports multi-valued indexes to index array elements:
```sql
ALTER TABLE products
ADD INDEX idx_tags ((CAST(tags AS CHAR(50) ARRAY)));
```
This can accelerate membership queries such as:
```sql
SELECT * FROM products WHERE 'electronics' MEMBER OF (tags);
```
## Collation and Type Casting Pitfalls
- **JSON type comparisons**: `JSON_EXTRACT` returns JSON type. Comparing directly to strings can be wrong for numbers/dates.
```sql
-- WRONG: lexicographic string comparison
WHERE data->>'$.price' <= '1200'
-- CORRECT: cast to numeric
WHERE CAST(data->>'$.price' AS UNSIGNED) <= 1200
```
- **Collation**: values extracted with `->>` behave like strings and use a collation. Use `COLLATE` when you need a specific comparison behavior.
```sql
WHERE data->>'$.status' COLLATE utf8mb4_0900_as_cs = 'Active'
```
## Common Pitfalls
- **Heavy update cost**: `JSON_SET`/`JSON_REPLACE` can touch large portions of a JSON document and generate significant redo/undo work on large blobs.
- **No partial indexes**: You can only index extracted scalar paths via generated columns.
- **Large documents hurt**: JSON stored inline in the row. Documents >8 KB spill to overflow pages, hurting read performance.
- **Type mismatches**: `JSON_EXTRACT` returns a JSON type. Comparing with `= 'foo'` may not match — use `->>` or `JSON_UNQUOTE`.
- **VIRTUAL vs STORED generated columns**: VIRTUAL columns compute on read (less storage, more CPU). STORED columns materialize on write (more storage, faster reads if selected often). Both can be indexed; for indexed paths, the index stores the computed value either way.