78 lines
3.1 KiB
Markdown
78 lines
3.1 KiB
Markdown
---
|
|
title: JSON Column Best Practices
|
|
description: When and how to use JSON columns safely
|
|
tags: mysql, json, generated-columns, indexes, data-modeling
|
|
---
|
|
|
|
# JSON Column Patterns
|
|
|
|
MySQL 5.7+ supports native JSON columns. Useful, but with important caveats.
|
|
|
|
## When JSON Is Appropriate
|
|
- Truly schema-less data (user preferences, metadata bags, webhook payloads).
|
|
- Rarely filtered/joined — if you query a JSON path frequently, extract it to a real column.
|
|
|
|
## Indexing JSON: Use Generated Columns
|
|
You **cannot** index a JSON column directly. Create a virtual generated column and index that:
|
|
```sql
|
|
ALTER TABLE events
|
|
ADD COLUMN event_type VARCHAR(50) GENERATED ALWAYS AS (data->>'$.type') VIRTUAL,
|
|
ADD INDEX idx_event_type (event_type);
|
|
```
|
|
|
|
## Extraction Operators
|
|
| Syntax | Returns | Use for |
|
|
|---|---|---|
|
|
| `JSON_EXTRACT(col, '$.key')` | JSON type value (e.g., `"foo"` for strings) | When you need JSON type semantics |
|
|
| `col->'$.key'` | Same as `JSON_EXTRACT(col, '$.key')` | Shorthand |
|
|
| `col->>'$.key'` | Unquoted scalar (equivalent to `JSON_UNQUOTE(JSON_EXTRACT(col, '$.key'))`) | WHERE comparisons, display |
|
|
|
|
Always use `->>` (unquote) in WHERE clauses, otherwise you compare against `"foo"` (with quotes).
|
|
|
|
Tip: the generated column example above can be written more concisely as:
|
|
|
|
```sql
|
|
ALTER TABLE events
|
|
ADD COLUMN event_type VARCHAR(50) GENERATED ALWAYS AS (data->>'$.type') VIRTUAL,
|
|
ADD INDEX idx_event_type (event_type);
|
|
```
|
|
|
|
## Multi-Valued Indexes (MySQL 8.0.17+)
|
|
If you store arrays in JSON (e.g., `tags: ["electronics","sale"]`), MySQL 8.0.17+ supports multi-valued indexes to index array elements:
|
|
|
|
```sql
|
|
ALTER TABLE products
|
|
ADD INDEX idx_tags ((CAST(tags AS CHAR(50) ARRAY)));
|
|
```
|
|
|
|
This can accelerate membership queries such as:
|
|
|
|
```sql
|
|
SELECT * FROM products WHERE 'electronics' MEMBER OF (tags);
|
|
```
|
|
|
|
## Collation and Type Casting Pitfalls
|
|
- **JSON type comparisons**: `JSON_EXTRACT` returns JSON type. Comparing directly to strings can be wrong for numbers/dates.
|
|
|
|
```sql
|
|
-- WRONG: lexicographic string comparison
|
|
WHERE data->>'$.price' <= '1200'
|
|
|
|
-- CORRECT: cast to numeric
|
|
WHERE CAST(data->>'$.price' AS UNSIGNED) <= 1200
|
|
```
|
|
|
|
- **Collation**: values extracted with `->>` behave like strings and use a collation. Use `COLLATE` when you need a specific comparison behavior.
|
|
|
|
```sql
|
|
WHERE data->>'$.status' COLLATE utf8mb4_0900_as_cs = 'Active'
|
|
```
|
|
|
|
## Common Pitfalls
|
|
- **Heavy update cost**: `JSON_SET`/`JSON_REPLACE` can touch large portions of a JSON document and generate significant redo/undo work on large blobs.
|
|
- **No partial indexes**: You can only index extracted scalar paths via generated columns.
|
|
- **Large documents hurt**: JSON stored inline in the row. Documents >8 KB spill to overflow pages, hurting read performance.
|
|
- **Type mismatches**: `JSON_EXTRACT` returns a JSON type. Comparing with `= 'foo'` may not match — use `->>` or `JSON_UNQUOTE`.
|
|
- **VIRTUAL vs STORED generated columns**: VIRTUAL columns compute on read (less storage, more CPU). STORED columns materialize on write (more storage, faster reads if selected often). Both can be indexed; for indexed paths, the index stores the computed value either way.
|
|
|