| title | description | tags |
|---|---|---|
| JSON Column Best Practices | When and how to use JSON columns safely | mysql, json, generated-columns, indexes, data-modeling |
JSON Column Patterns
MySQL 5.7+ supports native JSON columns. Useful, but with important caveats.
When JSON Is Appropriate
- Truly schema-less data (user preferences, metadata bags, webhook payloads).
- Rarely filtered/joined — if you query a JSON path frequently, extract it to a real column.
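As a rough sketch of the "extract it to a real column" advice, the statements below promote a frequently filtered path to an ordinary column. The table name user_settings, the prefs JSON column, and the $.locale path are all hypothetical, chosen only for illustration.

```sql
-- Hypothetical table: prefs is a JSON bag, and $.locale has become a hot filter.
ALTER TABLE user_settings
  ADD COLUMN locale VARCHAR(16) NULL;

-- Backfill the new column from the existing JSON documents.
UPDATE user_settings
SET locale = prefs->>'$.locale'
WHERE locale IS NULL;

-- Index the real column so filters and joins no longer parse JSON.
ALTER TABLE user_settings
  ADD INDEX idx_locale (locale);
```

After a change like this, application writes would need to populate locale alongside (or instead of) the JSON key, since an ordinary column is not kept in sync automatically.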
Indexing JSON: Use Generated Columns
You cannot index a JSON column directly. Create a virtual generated column and index that:
ALTER TABLE events
ADD COLUMN event_type VARCHAR(50) GENERATED ALWAYS AS (data->>'$.type') VIRTUAL,
ADD INDEX idx_event_type (event_type);
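A minimal way to check that the index is usable is to filter on the generated column by name and inspect the plan. The 'signup' value is a made-up event type.

```sql
-- Filter on the generated column itself so idx_event_type is a candidate.
EXPLAIN
SELECT COUNT(*)
FROM events
WHERE event_type = 'signup';
```

If the optimizer chooses the index, the key column of the EXPLAIN output should show idx_event_type. Referencing the generated column by name is the safer habit: the optimizer may not match a raw data->>'$.type' expression to a VARCHAR column, because the expression's result type differs from the column's declared type.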
Extraction Operators
| Syntax | Returns | Use for |
|---|---|---|
| JSON_EXTRACT(col, '$.key') | JSON type value (e.g., "foo" for strings) | When you need JSON type semantics |
| col->'$.key' | Same as JSON_EXTRACT(col, '$.key') | Shorthand |
| col->>'$.key' | Unquoted scalar (equivalent to JSON_UNQUOTE(JSON_EXTRACT(col, '$.key'))) | WHERE comparisons, display |
Always use ->> (unquote) in WHERE clauses, otherwise you compare against "foo" (with quotes).
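To see the difference between the operators side by side, the sketch below uses a throwaway table; the table name json_demo and its contents are illustrative only.

```sql
-- Temporary table for the demo; dropped automatically when the session ends.
CREATE TEMPORARY TABLE json_demo (doc JSON);
INSERT INTO json_demo VALUES ('{"type": "click", "count": 3}');

SELECT
  JSON_EXTRACT(doc, '$.type') AS extracted,  -- "click" (JSON string, quotes kept)
  doc->'$.type'               AS arrow,      -- "click" (same as JSON_EXTRACT)
  doc->>'$.type'              AS unquoted,   -- click   (plain string)
  doc->>'$.count'             AS count_text  -- 3       (a string, not a number)
FROM json_demo;
```

The last column is worth noting: ->> always yields a string, even when the underlying JSON value is numeric.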
Tip: the generated column example above already uses the ->> shorthand. The equivalent long form, with explicit function calls, is:
ALTER TABLE events
ADD COLUMN event_type VARCHAR(50) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(data, '$.type'))) VIRTUAL,
ADD INDEX idx_event_type (event_type);
Multi-Valued Indexes (MySQL 8.0.17+)
If you store arrays in JSON (e.g., tags: ["electronics","sale"]), MySQL 8.0.17+ supports multi-valued indexes to index array elements:
ALTER TABLE products
ADD INDEX idx_tags ((CAST(tags AS CHAR(50) ARRAY)));
This can accelerate membership queries such as:
SELECT * FROM products WHERE 'electronics' MEMBER OF (tags);
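Besides MEMBER OF, the MySQL manual lists JSON_CONTAINS() and JSON_OVERLAPS() as functions that can use a multi-valued index. A sketch against the same products table, with made-up tag values:

```sql
-- Rows whose tags array contains BOTH values.
SELECT * FROM products
WHERE JSON_CONTAINS(tags, CAST('["electronics", "sale"]' AS JSON));

-- Rows whose tags array contains AT LEAST ONE of the values.
SELECT * FROM products
WHERE JSON_OVERLAPS(tags, CAST('["electronics", "sale"]' AS JSON));
```

Running either query under EXPLAIN is the way to confirm whether idx_tags is actually chosen for a given data set.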
Collation and Type Casting Pitfalls
- JSON type comparisons: JSON_EXTRACT returns a JSON type, and ->> returns a string. Comparing either directly to a string literal can give wrong results for numbers/dates.
-- WRONG: lexicographic string comparison
WHERE data->>'$.price' <= '1200'
-- CORRECT: cast to numeric (DECIMAL keeps fractional prices, UNSIGNED would lose them; adjust precision to your data)
WHERE CAST(data->>'$.price' AS DECIMAL(10,2)) <= 1200
- Collation: values extracted with ->> behave like strings and use a collation. Use COLLATE when you need a specific comparison behavior.
WHERE data->>'$.status' COLLATE utf8mb4_0900_as_cs = 'Active'
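The string-versus-number pitfall can be reproduced without any table. This small sketch uses literal values in place of extracted JSON text:

```sql
SELECT
  '900' <= '1200'                 AS string_compare,   -- 0: '9' sorts after '1'
  CAST('900' AS UNSIGNED) <= 1200 AS numeric_compare;  -- 1: 900 <= 1200
```

A row with a price of 900 would therefore be missed by the uncast comparison, even though it is numerically below the limit.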
Common Pitfalls
- Heavy update cost: JSON_SET/JSON_REPLACE can touch large portions of a JSON document and generate significant redo/undo work on large blobs.
- No partial indexes: You can only index extracted scalar paths via generated columns (or array elements via the multi-valued indexes described above).
- Large documents hurt: JSON is stored inline in the row when it fits. Documents >8 KB spill to overflow pages, hurting read performance.
- Type mismatches: JSON_EXTRACT returns a JSON type. Comparing with = 'foo' may not match; use ->> or JSON_UNQUOTE.
- VIRTUAL vs STORED generated columns: VIRTUAL columns compute on read (less storage, more CPU). STORED columns materialize on write (more storage, faster reads if selected often). Both can be indexed; for indexed paths, the index stores the computed value either way.
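Two of the pitfalls above lend themselves to a quick sketch: measuring document size, and declaring a STORED generated column. The id column, the event_source name, and the $.source path are assumptions for illustration and do not come from the examples earlier in this note.

```sql
-- Find the largest documents; `id` is an assumed primary key on events.
SELECT id, JSON_STORAGE_SIZE(data) AS bytes
FROM events
ORDER BY bytes DESC
LIMIT 10;

-- STORED variant of a generated column, for a path that is selected often.
-- The column name and the $.source path are hypothetical.
ALTER TABLE events
  ADD COLUMN event_source VARCHAR(50)
    GENERATED ALWAYS AS (data->>'$.source') STORED,
  ADD INDEX idx_event_source (event_source);
```

Adding a STORED column generally rebuilds the table, so on a large table it is worth scheduling as a migration rather than running ad hoc.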