add brain

2026-03-12 15:17:52 +07:00
parent fd9f558fa1
commit e7821a7a9d
355 changed files with 93784 additions and 24 deletions

@@ -0,0 +1,46 @@
---
title: Replication Lag Awareness
description: Read-replica consistency pitfalls and mitigations
tags: mysql, replication, lag, read-replicas, consistency, gtid
---
# Replication Lag
MySQL replication is asynchronous by default. Reads from a replica may return stale data.
## The Core Problem
1. App writes to primary: `INSERT INTO orders ...`
2. App immediately reads from replica: `SELECT * FROM orders WHERE id = ?`
3. The replica hasn't applied the write yet, so the query returns no rows or stale data.
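
The sequence above can be reproduced with a toy in-memory model. This is a minimal sketch, not MySQL code: `Primary`, `Replica`, and `apply_pending` are hypothetical names, and replication is reduced to copying a list of changes on demand.

```python
class Primary:
    """Toy primary: stores rows and records each change in a log."""

    def __init__(self):
        self.rows = {}
        self.binlog = []  # ordered list of (row_id, value) changes

    def insert(self, row_id, value):
        self.rows[row_id] = value
        self.binlog.append((row_id, value))


class Replica:
    """Toy replica: only sees changes after apply_pending() has run."""

    def __init__(self):
        self.rows = {}
        self.applied = 0  # number of log entries applied so far

    def apply_pending(self, primary):
        for row_id, value in primary.binlog[self.applied:]:
            self.rows[row_id] = value
        self.applied = len(primary.binlog)

    def select(self, row_id):
        return self.rows.get(row_id)


primary, replica = Primary(), Replica()
primary.insert(1, "order-1")           # step 1: write to primary
stale_read = replica.select(1)         # step 2: immediate read from replica
replica.apply_pending(primary)         # replication catches up later
fresh_read = replica.select(1)
```

Until `apply_pending` runs, the replica has no row for id 1, so `stale_read` is `None` even though the primary already holds the row.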
## Detecting Lag
```sql
-- On the replica (MySQL 8.0.22+; older versions use SHOW SLAVE STATUS
-- and Seconds_Behind_Master)
SHOW REPLICA STATUS\G
-- Key field: Seconds_Behind_Source (0 = caught up, NULL = not replicating)
```
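**Warning**: `Seconds_Behind_Source` is derived from event timestamps in the relay log, not measured as true wall-clock staleness. It can underreport during long-running transactions because it only updates when transactions commit, and it can show 0 when the SQL thread has caught up with the relay log while the I/O thread is still behind the primary.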
**GTID-based lag**: for more accurate tracking, compare `@@global.gtid_executed` (replica) to primary GTID position, or use `WAIT_FOR_EXECUTED_GTID_SET()` to wait for a specific transaction.
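
To make the comparison concrete, the sketch below counts transactions that appear in the primary's `gtid_executed` but not in the replica's. It assumes the classic untagged text format (`uuid:1-5:8,uuid2:1-3`) and expands intervals into Python sets, which is fine for illustration but not for very large sets; on a server, the built-in `GTID_SUBTRACT()` performs the same comparison. The function names are hypothetical.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:8,uuid2:1-3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.split(","):
        part = part.strip()  # gtid_executed may contain newlines after commas
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '8' is an interval of length one.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def missing_on_replica(primary_executed, replica_executed):
    """Count transactions executed on the primary but not yet on the replica."""
    primary = parse_gtid_set(primary_executed)
    replica = parse_gtid_set(replica_executed)
    return sum(
        len(numbers - replica.get(uuid, set()))
        for uuid, numbers in primary.items()
    )
```

A result of 0 means the replica has applied everything in the primary's set at the time it was sampled; because the two values are read at different moments, the count is an approximation.
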
**Note**: parallel replication with `replica_parallel_type=LOGICAL_CLOCK` requires `binlog_format=ROW`. Statement-based replication (`binlog_format=STATEMENT`) is more limited for parallel apply.
## Mitigation Strategies
| Strategy | How | Trade-off |
|---|---|---|
| **Read from primary** | Route critical reads to primary after writes | Increases primary load |
| **Sticky sessions** | Pin user to primary for N seconds after a write | Adds session affinity complexity |
| **GTID wait** | `SELECT WAIT_FOR_EXECUTED_GTID_SET('gtid', timeout)` on replica before reading | Adds read latency up to the current lag, capped by `timeout` |
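| **Semi-sync replication** | Primary waits for >=1 replica to acknowledge receipt of the transaction before the commit returns to the client | Higher write latency; the ACK confirms receipt, not apply, so replica reads can still be stale |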
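
Two of the strategies above lend themselves to a short application-side sketch: pinning a user to the primary for a few seconds after a write, and waiting on a replica for a known GTID before reading. The class and function names are hypothetical, the 5-second window is an arbitrary assumption, and `cursor` is assumed to be any DB-API cursor connected to the replica whose driver uses `%s` placeholders (as PyMySQL and mysqlclient do).

```python
import time


class StickyReadRouter:
    """Route a user's reads to the primary for a short window after a write."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock        # injectable so the window can be tested
        self._last_write = {}     # user_id -> time of most recent write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


def replica_has_applied(cursor, gtid_set, timeout_seconds=1):
    """Wait on the replica until gtid_set is applied or the timeout expires.

    WAIT_FOR_EXECUTED_GTID_SET() returns 0 on success and 1 on timeout.
    """
    cursor.execute(
        "SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)",
        (gtid_set, timeout_seconds),
    )
    row = cursor.fetchone()
    return row is not None and row[0] == 0
```

A caller might consult `target_for_read` first and, when it returns `"replica"`, fall back to the primary if `replica_has_applied` returns `False`. The in-process dictionary only covers a single application instance; sharing the pin across instances would need a cookie or a shared store.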
## Common Pitfalls
- **Large transactions cause lag spikes**: A single `INSERT ... SELECT` of 1M rows replays as one big transaction on the replica. Break into batches.
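- **DDL blocks replication**: `ALTER TABLE` with `ALGORITHM=COPY` on the primary replays on the replica and blocks later relay-log events while it runs. `INSTANT` and `INPLACE` DDL are less blocking but still require brief metadata locks, and an `INPLACE` rebuild still has to run to completion on the replica before later events in the same apply stream proceed.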
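- **Long queries on replica**: A slow `SELECT` on the replica can delay relay-log application, for example by holding metadata locks that replicated DDL must wait for, or by competing for I/O. Use `replica_parallel_workers` (`slave_parallel_workers` before 8.0.26) with `replica_parallel_type=LOGICAL_CLOCK` for parallel apply. Note: LOGICAL_CLOCK requires `binlog_format=ROW`; also set `replica_preserve_commit_order=ON` (`slave_preserve_commit_order=ON` before 8.0.26) so that transactions commit on the replica in the same order as on the primary.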
- **IO thread bottlenecks**: Network latency, disk I/O, or `relay_log_space_limit` exhaustion can cause lag even when the SQL apply thread isn't saturated. Monitor `Relay_Log_Space` and connectivity.
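
The batching advice for large transactions can be sketched as follows. The helper splits an id range into fixed-size chunks so that each chunk can be copied and committed as its own transaction. The table names `orders` and `orders_archive`, the `id` column, and the batch size are illustrative assumptions, and the statement uses `%s` placeholders for a DB-API driver.

```python
def batch_ranges(min_id, max_id, batch_size):
    """Yield inclusive (start, end) id ranges covering [min_id, max_id]."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


# Hypothetical copy statement; each execution should be committed separately
# so the replica replays many small transactions instead of one large one.
BATCH_SQL = (
    "INSERT INTO orders_archive "
    "SELECT * FROM orders WHERE id BETWEEN %s AND %s"
)


def batched_statements(min_id, max_id, batch_size=10_000):
    """Return (sql, params) pairs, one per batch, ready for cursor.execute()."""
    return [
        (BATCH_SQL, bounds)
        for bounds in batch_ranges(min_id, max_id, batch_size)
    ]
```

Range-based batching assumes a reasonably dense integer key; with sparse ids some batches will be much smaller than `batch_size`, which is harmless but less efficient.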
## Guidelines
- Assume replicas are always slightly behind. Design reads accordingly.
- Use GTID-based replication for reliable failover and lag tracking.
- Monitor `Seconds_Behind_Source` with alerting (>5s warrants investigation).
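
The alerting guideline can be expressed as a small check over one row of `SHOW REPLICA STATUS` output. This sketch assumes the row has already been fetched into a dictionary keyed by column name; the function name and the returned labels are hypothetical, and the 5-second default mirrors the threshold above.

```python
def classify_lag(status_row, threshold_seconds=5):
    """Map a SHOW REPLICA STATUS row (as a dict) to a coarse health label."""
    lag = status_row.get("Seconds_Behind_Source")
    if lag is None:
        # NULL (or a missing column) means the replica is not replicating.
        return "not_replicating"
    if int(lag) > threshold_seconds:
        return "investigate"
    return "ok"
```

Because `Seconds_Behind_Source` can underreport, a check like this is best treated as one signal alongside GTID-based comparison rather than as the only alert.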