| title | description | tags |
|---|---|---|
| Replication Lag Awareness | Read-replica consistency pitfalls and mitigations | mysql, replication, lag, read-replicas, consistency, gtid |
Replication Lag
MySQL replication is asynchronous by default. Reads from a replica may return stale data.
The Core Problem
- App writes to primary: `INSERT INTO orders ...`
- App immediately reads from replica: `SELECT * FROM orders WHERE id = ?`
- Replica hasn't applied the write yet, so the read returns an empty result or stale data.
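The sequence above can be reproduced without a server. The toy model below is plain Python; `Primary` and `Replica` are hypothetical stand-ins for MySQL, and the only assumption is that an asynchronous replica exposes a change to readers after it has applied it.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical toy primary: applies writes at once and keeps an ordered change log."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def insert(self, order_id, row):
        self.rows[order_id] = row
        self.log.append((order_id, row))


@dataclass
class Replica:
    """Hypothetical toy async replica: reads only see changes applied so far."""
    rows: dict = field(default_factory=dict)
    applied: int = 0  # how many entries of the primary's log have been applied

    def apply_next(self, primary):
        # Apply the next pending change, if there is one.
        if self.applied < len(primary.log):
            order_id, row = primary.log[self.applied]
            self.rows[order_id] = row
            self.applied += 1

    def select(self, order_id):
        return self.rows.get(order_id)


primary, replica = Primary(), Replica()
primary.insert(42, {"status": "new"})  # 1. write goes to the primary
stale = replica.select(42)             # 2. immediate read from the replica
replica.apply_next(primary)            # 3. replica applies the change later
fresh = replica.select(42)
print(stale, fresh)  # None {'status': 'new'}
```

The window between steps 1 and 3 is the replication lag; every mitigation below either avoids reading inside it or waits it out.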
Detecting Lag
```sql
-- On the replica
SHOW REPLICA STATUS\G
-- Key field: Seconds_Behind_Source (0 = caught up, NULL = not replicating)
```
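In a monitoring script, the vertical output can be turned into a number. A minimal Python sketch, assuming the text comes from the mysql client's `\G` format (for example `mysql -e "SHOW REPLICA STATUS\G"`); `parse_vertical_status` and `seconds_behind` are hypothetical helpers, and `Seconds_Behind_Master` is checked as a fallback for servers that predate the 8.0.22 renaming.

```python
import re

# A field line looks like "   Seconds_Behind_Source: 0"; field names are letters/underscores.
FIELD = re.compile(r"^\s*([A-Za-z_]+):\s?(.*)$")


def parse_vertical_status(text):
    """Parse one row of vertical (\\G) SHOW REPLICA STATUS output into a dict of strings."""
    fields, last = {}, None
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # row separator such as "*** 1. row ***"
        match = FIELD.match(line)
        if match:
            last = match.group(1)
            fields[last] = match.group(2).strip()
        elif last is not None and line.strip():
            # Continuation line, e.g. the second line of a multi-line Executed_Gtid_Set.
            fields[last] += line.strip()
    return fields


def seconds_behind(fields):
    """Return lag in seconds, or None when the field is NULL or absent."""
    raw = fields.get("Seconds_Behind_Source", fields.get("Seconds_Behind_Master", "NULL"))
    return None if raw == "NULL" else int(raw)


sample = """*************************** 1. row ***************************
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
            Executed_Gtid_Set: 3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5,
4e11fa47-71ca-11e1-9e33-c80aa9429562:1-3
        Seconds_Behind_Source: 7
"""
status = parse_vertical_status(sample)
print(seconds_behind(status))  # 7
```

Returning `None` rather than 0 for NULL keeps "not replicating" distinct from "caught up".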
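Warning: `Seconds_Behind_Source` measures how far the applier (SQL) thread is behind the events already in the relay log, not true wall-clock staleness. If the receiver (I/O) thread is itself behind the primary, for example on a slow network, the field can show 0 while the replica is still missing recent transactions. It can also underreport during long-running transactions because it only updates when transactions commit.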
GTID-based lag: for more accurate tracking, compare the replica's `@@global.gtid_executed` with the primary's `@@global.gtid_executed`, or use `WAIT_FOR_EXECUTED_GTID_SET()` to wait for a specific transaction.
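On the server, the built-in `GTID_SUBTRACT(set1, set2)` function returns the GTIDs in `set1` that are missing from `set2`. For a monitoring tool that already holds both `gtid_executed` strings, a minimal Python sketch of the same comparison follows. It assumes untagged GTID sets in the usual `uuid:interval[:interval]` form; `parse_gtid_set` and `count_missing` are hypothetical names, and the result is a transaction count, not a time.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(start, end), ...]} (inclusive ranges).

    Assumes untagged GTIDs; all whitespace, including newlines, is ignored.
    """
    result = {}
    for part in "".join(text.split()).split(","):
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    return result


def _merge(intervals):
    """Sort and merge overlapping or adjacent inclusive ranges."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def count_missing(source_set, replica_set):
    """Count transactions present in source_set but not yet in replica_set."""
    src, rep = parse_gtid_set(source_set), parse_gtid_set(replica_set)
    missing = 0
    for uuid, intervals in src.items():
        have = _merge(rep.get(uuid, []))
        for start, end in _merge(intervals):
            # Overlap with disjoint replica ranges = already-applied transactions.
            covered = sum(max(0, min(end, b) - max(start, a) + 1) for a, b in have)
            missing += (end - start + 1) - covered
    return missing


source_gtids = "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-10,\n4e11fa47-71ca-11e1-9e33-c80aa9429562:1-3"
replica_gtids = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-7"
print(count_missing(source_gtids, replica_gtids))  # 6
```

Counting with interval overlap rather than iterating each GTID keeps the check cheap even when sets cover millions of transactions.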
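Note: parallel replication with `replica_parallel_type=LOGICAL_CLOCK` does not require a particular `binlog_format`; the replica schedules transactions using commit-ordering information the primary records in the binary log. Row-based logging (`binlog_format=ROW`) is still generally preferred, and statement-based replication (`binlog_format=STATEMENT`) is more limited for parallel apply.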
Mitigation Strategies
| Strategy | How | Trade-off |
|---|---|---|
| Read from primary | Route critical reads to primary after writes | Increases primary load |
| Sticky sessions | Pin user to primary for N seconds after a write | Adds session affinity complexity |
| GTID wait | Run `SELECT WAIT_FOR_EXECUTED_GTID_SET('gtid', timeout)` on the replica before reading | Adds read latency up to the current lag, capped by `timeout` |
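| Semi-sync replication | Primary waits for at least one replica to acknowledge receipt of the transaction before the commit returns | Higher write latency; the ACK confirms receipt, not apply, so replica reads can still be stale |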
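Two of the strategies in the table can be sketched in application code. Below, `ReadRouter` is a hypothetical sticky-session router (reads for a user who just wrote go to the primary for a fixed window), and `replica_caught_up` wraps the GTID wait. It assumes a DB-API cursor whose driver uses `%s` placeholders (PyMySQL and mysql-connector-python both do); the GTID set to wait for would be captured on the primary after the write, for example by reading `@@global.gtid_executed`, which is a superset of the write's own GTID.

```python
import time


class ReadRouter:
    """Hypothetical sticky-session router: after a user writes, that user's reads go
    to the primary for `sticky_seconds`; all other reads go to a replica."""

    def __init__(self, sticky_seconds=5.0, clock=time.monotonic):
        self.sticky_seconds = sticky_seconds
        self.clock = clock          # injectable so the window can be tested
        self._last_write = {}       # user_id -> clock() value of the user's last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.sticky_seconds:
            return "primary"
        return "replica"


def replica_caught_up(cursor, gtid_set, timeout_seconds=1):
    """Block on the replica until gtid_set is applied or the timeout expires.

    WAIT_FOR_EXECUTED_GTID_SET returns 0 once the set is applied and 1 on timeout.
    `cursor` is any DB-API cursor whose driver uses %s placeholders (an assumption).
    """
    cursor.execute("SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)", (gtid_set, timeout_seconds))
    (result,) = cursor.fetchone()
    return result == 0
```

On a timeout (`False`), a caller would typically fall back to the primary rather than serve a possibly stale row. The `_last_write` map here is per process and never pruned; an application running several instances would need shared state, such as a session attribute.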
Common Pitfalls
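- Large transactions cause lag spikes: A single `INSERT ... SELECT` of 1M rows replays as one big transaction on the replica. Break it into batches.
- DDL blocks replication: `ALTER TABLE` with `ALGORITHM=COPY` on the primary replays on the replica and blocks other relay-log events while it runs. `INPLACE` DDL allows concurrent DML on each server, but the replica starts it only after the primary has finished and events behind it still wait, so a long in-place `ALTER` causes similar lag. `INSTANT` DDL avoids the long replay. Both still require brief metadata locks.
- Long queries on replica: A slow `SELECT` on the replica can block relay-log application when a replicated change needs a conflicting lock (for example, a DDL waiting for the metadata lock the `SELECT` holds). A single-threaded applier is a related bottleneck: use `replica_parallel_workers` (named `slave_parallel_workers` before 8.0.26) with `replica_parallel_type=LOGICAL_CLOCK` for parallel apply, and set `replica_preserve_commit_order=ON` (`slave_preserve_commit_order` before 8.0.26; the default from 8.0.27) so transactions commit on the replica in the same order as on the primary.
- IO thread bottlenecks: Network latency, disk I/O, or `relay_log_space_limit` exhaustion can cause lag even when the SQL apply thread isn't saturated. Monitor `Relay_Log_Space` and connectivity.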
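The batching advice for large transactions can be sketched as follows. This assumes an integer primary key `id` and uses hypothetical tables `orders` and `orders_archive`; `conn` is any DB-API connection whose driver uses `%s` placeholders. Committing after each range makes every batch its own transaction in the binary log, so the replica replays many short transactions instead of one huge one.

```python
import time

# Hypothetical tables and key column; each statement copies one id range.
COPY_SQL = "INSERT INTO orders_archive SELECT * FROM orders WHERE id BETWEEN %s AND %s"


def id_batches(min_id, max_id, batch_size):
    """Split the inclusive range [min_id, max_id] into consecutive inclusive chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield lo, hi
        lo = hi + 1


def copy_in_batches(conn, min_id, max_id, batch_size=10_000, pause_seconds=0.0):
    """Run COPY_SQL once per id range, committing after each range."""
    cur = conn.cursor()
    for lo, hi in id_batches(min_id, max_id, batch_size):
        cur.execute(COPY_SQL, (lo, hi))
        conn.commit()              # one small transaction per batch
        time.sleep(pause_seconds)  # optional throttle so replicas can keep up
```

A sparse `id` column produces uneven batches; `batch_size` and `pause_seconds` are tuning knobs here, not recommended values.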
Guidelines
- Assume replicas are always slightly behind. Design reads accordingly.
- Use GTID-based replication for reliable failover and lag tracking.
- Monitor `Seconds_Behind_Source` with alerting (>5s warrants investigation).
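A minimal sketch of the alerting rule, assuming `Seconds_Behind_Source` is sampled periodically and NULL readings are stored as `None`. Requiring a few consecutive readings above the 5-second threshold before warning is an added design choice, not part of the guideline, meant to avoid alerting on a single short spike; `lag_status` is a hypothetical name.

```python
def lag_status(samples, threshold=5, consecutive=3):
    """Classify recent Seconds_Behind_Source readings (oldest first).

    None represents NULL (replication not running) and is reported immediately;
    lag is reported only after `consecutive` readings above `threshold` seconds.
    """
    if samples and samples[-1] is None:
        return "critical"
    recent = samples[-consecutive:]
    if len(recent) == consecutive and all(s is not None and s > threshold for s in recent):
        return "warning"
    return "ok"
```

Treating NULL as more severe than lag reflects that a stopped replica falls further behind every second without any reading to show it.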