
2026-03-12 15:17:52 +07:00


title	description	tags
Replication Lag Awareness	Read-replica consistency pitfalls and mitigations	mysql, replication, lag, read-replicas, consistency, gtid

Replication Lag

MySQL replication is asynchronous by default. Reads from a replica may return stale data.

The Core Problem

App writes to primary: INSERT INTO orders ...
App immediately reads from replica: SELECT * FROM orders WHERE id = ?
Replica hasn't applied the write yet — returns empty or stale data.

Detecting Lag

-- On the replica
SHOW REPLICA STATUS\G
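-- Key field: Seconds_Behind_Source (0 = applier caught up with the relay log, NULL = replication threads not running)
-- Before MySQL 8.0.22 the equivalents are SHOW SLAVE STATUS and Seconds_Behind_Master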

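Warning: Seconds_Behind_Source is not true wall-clock staleness. It is derived from the timestamp of the event the SQL (applier) thread is currently executing, so it mostly shows how far the applier is behind the relay log. When the I/O (receiver) thread itself is behind, for example on a slow network, it can report 0 even though the replica is missing recent writes, and it can jump abruptly while a long-running transaction is applied.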
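
As a rough illustration of how a monitoring script might read the fields above, the sketch below classifies one row of SHOW REPLICA STATUS. It assumes the row has already been fetched into a dict keyed by column name (for example via a dict-style cursor); the function name and the returned labels are hypothetical, and the 5-second default mirrors the alerting guideline in this document.

```python
from typing import Mapping, Optional


def classify_replica(status: Optional[Mapping[str, object]],
                     warn_after_s: int = 5) -> str:
    """Classify one SHOW REPLICA STATUS row (hypothetical helper).

    Returns 'not_a_replica', 'stopped', 'lagging' or 'ok'.
    """
    if not status:
        # SHOW REPLICA STATUS returns no rows on a server that is not a replica.
        return "not_a_replica"

    io_ok = status.get("Replica_IO_Running") == "Yes"
    sql_ok = status.get("Replica_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Source")

    # NULL lag or a stopped/connecting thread means the number cannot be trusted.
    if not (io_ok and sql_ok) or lag is None:
        return "stopped"

    return "lagging" if int(lag) > warn_after_s else "ok"
```

Because of the caveat above, an "ok" result only says the applier has kept up with what the replica has received; it is not proof that the replica matches the primary.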

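GTID-based lag: for more accurate tracking, compare @@global.gtid_executed on the replica with @@global.gtid_executed on the primary (GTID_SUBTRACT() returns the transactions the replica is still missing), or use WAIT_FOR_EXECUTED_GTID_SET() to wait until the replica has applied a specific transaction.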
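
A minimal sketch of the wait-then-read idea, assuming Python DB-API cursors whose driver uses the `%s` parameter style (PyMySQL and mysqlclient do). The function name `fetch_after_write` is hypothetical, and how the application obtains the GTID of its own write (for instance by reading @@global.gtid_executed on the primary after the commit) is left out.

```python
def fetch_after_write(replica_cur, primary_cur, gtid_set, query,
                      params=(), timeout_s=1):
    """Read from the replica if it has applied gtid_set, else from the primary.

    Hypothetical helper: both cursors are assumed to be open DB-API cursors.
    """
    # WAIT_FOR_EXECUTED_GTID_SET returns 0 once the set is applied
    # and 1 if the timeout elapses first.
    replica_cur.execute(
        "SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)", (gtid_set, timeout_s)
    )
    row = replica_cur.fetchone()
    caught_up = row is not None and row[0] == 0

    # Fall back to the primary rather than serving a stale read.
    cur = replica_cur if caught_up else primary_cur
    cur.execute(query, params)
    return cur.fetchall()
```

Falling back to the primary on timeout keeps the read correct at the cost of extra primary load; returning an error instead would be an equally reasonable choice for some applications.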

Note: parallel apply with replica_parallel_type=LOGICAL_CLOCK is most effective with binlog_format=ROW. Statement-based replication (binlog_format=STATEMENT) is more limited for parallel apply.

Mitigation Strategies

Strategy	How	Trade-off
Read from primary	Route critical reads to primary after writes	Increases primary load
Sticky sessions	Pin user to primary for N seconds after a write	Adds session affinity complexity
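GTID wait	`SELECT WAIT_FOR_EXECUTED_GTID_SET('gtid', timeout)` on replica	Adds read latency up to the current lag, bounded by timeout
Semi-sync replication	Primary waits for >=1 replica ACK before committing	Higher write latency; the ACK confirms receipt into the relay log, not apply, so replica reads can still be stale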
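
The sticky-session row could look roughly like the sketch below. The class name `StickyRouter`, the "primary"/"replica" labels, and the in-process dict are all assumptions made for illustration; an application running several instances would need a shared store or a signed cookie instead.

```python
import time


class StickyRouter:
    """Pin a user's reads to the primary for pin_seconds after a write."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self._clock = clock          # injectable so the window can be tested
        self._last_write = {}        # user_id -> time of the last write

    def record_write(self, user_id):
        """Call after every successful write on behalf of user_id."""
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id):
        """Return 'primary' inside the pin window, otherwise 'replica'."""
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"
        return "replica"
```

The pin window should comfortably exceed the lag normally observed; a window shorter than the real lag silently reintroduces stale reads.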

Common Pitfalls

Large transactions cause lag spikes: A single INSERT ... SELECT of 1M rows replays as one big transaction on the replica. Break into batches.
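DDL blocks replication: ALTER TABLE with ALGORITHM=COPY on primary replays on replica, blocking other relay-log events during execution. INPLACE DDL avoids blocking writes on the primary but still takes comparable time to replay on the replica, delaying the events queued behind it. INSTANT DDL changes only metadata and replays quickly. All of them still require brief metadata locks.
Long queries on replica: A slow SELECT on the replica competes with the applier for I/O and buffer pool, and the metadata lock it holds can block replicated DDL on the same table. To raise apply throughput, set replica_parallel_workers (slave_parallel_workers before 8.0.26) with replica_parallel_type=LOGICAL_CLOCK, preferably with binlog_format=ROW. Enable replica_preserve_commit_order=ON (slave_preserve_commit_order before 8.0.26) so transactions commit on the replica in the same order as on the primary.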
IO thread bottlenecks: Network latency, disk I/O, or relay_log_space_limit exhaustion can cause lag even when the SQL apply thread isn't saturated. Monitor Relay_Log_Space and connectivity.
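
To make the "break into batches" advice concrete, here is a sketch that splits one large INSERT ... SELECT into id ranges and commits each range separately, so the replica replays many small transactions instead of one huge one. It assumes a DB-API connection, an integer id column, and a hypothetical target table `orders_archive`; the function names are illustrative too.

```python
from typing import Iterator, Tuple


def id_batches(min_id: int, max_id: int,
               batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (lo, hi) id ranges covering min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield lo, hi
        lo = hi + 1


def copy_in_batches(conn, min_id: int, max_id: int,
                    batch_size: int = 10_000) -> int:
    """Copy rows range by range; returns the number of batches committed."""
    # orders_archive is a hypothetical table used only for illustration.
    sql = ("INSERT INTO orders_archive "
           "SELECT * FROM orders WHERE id BETWEEN %s AND %s")
    cur = conn.cursor()
    batches = 0
    for lo, hi in id_batches(min_id, max_id, batch_size):
        cur.execute(sql, (lo, hi))
        conn.commit()  # each batch replicates as its own small transaction
        batches += 1
    return batches
```

The bounds would typically come from `SELECT MIN(id), MAX(id) FROM orders`. With sparse ids the batches are uneven in row count, which is usually acceptable; a short sleep between batches is a common addition when the replica needs time to keep up.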

Guidelines

Assume replicas are always slightly behind. Design reads accordingly.
Use GTID-based replication for reliable failover and lag tracking.
Monitor Seconds_Behind_Source with alerting (>5s warrants investigation).
