Index Strategy Patterns

Overview

Database indexes are critical for query performance, but they come with trade-offs. This guide covers proven patterns for index design, optimization strategies, and common pitfalls to avoid.

Index Types and Use Cases

B-Tree Indexes (Default)

Best For:

  • Equality queries (WHERE column = value)
  • Range queries (WHERE column BETWEEN x AND y)
  • Sorting (ORDER BY column)
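  • Prefix pattern matching (WHERE column LIKE 'prefix%'); a leading wildcard such as LIKE '%suffix' cannot use a B-tree index, and in PostgreSQL prefix matching also needs the C collation or a text_pattern_ops operator class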

Characteristics:

  • Logarithmic lookup time O(log n)
  • Supports leftmost-prefix matches on composite indexes
  • Most versatile index type

Example:

-- Single column B-tree index
CREATE INDEX idx_customers_email ON customers (email);

-- Composite B-tree index
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

Hash Indexes

Best For:

  • Exact equality matches only
  • High-cardinality columns
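  • Single-key lookups that never need ranges or ordering (PostgreSQL hash indexes cannot enforce uniqueness, so primary keys there are always backed by a B-tree)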

Characteristics:

  • Average-case constant lookup time O(1) for exact matches
  • Cannot support range queries or sorting
  • Can be smaller than a B-tree on long values, because only hash codes are stored

Example:

-- Hash index for exact lookups (PostgreSQL)
CREATE INDEX idx_users_id_hash ON users USING HASH (user_id);

Partial Indexes

Best For:

  • Filtering on subset of data
  • Reducing index size and maintenance overhead
  • Query patterns that consistently use specific filters

Example:

-- Index only active users
CREATE INDEX idx_active_users_email 
ON users (email) 
WHERE status = 'active';

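-- Index recent orders only
-- PostgreSQL rejects non-immutable expressions such as CURRENT_DATE in an index
-- predicate, so use a fixed cutoff (example date) and recreate the index periodically
CREATE INDEX idx_recent_orders 
ON orders (customer_id, created_at) 
WHERE created_at > DATE '2024-01-01';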

-- Index non-null values only
CREATE INDEX idx_customers_phone 
ON customers (phone_number) 
WHERE phone_number IS NOT NULL;
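
A partial index is only considered when the query's WHERE clause implies the index predicate. The following is a minimal sketch of that behavior, assuming SQLite (via Python's built-in sqlite3 module) as a stand-in for PostgreSQL; the table layout and the `plan` helper are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text

conn = sqlite3.connect(":memory:")
# Hypothetical schema, just wide enough to illustrate the point.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT, bio TEXT)"
)
conn.execute(
    "CREATE INDEX idx_active_users_email ON users (email) WHERE status = 'active'"
)

# The query repeats the index predicate, so the partial index is usable.
matching = plan(
    conn,
    "SELECT * FROM users WHERE email = 'a@example.com' AND status = 'active'",
)

# No status filter: the index does not contain every row, so it cannot be used.
non_matching = plan(conn, "SELECT * FROM users WHERE email = 'a@example.com'")

print(matching)
print(non_matching)
```

The status value is written as a literal rather than a bound parameter because SQLite only matches a partial index when it can prove the predicate from the query text.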

Covering Indexes

Best For:

  • Eliminating table lookups for SELECT queries
  • Frequently accessed column combinations
  • Read-heavy workloads

Example:

-- Covering index with INCLUDE clause (SQL Server, PostgreSQL 11+)
CREATE INDEX idx_orders_customer_covering 
ON orders (customer_id, order_date) 
INCLUDE (order_total, status);

-- Query can be satisfied entirely from index:
-- SELECT order_total, status FROM orders 
-- WHERE customer_id = 123 AND order_date > '2024-01-01';

Functional/Expression Indexes

Best For:

  • Queries on transformed column values
  • Case-insensitive searches
  • Complex calculations

Example:

-- Case-insensitive email searches
CREATE INDEX idx_users_email_lower 
ON users (LOWER(email));

-- Date part extraction
CREATE INDEX idx_orders_month 
ON orders (EXTRACT(MONTH FROM order_date));

-- JSON field indexing
CREATE INDEX idx_users_preferences_theme 
ON users ((preferences->>'theme'));

Composite Index Design Patterns

Column Ordering Strategy

Rule: Equality Columns First (Most Selective Leading), Range Columns Last

-- Query: WHERE status = 'active' AND city = 'New York' AND age > 25
-- Assume: status has 3 values, city has 100 values, age has 80 values

-- GOOD: equality columns lead (more selective one first), range column last
CREATE INDEX idx_users_city_status_age ON users (city, status, age);

-- BAD: range column in the middle; status can only filter entries, not bound the scan
CREATE INDEX idx_users_city_age_status ON users (city, age, status);

Selectivity Calculation:

-- Estimate selectivity for each column
SELECT 
    'status' as column_name,
    COUNT(DISTINCT status)::float / COUNT(*) as selectivity
FROM users
UNION ALL
SELECT 
    'city' as column_name,
    COUNT(DISTINCT city)::float / COUNT(*) as selectivity
FROM users
UNION ALL
SELECT 
    'age' as column_name,
    COUNT(DISTINCT age)::float / COUNT(*) as selectivity
FROM users;
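
The same distinct-to-total ratio can be computed per column from application code. This is a rough sketch using Python's sqlite3 module with made-up sample data; the `selectivity` helper is hypothetical, and the ratio is only a coarse guide because it ignores skew in the value distribution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (status TEXT, city TEXT, age INTEGER)")

# Synthetic rows: 2 distinct statuses, 10 cities, 40 ages.
rows = [
    ("active" if i % 3 else "inactive", f"city_{i % 10}", 20 + i % 40)
    for i in range(1000)
]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

def selectivity(conn, table, column):
    """Distinct values divided by row count; closer to 1.0 means more selective."""
    # Identifiers cannot be bound as parameters, so only pass trusted names here.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0

result = {col: selectivity(conn, "users", col) for col in ("status", "city", "age")}
for col, value in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col}: {value:.3f}")
```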

Query Pattern Matching

Pattern 1: Equality + Range

-- Query: WHERE customer_id = 123 AND order_date BETWEEN '2024-01-01' AND '2024-03-31'
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

Pattern 2: Multiple Equality Conditions

-- Query: WHERE status = 'active' AND category = 'premium' AND region = 'US'
CREATE INDEX idx_users_status_category_region ON users (status, category, region);

Pattern 3: Equality + Sorting

-- Query: WHERE category = 'electronics' ORDER BY price DESC, created_at DESC
CREATE INDEX idx_products_category_price_date ON products (category, price DESC, created_at DESC);
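
One way to see what Pattern 3 buys is to compare the query plan before and after creating the index. The sketch below assumes SQLite through Python's sqlite3 module rather than PostgreSQL; the table definition and the `plan` helper are hypothetical, and plan wording differs across SQLite versions.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, category TEXT, price REAL, created_at TEXT, name TEXT)"
)
query = (
    "SELECT * FROM products WHERE category = 'electronics' "
    "ORDER BY price DESC, created_at DESC"
)

# No index yet: full table scan plus a separate sort step.
before = plan(conn, query)

conn.execute(
    "CREATE INDEX idx_products_category_price_date "
    "ON products (category, price DESC, created_at DESC)"
)

# The index locates the category and returns rows already in the requested order.
after = plan(conn, query)

print(before)
print(after)
```

With the equality column leading and the sort columns following in the same direction as the ORDER BY, the separate sort step disappears from the plan.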

Prefix Optimization

Efficient Prefix Usage:

-- Index supports queries on any leading prefix of its columns:
CREATE INDEX idx_users_lastname_firstname_email ON users (last_name, first_name, email);

-- ✓ Uses index: WHERE last_name = 'Smith'
-- ✓ Uses index: WHERE last_name = 'Smith' AND first_name = 'John'  
-- ✓ Uses index: WHERE last_name = 'Smith' AND first_name = 'John' AND email = 'john@...'
-- ✗ Cannot use index efficiently: WHERE first_name = 'John' (leading column last_name is missing)
-- ✗ Cannot use index efficiently: WHERE email = 'john@...' (leading columns are missing)
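
The leftmost-prefix behavior can be checked empirically. The following sketch assumes SQLite via Python's sqlite3 module as a stand-in engine; the extra `bio` column, the placeholder email value, and the `plan` helper are illustrative assumptions, and other planners (or SQLite with collected statistics) may behave differently.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# `bio` is not in the index, so SELECT * can never be answered from the index alone.
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, email TEXT, bio TEXT)"
)
conn.execute(
    "CREATE INDEX idx_users_lastname_firstname_email "
    "ON users (last_name, first_name, email)"
)

plans = {
    where: plan(conn, f"SELECT * FROM users WHERE {where}")
    for where in (
        "last_name = 'Smith'",
        "last_name = 'Smith' AND first_name = 'John'",
        "first_name = 'John'",
        "email = 'john@example.com'",
    )
}

for where, text in plans.items():
    print(f"{where!r}: {text}")
```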

Performance Optimization Patterns

Index Intersection vs Composite Indexes

Scenario: Multiple single-column indexes

CREATE INDEX idx_users_age ON users (age);
CREATE INDEX idx_users_city ON users (city);
CREATE INDEX idx_users_status ON users (status);

-- Query: WHERE age > 25 AND city = 'NYC' AND status = 'active'
-- Database may use index intersection (combining multiple indexes)
-- Performance varies by database engine and data distribution

Better: Purpose-built composite index

-- More efficient for the specific query pattern
CREATE INDEX idx_users_city_status_age ON users (city, status, age);

Index Size vs Performance Trade-off

Wide Indexes (Many Columns):

-- Pros: Covers many query patterns, excellent for covering queries
-- Cons: Large index size, slower writes, more memory usage
CREATE INDEX idx_orders_comprehensive 
ON orders (customer_id, order_date, status, total_amount, shipping_method, created_at)
INCLUDE (order_notes, billing_address);

Narrow Indexes (Few Columns):

-- Pros: Smaller size, faster writes, less memory
-- Cons: May not cover all query patterns
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
CREATE INDEX idx_orders_status ON orders (status);

Maintenance Optimization

Regular Index Analysis:

-- PostgreSQL: Check index usage statistics
SELECT 
    schemaname,
    relname as tablename,        -- pg_stat_user_indexes exposes relname / indexrelname
    indexrelname as indexname,
    idx_scan as index_scans,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE idx_scan = 0  -- Potentially unused indexes
ORDER BY schemaname, tablename;

-- Check index size (using the index OID avoids name-quoting and search_path issues)
SELECT 
    indexrelname as indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Common Anti-Patterns

1. Over-Indexing

Problem:

-- Too many similar indexes
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);  
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
CREATE INDEX idx_orders_customer_date_status ON orders (customer_id, order_date, status);

Solution:

-- Keep idx_orders_customer_date_status; it already serves its leading prefixes
DROP INDEX idx_orders_customer;
DROP INDEX idx_orders_customer_date;
-- idx_orders_customer_status is not a prefix of it: drop it only if queries on
-- customer_id + status perform acceptably using the customer_id prefix alone
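
Which indexes are strict leading prefixes of another can be checked mechanically. The helper below is a hypothetical sketch in plain Python that compares column lists only; it ignores uniqueness constraints, partial-index predicates, and sort direction, all of which can make a prefix index worth keeping. An index such as (customer_id, status) is not a prefix of (customer_id, order_date, status), so it is not flagged and needs a look at the actual queries.

```python
def prefix_redundant(indexes):
    """Map each index whose columns are a strict leading prefix of another
    index to the name of one index that covers it."""
    redundant = {}
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            is_strict_prefix = (
                name != other
                and len(cols) < len(other_cols)
                and other_cols[: len(cols)] == cols
            )
            if is_strict_prefix:
                redundant[name] = other
                break
    return redundant

# Column lists taken from the over-indexing example above.
indexes = {
    "idx_orders_customer": ("customer_id",),
    "idx_orders_customer_date": ("customer_id", "order_date"),
    "idx_orders_customer_status": ("customer_id", "status"),
    "idx_orders_customer_date_status": ("customer_id", "order_date", "status"),
}

result = prefix_redundant(indexes)
for name, covered_by in result.items():
    print(f"{name} is a prefix of {covered_by}")
```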

2. Wrong Column Order

Problem:

-- Query: WHERE active = true AND user_type = 'premium' AND city = 'Chicago'
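-- Bad order: boolean first (lowest selectivity); the query itself is served, but the
-- (active) and (active, user_type) prefixes are of little use to any other query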
CREATE INDEX idx_users_active_type_city ON users (active, user_type, city);

Solution:

-- Good order: most selective first, so the (city) prefix can also serve other queries
CREATE INDEX idx_users_city_type_active ON users (city, user_type, active);

3. Ignoring Query Patterns

Problem:

-- Index doesn't match common query patterns
CREATE INDEX idx_products_name ON products (product_name);

-- But queries are: WHERE category = 'electronics' AND price BETWEEN 100 AND 500
-- Index is not helpful for these queries

Solution:

-- Match actual query patterns
CREATE INDEX idx_products_category_price ON products (category, price);

4. Function in WHERE Without Functional Index

Problem:

-- Query uses function but no functional index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Regular index on email won't help

Solution:

-- Create functional index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
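
The effect can be observed by comparing plans with only a plain index and then with the expression index in place. This is a minimal sketch assuming SQLite (3.9 or later, for expression indexes) through Python's sqlite3 module; the table layout and `plan` helper are hypothetical, and PostgreSQL's plan output looks different.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")

# A regular index on the raw column.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

query = "SELECT * FROM users WHERE LOWER(email) = 'john@example.com'"

# The plain index stores email, not LOWER(email), so the table is scanned.
before = plan(conn, query)

# An index on the same expression the query uses.
conn.execute("CREATE INDEX idx_users_email_lower ON users (LOWER(email))")
after = plan(conn, query)

print(before)
print(after)
```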

Advanced Patterns

Multi-Column Statistics

When Columns Are Correlated:

-- If city and state are highly correlated, create extended statistics
CREATE STATISTICS stats_address_correlation ON city, state FROM addresses;
ANALYZE addresses;

-- Helps query planner make better decisions for:
-- WHERE city = 'New York' AND state = 'NY'

Conditional Indexes for Data Lifecycle

Pattern: Different indexes for different data ages

-- PostgreSQL requires immutable index predicates, so CURRENT_DATE is not allowed;
-- use literal cutoffs (example dates below) and recreate these indexes on a schedule

-- Hot data (recent orders) - optimized for OLTP
CREATE INDEX idx_orders_hot_customer_date 
ON orders (customer_id, order_date DESC) 
WHERE order_date > DATE '2024-06-01';

-- Warm data (older orders) - optimized for analytics  
CREATE INDEX idx_orders_warm_date_total 
ON orders (order_date, total_amount) 
WHERE order_date <= DATE '2024-06-01' 
  AND order_date > DATE '2023-07-01';

-- Cold data (archived orders) - minimal indexing
CREATE INDEX idx_orders_cold_date 
ON orders (order_date) 
WHERE order_date <= DATE '2023-07-01';

Index-Only Scan Optimization

Design indexes to avoid table access:

-- Query: SELECT order_id, total_amount, status FROM orders WHERE customer_id = ?
CREATE INDEX idx_orders_customer_covering 
ON orders (customer_id) 
INCLUDE (order_id, total_amount, status);

-- Or as composite index (if database doesn't support INCLUDE)
CREATE INDEX idx_orders_customer_covering 
ON orders (customer_id, order_id, total_amount, status);
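
Whether a query is answered from the index alone shows up in the query plan. The sketch below uses the composite form, since it assumes SQLite (via Python's sqlite3 module), which has no INCLUDE clause; the `notes` column and the `plan` helper are hypothetical additions. PostgreSQL reports the equivalent as an "Index Only Scan" in EXPLAIN.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "total_amount REAL, status TEXT, notes TEXT)"
)
conn.execute(
    "CREATE INDEX idx_orders_customer_covering "
    "ON orders (customer_id, order_id, total_amount, status)"
)

# Every selected column lives in the index: no table access is needed.
covered = plan(
    conn,
    "SELECT order_id, total_amount, status FROM orders WHERE customer_id = 123",
)

# `notes` is not in the index, so each match requires a table lookup.
not_covered = plan(
    conn,
    "SELECT order_id, total_amount, status, notes FROM orders WHERE customer_id = 123",
)

print(covered)
print(not_covered)
```

Adding a single uncovered column to the SELECT list is enough to lose the index-only path, which is why covering indexes are best designed against specific, stable queries.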

Index Monitoring and Maintenance

Performance Monitoring Queries

Find slow queries that might benefit from indexes:

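-- PostgreSQL: Find queries with high mean execution time (requires the pg_stat_statements extension)
-- Columns are total_exec_time / mean_exec_time in PostgreSQL 13+; total_time / mean_time before that
SELECT 
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    rows
FROM pg_stat_statements
WHERE mean_exec_time > 1000  -- Times are in milliseconds: mean > 1 second
ORDER BY mean_exec_time DESC;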

Identify missing indexes:

-- Look for sequential scans on large tables
SELECT 
    schemaname,
    relname as tablename,  -- pg_stat_user_tables exposes the table name as relname
    seq_scan,
    seq_tup_read,
    idx_scan,
    n_tup_ins + n_tup_upd + n_tup_del as write_activity
FROM pg_stat_user_tables
WHERE seq_scan > 100 
  AND seq_tup_read > 100000  -- Large sequential scans
  AND (COALESCE(idx_scan, 0) = 0 OR seq_scan > idx_scan * 2)  -- idx_scan is NULL when the table has no indexes
ORDER BY seq_tup_read DESC;

Index Maintenance Schedule

Regular Maintenance Tasks:

-- Rebuild fragmented indexes (SQL Server)
ALTER INDEX ALL ON orders REBUILD;

-- Update statistics (PostgreSQL)
ANALYZE orders;

-- Check for unused indexes monthly
SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;

Conclusion

Effective index strategy requires:

  1. Understanding Query Patterns: Analyze actual application queries, not theoretical scenarios
  2. Measuring Performance: Use query execution plans and timing to validate index effectiveness
  3. Balancing Trade-offs: Additional indexes can speed up reads but slow writes and increase storage
  4. Regular Maintenance: Monitor index usage and performance, remove unused indexes
  5. Iterative Improvement: Start with essential indexes, add and optimize based on real usage

The goal is not to index every possible query pattern, but to create a focused set of indexes that provide maximum benefit for your application's specific workload while minimizing maintenance overhead.