add brain

2026-03-12 15:17:52 +07:00
parent fd9f558fa1
commit e7821a7a9d
355 changed files with 93784 additions and 24 deletions


@@ -0,0 +1,476 @@
# database-designer reference
## Database Design Principles
### Normalization Forms
#### First Normal Form (1NF)
- **Atomic Values**: Each column contains indivisible values
- **Unique Column Names**: No duplicate column names within a table
- **Uniform Data Types**: Each column contains the same type of data
- **Row Uniqueness**: No duplicate rows in the table
**Example Violation:**
```sql
-- BAD: Multiple phone numbers in one column
CREATE TABLE contacts (
    id     INT PRIMARY KEY,
    name   VARCHAR(100),
    phones VARCHAR(200)  -- "123-456-7890, 098-765-4321"
);

-- GOOD: Separate table for phone numbers
CREATE TABLE contacts (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);
CREATE TABLE contact_phones (
    id           INT PRIMARY KEY,
    contact_id   INT REFERENCES contacts(id),
    phone_number VARCHAR(20),
    phone_type   VARCHAR(10)
);
```
#### Second Normal Form (2NF)
- **1NF Compliance**: Must satisfy First Normal Form
- **Full Functional Dependency**: Non-key attributes depend on the entire primary key
- **Partial Dependency Elimination**: Remove attributes that depend on part of a composite key
**Example Violation:**
```sql
-- BAD: Student course table with partial dependencies
CREATE TABLE student_courses (
    student_id   INT,
    course_id    INT,
    student_name VARCHAR(100), -- Depends only on student_id
    course_name  VARCHAR(100), -- Depends only on course_id
    grade        CHAR(1),
    PRIMARY KEY (student_id, course_id)
);

-- GOOD: Separate tables eliminate partial dependencies
CREATE TABLE students (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);
CREATE TABLE courses (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);
CREATE TABLE enrollments (
    student_id INT REFERENCES students(id),
    course_id  INT REFERENCES courses(id),
    grade      CHAR(1),
    PRIMARY KEY (student_id, course_id)
);
```
#### Third Normal Form (3NF)
- **2NF Compliance**: Must satisfy Second Normal Form
- **Transitive Dependency Elimination**: Non-key attributes should not depend on other non-key attributes
- **Direct Dependency**: Non-key attributes depend directly on the primary key
**Example Violation:**
```sql
-- BAD: Employee table with transitive dependency
CREATE TABLE employees (
    id                INT PRIMARY KEY,
    name              VARCHAR(100),
    department_id     INT,
    department_name   VARCHAR(100),  -- Depends on department_id, not employee id
    department_budget DECIMAL(10,2)  -- Transitive dependency
);

-- GOOD: Separate department information
CREATE TABLE departments (
    id     INT PRIMARY KEY,
    name   VARCHAR(100),
    budget DECIMAL(10,2)
);
CREATE TABLE employees (
    id            INT PRIMARY KEY,
    name          VARCHAR(100),
    department_id INT REFERENCES departments(id)
);
```
#### Boyce-Codd Normal Form (BCNF)
- **3NF Compliance**: Must satisfy Third Normal Form
- **Determinant Key Rule**: Every determinant must be a candidate key
- **Stricter 3NF**: Handles anomalies not covered by 3NF
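The classic BCNF case is easiest to see in a concrete schema. A minimal sketch using Python's built-in sqlite3 module (table and column names here are hypothetical, not from this document): an enrollment table keyed on (student_id, course_id) where each instructor teaches exactly one course hides the dependency instructor_id → course_id, making instructor_id a determinant that is not a candidate key. The decomposition below removes it.

```python
import sqlite3

# Decomposed schema: the instructor_id -> course_id dependency now lives
# in its own table, where instructor_id IS the key.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE instructor_courses (
        instructor_id INTEGER PRIMARY KEY,  -- each instructor teaches one course
        course_id     INTEGER NOT NULL
    );
    CREATE TABLE enrollments (
        student_id    INTEGER NOT NULL,
        instructor_id INTEGER NOT NULL REFERENCES instructor_courses(instructor_id),
        PRIMARY KEY (student_id, instructor_id)
    );
""")
con.execute("INSERT INTO instructor_courses VALUES (1, 101)")
con.execute("INSERT INTO enrollments VALUES (42, 1)")

# The course is recovered by joining through the instructor
course = con.execute("""
    SELECT ic.course_id FROM enrollments e
    JOIN instructor_courses ic USING (instructor_id)
    WHERE e.student_id = 42
""").fetchone()[0]
print(course)  # 101
```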
### Denormalization Strategies
#### When to Denormalize
1. **Read-Heavy Workloads**: High query frequency with acceptable write trade-offs
2. **Performance Bottlenecks**: Join operations causing significant latency
3. **Aggregation Needs**: Frequent calculation of derived values
4. **Caching Requirements**: Pre-computed results for common queries
#### Common Denormalization Patterns
**Redundant Storage**
```sql
-- Store calculated values to avoid expensive joins
CREATE TABLE orders (
    id            INT PRIMARY KEY,
    customer_id   INT REFERENCES customers(id),
    customer_name VARCHAR(100),  -- Denormalized from customers table
    order_total   DECIMAL(10,2), -- Denormalized calculation
    created_at    TIMESTAMP
);
```
**Materialized Aggregates**
```sql
-- Pre-computed summary tables
CREATE TABLE customer_statistics (
    customer_id     INT PRIMARY KEY,
    total_orders    INT,
    lifetime_value  DECIMAL(12,2),
    last_order_date DATE,
    updated_at      TIMESTAMP
);
```
## Index Optimization Strategies
### B-Tree Indexes
- **Default Choice**: Best for range queries, sorting, and equality matches
- **Column Order**: Most selective columns first for composite indexes
- **Prefix Matching**: Supports leading column subset queries
- **Maintenance Cost**: Balanced tree structure with logarithmic operations
### Hash Indexes
- **Equality Queries**: Optimal for exact match lookups
- **Memory Efficiency**: Constant-time access for single-value queries
- **Range Limitations**: Cannot support range or partial matches
- **Use Cases**: Primary keys, unique constraints, cache keys
### Composite Indexes
```sql
-- Query pattern determines optimal column order
-- Query: WHERE status = 'active' AND created_date > '2023-01-01' ORDER BY priority DESC
CREATE INDEX idx_task_status_date_priority
ON tasks (status, created_date, priority DESC);
-- Query: WHERE user_id = 123 AND category IN ('A', 'B') AND date_field BETWEEN '...' AND '...'
CREATE INDEX idx_user_category_date
ON user_activities (user_id, category, date_field);
```
### Covering Indexes
```sql
-- Include additional columns to avoid table lookups
CREATE INDEX idx_user_email_covering
ON users (email)
INCLUDE (first_name, last_name, status);
-- Query can be satisfied entirely from the index
-- SELECT first_name, last_name, status FROM users WHERE email = 'user@example.com';
```
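As a quick check that covering actually happens, here is a sketch with Python's built-in sqlite3 module. SQLite has no INCLUDE clause, so a composite index plays the covering role; names mirror the example above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT, first_name TEXT, last_name TEXT, status TEXT)")
# No INCLUDE in SQLite: put the extra columns in the index key instead
con.execute("CREATE INDEX idx_users_email_covering "
            "ON users (email, first_name, last_name, status)")

# The planner reports a COVERING INDEX when no table lookup is needed
detail = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT first_name, last_name, status FROM users WHERE email = 'user@example.com'"
).fetchone()[3]
print(detail)  # e.g. SEARCH users USING COVERING INDEX idx_users_email_covering (email=?)
```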
### Partial Indexes
```sql
-- Index only the relevant subset of data
CREATE INDEX idx_active_users_email
ON users (email)
WHERE status = 'active';
-- Index for recent orders only; PostgreSQL requires an immutable
-- predicate, so use a literal cutoff and rebuild it periodically
CREATE INDEX idx_recent_orders_customer
ON orders (customer_id, created_at)
WHERE created_at > DATE '2024-01-01';
```
## Query Analysis & Optimization
### Query Patterns Recognition
1. **Equality Filters**: Single-column B-tree indexes
2. **Range Queries**: B-tree with proper column ordering
3. **Text Search**: Full-text indexes or trigram indexes
4. **Join Operations**: Foreign key indexes on both sides
5. **Sorting Requirements**: Indexes matching ORDER BY clauses
### Index Selection Algorithm
```
1. Identify WHERE clause columns
2. Determine most selective columns first
3. Consider JOIN conditions
4. Include ORDER BY columns if possible
5. Evaluate covering index opportunities
6. Check for existing overlapping indexes
```
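Steps 1-4 above can be sketched as a small Python helper. This is a simplification under stated assumptions: distinct counts are precomputed, equality columns lead (most selective first), and at most one range column trails, since a B-tree stops using the index after the first range predicate.

```python
def order_index_columns(equality_cols, range_cols, distinct_counts, row_count):
    """Order candidate index columns: equality predicates first, most
    selective leading, then at most one trailing range column.
    distinct_counts maps column name -> estimated distinct values."""
    selectivity = lambda col: distinct_counts[col] / row_count
    ordered = sorted(equality_cols, key=selectivity, reverse=True)
    return ordered + range_cols[:1]  # B-tree uses only one trailing range column

cols = order_index_columns(
    equality_cols=["status", "city"],
    range_cols=["age"],
    distinct_counts={"status": 3, "city": 100},
    row_count=10_000,
)
print(cols)  # ['city', 'status', 'age']
```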
## Data Modeling Patterns
### Star Schema (Data Warehousing)
```sql
-- Central fact table
CREATE TABLE sales_facts (
    sale_id      BIGINT PRIMARY KEY,
    product_id   INT REFERENCES products(id),
    customer_id  INT REFERENCES customers(id),
    date_id      INT REFERENCES date_dimension(id),
    store_id     INT REFERENCES stores(id),
    quantity     INT,
    unit_price   DECIMAL(8,2),
    total_amount DECIMAL(10,2)
);

-- Dimension tables
CREATE TABLE date_dimension (
    id          INT PRIMARY KEY,
    date_value  DATE,
    year        INT,
    quarter     INT,
    month       INT,
    day_of_week INT,
    is_weekend  BOOLEAN
);
```
### Snowflake Schema
```sql
-- Normalized dimension tables
CREATE TABLE products (
    id          INT PRIMARY KEY,
    name        VARCHAR(200),
    category_id INT REFERENCES product_categories(id),
    brand_id    INT REFERENCES brands(id)
);
CREATE TABLE product_categories (
    id                 INT PRIMARY KEY,
    name               VARCHAR(100),
    parent_category_id INT REFERENCES product_categories(id)
);
```
### Document Model (JSON Storage)
```sql
-- Flexible document storage with indexing
CREATE TABLE documents (
    id            UUID PRIMARY KEY,
    document_type VARCHAR(50),
    data          JSONB,
    created_at    TIMESTAMP DEFAULT NOW(),
    updated_at    TIMESTAMP DEFAULT NOW()
);

-- Expression indexes on scalar JSON properties
-- (->> returns text, so a plain B-tree index applies;
--  use GIN on the data column itself for containment queries)
CREATE INDEX idx_documents_user_id
ON documents ((data->>'user_id'));
CREATE INDEX idx_documents_status
ON documents ((data->>'status'))
WHERE document_type = 'order';
```
### Graph Data Patterns
```sql
-- Adjacency list for hierarchical data
CREATE TABLE categories (
    id        INT PRIMARY KEY,
    name      VARCHAR(100),
    parent_id INT REFERENCES categories(id),
    level     INT,
    path      VARCHAR(500)  -- Materialized path: "/1/5/12/"
);

-- Many-to-many relationships
CREATE TABLE relationships (
    id                UUID PRIMARY KEY,
    from_entity_id    UUID,
    to_entity_id      UUID,
    relationship_type VARCHAR(50),
    created_at        TIMESTAMP
);

-- Inline INDEX clauses are MySQL-specific; create indexes separately
CREATE INDEX idx_relationships_from ON relationships (from_entity_id, relationship_type);
CREATE INDEX idx_relationships_to   ON relationships (to_entity_id, relationship_type);
```
## Migration Strategies
### Zero-Downtime Migration (Expand-Contract Pattern)
**Phase 1: Expand**
```sql
-- Add new column without constraints
ALTER TABLE users ADD COLUMN new_email VARCHAR(255);
-- Backfill data in batches
UPDATE users SET new_email = email WHERE id BETWEEN 1 AND 1000;
-- Continue in batches...
-- Add constraints after backfill
ALTER TABLE users ADD CONSTRAINT users_new_email_unique UNIQUE (new_email);
ALTER TABLE users ALTER COLUMN new_email SET NOT NULL;
```
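The batched backfill loop above can be sketched in Python, shown here against an in-memory SQLite database for illustration; the batch size and loop shape are assumptions, not a prescription:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, new_email TEXT)")
con.executemany("INSERT INTO users (id, email) VALUES (?, ?)",
                [(i, f"u{i}@example.com") for i in range(1, 2501)])

BATCH = 1000  # assumed batch size; tune to keep each transaction short
last_id = 0
while con.execute("SELECT COUNT(*) FROM users WHERE new_email IS NULL").fetchone()[0]:
    con.execute("UPDATE users SET new_email = email WHERE id > ? AND id <= ?",
                (last_id, last_id + BATCH))
    con.commit()  # one short transaction per batch, so locks are held briefly
    last_id += BATCH

remaining = con.execute("SELECT COUNT(*) FROM users WHERE new_email IS NULL").fetchone()[0]
print(remaining)  # 0
```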
**Phase 2: Contract**
```sql
-- Update application to use new column
-- Deploy application changes
-- Verify new column is being used
-- Remove old column
ALTER TABLE users DROP COLUMN email;
-- Rename new column
ALTER TABLE users RENAME COLUMN new_email TO email;
```
### Data Type Changes
```sql
-- Safe string to integer conversion
ALTER TABLE products ADD COLUMN sku_number INTEGER;
UPDATE products SET sku_number = CAST(sku AS INTEGER) WHERE sku ~ '^[0-9]+$';
-- Validate conversion success before dropping old column
```
## Partitioning Strategies
### Horizontal Partitioning (Sharding)
```sql
-- Range partitioning by date (parent columns here are illustrative;
-- a partitioned table must be declared with its partition key first)
CREATE TABLE sales (sale_id BIGINT, sale_date DATE, amount DECIMAL(10,2))
    PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Hash partitioning by user_id
CREATE TABLE user_data (user_id BIGINT, payload JSONB)
    PARTITION BY HASH (user_id);
CREATE TABLE user_data_0 PARTITION OF user_data
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_data_1 PARTITION OF user_data
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
```
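Application-side routing mirrors the partition definitions. A toy Python version of the MODULUS 4 scheme; real engines hash the key before taking the remainder, so plain modulo is a simplification:

```python
def route_partition(user_id: int, modulus: int = 4) -> str:
    """Pick the hash partition for a row, mirroring the MODULUS 4 DDL above.
    Simplification: engines hash the key first; we use the raw value."""
    return f"user_data_{user_id % modulus}"

print(route_partition(7))   # user_data_3
print(route_partition(12))  # user_data_0
```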
### Vertical Partitioning
```sql
-- Separate frequently accessed columns
CREATE TABLE users_core (
    id         INT PRIMARY KEY,
    email      VARCHAR(255),
    status     VARCHAR(20),
    created_at TIMESTAMP
);

-- Less frequently accessed profile data
CREATE TABLE users_profile (
    user_id     INT PRIMARY KEY REFERENCES users_core(id),
    bio         TEXT,
    preferences JSONB,
    last_login  TIMESTAMP
);
```
## Connection Management
### Connection Pooling
- **Pool Size**: CPU cores × 2 + effective spindle count
- **Connection Lifetime**: Rotate connections to prevent resource leaks
- **Timeout Settings**: Connection, idle, and query timeouts
- **Health Checks**: Regular connection validation
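The pool-size heuristic from the first bullet as a one-liner; treat it as a starting point to validate under load, not a hard rule:

```python
def pool_size(cpu_cores: int, effective_spindles: int) -> int:
    """Starting-point heuristic: cores * 2 + effective spindle count."""
    return cpu_cores * 2 + effective_spindles

print(pool_size(8, 1))  # 17
```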
### Read Replicas Strategy
```sql
-- Write queries to primary
INSERT INTO users (email, name) VALUES ('user@example.com', 'John Doe');
-- Read queries to replicas (with appropriate read preference)
SELECT * FROM users WHERE status = 'active'; -- Route to read replica
-- Consistent reads when required
SELECT * FROM users WHERE id = LAST_INSERT_ID(); -- Route to primary
```
## Caching Layers
### Cache-Aside Pattern
```python
def get_user(user_id):
    # Try cache first
    user = cache.get(f"user:{user_id}")
    if user is None:
        # Cache miss - query database
        user = db.query("SELECT * FROM users WHERE id = %s", user_id)
        # Store in cache
        cache.set(f"user:{user_id}", user, ttl=3600)
    return user
```
### Write-Through Cache
- **Consistency**: Always keep cache and database in sync
- **Write Latency**: Higher due to dual writes
- **Data Safety**: No data loss on cache failures
### Cache Invalidation Strategies
1. **TTL-Based**: Time-based expiration
2. **Event-Driven**: Invalidate on data changes
3. **Version-Based**: Use version numbers for consistency
4. **Tag-Based**: Group related cache entries
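Version-based invalidation (strategy 3) deserves a tiny sketch: bumping a per-entity version makes every key minted under the old version unreachable, so readers naturally fall through to the database. A minimal in-memory Python illustration, where a dict stands in for a real cache such as Redis:

```python
cache = {}                     # stand-in for a real cache
versions = {"user:42": 1}      # per-entity version counters

def cache_key(entity: str) -> str:
    # Embed the current version in the key
    return f"{entity}:v{versions[entity]}"

cache[cache_key("user:42")] = {"name": "Ada"}
versions["user:42"] += 1       # data changed: bump the version
stale = cache.get(cache_key("user:42"))
print(stale)  # None -> next read falls through to the database
```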
## Database Selection Guide
### SQL Databases
**PostgreSQL**
- **Strengths**: ACID compliance, complex queries, JSON support, extensibility
- **Use Cases**: OLTP applications, data warehousing, geospatial data
- **Scale**: Vertical scaling with read replicas
**MySQL**
- **Strengths**: Performance, replication, wide ecosystem support
- **Use Cases**: Web applications, content management, e-commerce
- **Scale**: Horizontal scaling through sharding
### NoSQL Databases
**Document Stores (MongoDB, CouchDB)**
- **Strengths**: Flexible schema, horizontal scaling, developer productivity
- **Use Cases**: Content management, catalogs, user profiles
- **Trade-offs**: Eventual consistency, limited support for complex queries
**Key-Value Stores (Redis, DynamoDB)**
- **Strengths**: High performance, simple model, excellent caching
- **Use Cases**: Session storage, real-time analytics, gaming leaderboards
- **Trade-offs**: Limited query capabilities, data modeling constraints
**Column-Family (Cassandra, HBase)**
- **Strengths**: Write-heavy workloads, linear scalability, fault tolerance
- **Use Cases**: Time-series data, IoT applications, messaging systems
- **Trade-offs**: Query flexibility, consistency model complexity
**Graph Databases (Neo4j, Amazon Neptune)**
- **Strengths**: Relationship queries, pattern matching, recommendation engines
- **Use Cases**: Social networks, fraud detection, knowledge graphs
- **Trade-offs**: Specialized use cases, learning curve
### NewSQL Databases
**Distributed SQL (CockroachDB, TiDB, Spanner)**
- **Strengths**: SQL compatibility with horizontal scaling
- **Use Cases**: Global applications requiring ACID guarantees
- **Trade-offs**: Complexity, latency for distributed transactions
## Tools & Scripts
### Schema Analyzer
- **Input**: SQL DDL files, JSON schema definitions
- **Analysis**: Normalization compliance, constraint validation, naming conventions
- **Output**: Analysis report, Mermaid ERD, improvement recommendations
### Index Optimizer
- **Input**: Schema definition, query patterns
- **Analysis**: Missing indexes, redundancy detection, selectivity estimation
- **Output**: Index recommendations, CREATE INDEX statements, performance projections
### Migration Generator
- **Input**: Current and target schemas
- **Analysis**: Schema differences, dependency resolution, risk assessment
- **Output**: Migration scripts, rollback plans, validation queries


@@ -0,0 +1,373 @@
# Database Selection Decision Tree
## Overview
Choosing the right database technology is crucial for application success. This guide provides a systematic approach to database selection based on specific requirements, data patterns, and operational constraints.
## Decision Framework
### Primary Questions
1. **What is your primary use case?**
- OLTP (Online Transaction Processing)
- OLAP (Online Analytical Processing)
- Real-time analytics
- Content management
- Search and discovery
- Time-series data
- Graph relationships
2. **What are your consistency requirements?**
- Strong consistency (ACID)
- Eventual consistency
- Causal consistency
- Session consistency
3. **What are your scalability needs?**
- Vertical scaling sufficient
- Horizontal scaling required
- Global distribution needed
- Multi-region requirements
4. **What is your data structure?**
- Structured (relational)
- Semi-structured (JSON/XML)
- Unstructured (documents, media)
- Graph relationships
- Time-series data
- Key-value pairs
## Decision Tree
```
START: What is your primary use case?
├── OLTP (Transactional Applications)
│ │
│ ├── Do you need strong ACID guarantees?
│ │ ├── YES → Do you need horizontal scaling?
│ │ │ ├── YES → Distributed SQL
│ │ │ │ ├── CockroachDB (Global, multi-region)
│ │ │ │ ├── TiDB (MySQL compatibility)
│ │ │ │ └── Spanner (Google Cloud)
│ │ │ └── NO → Traditional SQL
│ │ │ ├── PostgreSQL (Feature-rich, extensions)
│ │ │ ├── MySQL (Performance, ecosystem)
│ │ │ └── SQL Server (Microsoft stack)
│ │ └── NO → Are you primarily key-value access?
│ │ ├── YES → Key-Value Stores
│ │ │ ├── Redis (In-memory, caching)
│ │ │ ├── DynamoDB (AWS managed)
│ │ │ └── Cassandra (High availability)
│ │ └── NO → Document Stores
│ │ ├── MongoDB (General purpose)
│ │ ├── CouchDB (Sync, replication)
│ │ └── Amazon DocumentDB (MongoDB compatible)
│ │
├── OLAP (Analytics and Reporting)
│ │
│ ├── What is your data volume?
│ │ ├── Small to Medium (< 1TB) → Traditional SQL with optimization
│ │ │ ├── PostgreSQL with columnar extensions
│ │ │ ├── MySQL with analytics engine
│ │ │ └── SQL Server with columnstore
│ │ ├── Large (1TB - 100TB) → Data Warehouse Solutions
│ │ │ ├── Snowflake (Cloud-native)
│ │ │ ├── BigQuery (Google Cloud)
│ │ │ ├── Redshift (AWS)
│ │ │ └── Synapse (Azure)
│ │ └── Very Large (> 100TB) → Big Data Platforms
│ │ ├── Databricks (Unified analytics)
│ │ ├── Apache Spark on cloud
│ │ └── Hadoop ecosystem
│ │
├── Real-time Analytics
│ │
│ ├── Do you need sub-second query responses?
│ │ ├── YES → Stream Processing + OLAP
│ │ │ ├── ClickHouse (Fast analytics)
│ │ │ ├── Apache Druid (Real-time OLAP)
│ │ │ ├── Pinot (LinkedIn's real-time DB)
│ │ │ └── TimescaleDB (Time-series)
│ │ └── NO → Traditional OLAP solutions
│ │
├── Search and Discovery
│ │
│ ├── What type of search?
│ │ ├── Full-text search → Search Engines
│ │ │ ├── Elasticsearch (Full-featured)
│ │ │ ├── OpenSearch (AWS fork of ES)
│ │ │ └── Solr (Apache Lucene-based)
│ │ ├── Vector/similarity search → Vector Databases
│ │ │ ├── Pinecone (Managed vector DB)
│ │ │ ├── Weaviate (Open source)
│ │ │ ├── Chroma (Embeddings)
│ │ │ └── PostgreSQL with pgvector
│ │ └── Faceted search → Search + SQL combination
│ │
├── Graph Relationships
│ │
│ ├── Do you need complex graph traversals?
│ │ ├── YES → Graph Databases
│ │ │ ├── Neo4j (Property graph)
│ │ │ ├── Amazon Neptune (Multi-model)
│ │ │ ├── ArangoDB (Multi-model)
│ │ │ └── TigerGraph (Analytics focused)
│ │ └── NO → SQL with recursive queries
│ │ └── PostgreSQL with recursive CTEs
│ │
└── Time-series Data
├── What is your write volume?
├── High (millions/sec) → Specialized Time-series
│ ├── InfluxDB (Purpose-built)
│ ├── TimescaleDB (PostgreSQL extension)
│ ├── Apache Druid (Analytics focused)
│ └── Prometheus (Monitoring)
└── Medium → SQL with time-series optimization
└── PostgreSQL with partitioning
```
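The first branches of the tree translate directly into code. A deliberately incomplete toy sketch in Python; real selection weighs the full checklist later in this guide:

```python
def pick_category(use_case: str, acid: bool = False, horizontal: bool = False) -> str:
    """Toy encoding of the decision tree's first branches; illustrative only."""
    if use_case == "oltp":
        if acid:
            return "distributed SQL" if horizontal else "traditional SQL"
        return "document or key-value store"
    if use_case == "graph":
        return "graph database"
    if use_case == "time-series":
        return "time-series database"
    return "evaluate OLAP / search options"

print(pick_category("oltp", acid=True))                   # traditional SQL
print(pick_category("oltp", acid=True, horizontal=True))  # distributed SQL
```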
## Database Categories Deep Dive
### Traditional SQL Databases
**PostgreSQL**
- **Best For**: Complex queries, JSON data, extensions, geospatial
- **Strengths**: Feature-rich, reliable, strong consistency, extensible
- **Use Cases**: OLTP, mixed workloads, JSON documents, geospatial applications
- **Scaling**: Vertical scaling, read replicas, partitioning
- **When to Choose**: Need SQL features, complex queries, moderate scale
**MySQL**
- **Best For**: Web applications, read-heavy workloads, simple schema
- **Strengths**: Performance, replication, large ecosystem
- **Use Cases**: Web apps, content management, e-commerce
- **Scaling**: Read replicas, sharding, clustering (MySQL Cluster)
- **When to Choose**: Simple schema, performance priority, large community
**SQL Server**
- **Best For**: Microsoft ecosystem, enterprise features, business intelligence
- **Strengths**: Integration, tooling, enterprise features
- **Use Cases**: Enterprise applications, .NET applications, BI
- **Scaling**: Always On availability groups, partitioning
- **When to Choose**: Microsoft stack, enterprise requirements
### Distributed SQL (NewSQL)
**CockroachDB**
- **Best For**: Global applications, strong consistency, horizontal scaling
- **Strengths**: ACID guarantees, automatic rebalancing, survives node and region failures
- **Use Cases**: Multi-region apps, financial services, global SaaS
- **Trade-offs**: Complex setup, higher latency for global transactions
- **When to Choose**: Need SQL + global scale + consistency
**TiDB**
- **Best For**: MySQL compatibility with horizontal scaling
- **Strengths**: MySQL protocol, HTAP (hybrid), cloud-native
- **Use Cases**: MySQL migrations, hybrid workloads
- **When to Choose**: Existing MySQL expertise, need scale
### NoSQL Document Stores
**MongoDB**
- **Best For**: Flexible schema, rapid development, document-centric data
- **Strengths**: Developer experience, flexible schema, rich queries
- **Use Cases**: Content management, catalogs, user profiles, IoT
- **Scaling**: Automatic sharding, replica sets
- **When to Choose**: Schema evolution, document structure, rapid development
**CouchDB**
- **Best For**: Offline-first applications, multi-master replication
- **Strengths**: HTTP API, replication, conflict resolution
- **Use Cases**: Mobile apps, distributed systems, offline scenarios
- **When to Choose**: Need offline capabilities, bi-directional sync
### Key-Value Stores
**Redis**
- **Best For**: Caching, sessions, real-time applications, pub/sub
- **Strengths**: Performance, data structures, persistence options
- **Use Cases**: Caching, leaderboards, real-time analytics, queues
- **Scaling**: Clustering, sentinel for HA
- **When to Choose**: High performance, simple data model, caching
**DynamoDB**
- **Best For**: Serverless applications, predictable performance, AWS ecosystem
- **Strengths**: Managed, auto-scaling, consistent performance
- **Use Cases**: Web applications, gaming, IoT, mobile backends
- **Trade-offs**: Vendor lock-in, limited querying
- **When to Choose**: AWS ecosystem, serverless, managed solution
### Column-Family Stores
**Cassandra**
- **Best For**: Write-heavy workloads, high availability, linear scalability
- **Strengths**: No single point of failure, tunable consistency
- **Use Cases**: Time-series, IoT, messaging, activity feeds
- **Trade-offs**: Complex operations, eventual consistency
- **When to Choose**: High write volume, availability over consistency
**HBase**
- **Best For**: Big data applications, Hadoop ecosystem
- **Strengths**: Hadoop integration, consistent reads
- **Use Cases**: Analytics on big data, time-series at scale
- **When to Choose**: Hadoop ecosystem, very large datasets
### Graph Databases
**Neo4j**
- **Best For**: Complex relationships, graph algorithms, traversals
- **Strengths**: Mature ecosystem, Cypher query language, algorithms
- **Use Cases**: Social networks, recommendation engines, fraud detection
- **Trade-offs**: Specialized use case, learning curve
- **When to Choose**: Relationship-heavy data, graph algorithms
### Time-Series Databases
**InfluxDB**
- **Best For**: Time-series data, IoT, monitoring, analytics
- **Strengths**: Purpose-built, efficient storage, query language
- **Use Cases**: IoT sensors, monitoring, DevOps metrics
- **When to Choose**: High-volume time-series data
**TimescaleDB**
- **Best For**: Time-series with SQL familiarity
- **Strengths**: PostgreSQL compatibility, SQL queries, ecosystem
- **Use Cases**: Financial data, IoT with complex queries
- **When to Choose**: Time-series + SQL requirements
### Search Engines
**Elasticsearch**
- **Best For**: Full-text search, log analysis, real-time search
- **Strengths**: Powerful search, analytics, ecosystem (ELK stack)
- **Use Cases**: Search applications, log analysis, monitoring
- **Trade-offs**: Complex operations, resource intensive
- **When to Choose**: Advanced search requirements, analytics
### Data Warehouses
**Snowflake**
- **Best For**: Cloud-native analytics, data sharing, varied workloads
- **Strengths**: Separation of compute/storage, automatic scaling
- **Use Cases**: Data warehousing, analytics, data science
- **When to Choose**: Cloud-native, analytics-focused, multi-cloud
**BigQuery**
- **Best For**: Serverless analytics, Google ecosystem, machine learning
- **Strengths**: Serverless, petabyte scale, ML integration
- **Use Cases**: Analytics, data science, reporting
- **When to Choose**: Google Cloud, serverless analytics
## Selection Criteria Matrix
| Criterion | SQL | NewSQL | Document | Key-Value | Column-Family | Graph | Time-Series |
|-----------|-----|--------|----------|-----------|---------------|-------|-------------|
| ACID Guarantees | ✅ Strong | ✅ Strong | ⚠️ Eventual | ⚠️ Eventual | ⚠️ Tunable | ⚠️ Varies | ⚠️ Varies |
| Horizontal Scaling | ❌ Limited | ✅ Native | ✅ Native | ✅ Native | ✅ Native | ⚠️ Limited | ✅ Native |
| Query Flexibility | ✅ High | ✅ High | ⚠️ Moderate | ❌ Low | ❌ Low | ✅ High | ⚠️ Specialized |
| Schema Flexibility | ❌ Rigid | ❌ Rigid | ✅ High | ✅ High | ⚠️ Moderate | ✅ High | ⚠️ Structured |
| Performance (Reads) | ⚠️ Good | ⚠️ Good | ✅ Excellent | ✅ Excellent | ✅ Excellent | ⚠️ Good | ✅ Excellent |
| Performance (Writes) | ⚠️ Good | ⚠️ Good | ✅ Excellent | ✅ Excellent | ✅ Excellent | ⚠️ Good | ✅ Excellent |
| Operational Complexity | ✅ Low | ❌ High | ⚠️ Moderate | ✅ Low | ❌ High | ⚠️ Moderate | ⚠️ Moderate |
| Ecosystem Maturity | ✅ Mature | ⚠️ Growing | ✅ Mature | ✅ Mature | ✅ Mature | ✅ Mature | ⚠️ Growing |
## Decision Checklist
### Requirements Analysis
- [ ] **Data Volume**: Current and projected data size
- [ ] **Transaction Volume**: Reads per second, writes per second
- [ ] **Consistency Requirements**: Strong vs eventual consistency needs
- [ ] **Query Patterns**: Simple lookups vs complex analytics
- [ ] **Schema Evolution**: How often does schema change?
- [ ] **Geographic Distribution**: Single region vs global
- [ ] **Availability Requirements**: Acceptable downtime
- [ ] **Team Expertise**: Existing knowledge and learning curve
- [ ] **Budget Constraints**: Licensing, infrastructure, operational costs
- [ ] **Compliance Requirements**: Data residency, audit trails
### Technical Evaluation
- [ ] **Performance Testing**: Benchmark with realistic data and queries
- [ ] **Scalability Testing**: Test scaling limits and patterns
- [ ] **Failure Scenarios**: Test backup, recovery, and failure handling
- [ ] **Integration Testing**: APIs, connectors, ecosystem tools
- [ ] **Migration Path**: How to migrate from current system
- [ ] **Monitoring and Observability**: Available tooling and metrics
### Operational Considerations
- [ ] **Management Complexity**: Setup, configuration, maintenance
- [ ] **Backup and Recovery**: Built-in vs external tools
- [ ] **Security Features**: Authentication, authorization, encryption
- [ ] **Upgrade Path**: Version compatibility and upgrade process
- [ ] **Support Options**: Community vs commercial support
- [ ] **Lock-in Risk**: Portability and vendor independence
## Common Decision Patterns
### E-commerce Platform
**Typical Choice**: PostgreSQL or MySQL
- **Primary Data**: Product catalog, orders, users (structured)
- **Query Patterns**: OLTP with some analytics
- **Consistency**: Strong consistency for financial data
- **Scale**: Moderate with read replicas
- **Additional**: Redis for caching, Elasticsearch for product search
### IoT/Sensor Data Platform
**Typical Choice**: TimescaleDB or InfluxDB
- **Primary Data**: Time-series sensor readings
- **Query Patterns**: Time-based aggregations, trend analysis
- **Scale**: High write volume, moderate read volume
- **Additional**: Kafka for ingestion, PostgreSQL for metadata
### Social Media Application
**Typical Choice**: Combination approach
- **User Profiles**: MongoDB (flexible schema)
- **Relationships**: Neo4j (graph relationships)
- **Activity Feeds**: Cassandra (high write volume)
- **Search**: Elasticsearch (content discovery)
- **Caching**: Redis (sessions, real-time data)
### Analytics Platform
**Typical Choice**: Snowflake or BigQuery
- **Primary Use**: Complex analytical queries
- **Data Volume**: Large (TB to PB scale)
- **Query Patterns**: Ad-hoc analytics, reporting
- **Users**: Data analysts, data scientists
- **Additional**: Data lake (S3/GCS) for raw data storage
### Global SaaS Application
**Typical Choice**: CockroachDB or DynamoDB
- **Requirements**: Multi-region, strong consistency
- **Scale**: Global user base
- **Compliance**: Data residency requirements
- **Availability**: High availability across regions
## Migration Strategies
### From Monolithic to Distributed
1. **Assessment**: Identify scaling bottlenecks
2. **Data Partitioning**: Plan how to split data
3. **Gradual Migration**: Move non-critical data first
4. **Dual Writes**: Run both systems temporarily
5. **Validation**: Verify data consistency
6. **Cutover**: Switch reads and writes gradually
### Technology Stack Evolution
1. **Start Simple**: Begin with PostgreSQL or MySQL
2. **Identify Bottlenecks**: Monitor performance and scaling issues
3. **Selective Scaling**: Move specific workloads to specialized databases
4. **Polyglot Persistence**: Use multiple databases for different use cases
5. **Service Boundaries**: Align database choice with service boundaries
## Conclusion
Database selection should be driven by:
1. **Specific Use Case Requirements**: Not all applications need the same database
2. **Data Characteristics**: Structure, volume, and access patterns matter
3. **Non-functional Requirements**: Consistency, availability, performance targets
4. **Team and Organizational Factors**: Expertise, operational capacity, budget
5. **Evolution Path**: How requirements and scale will change over time
The best database choice is often not a single technology, but a combination of databases that each excel at their specific use case within your application architecture.


@@ -0,0 +1,424 @@
# Index Strategy Patterns
## Overview
Database indexes are critical for query performance, but they come with trade-offs. This guide covers proven patterns for index design, optimization strategies, and common pitfalls to avoid.
## Index Types and Use Cases
### B-Tree Indexes (Default)
**Best For:**
- Equality queries (`WHERE column = value`)
- Range queries (`WHERE column BETWEEN x AND y`)
- Sorting (`ORDER BY column`)
- Prefix pattern matching (`WHERE column LIKE 'prefix%'`); leading-wildcard patterns (`LIKE '%suffix'`) cannot use the index
**Characteristics:**
- Logarithmic lookup time O(log n)
- Supports partial matches on composite indexes
- Most versatile index type
**Example:**
```sql
-- Single column B-tree index
CREATE INDEX idx_customers_email ON customers (email);
-- Composite B-tree index
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
```
### Hash Indexes
**Best For:**
- Exact equality matches only
- High-cardinality columns
- Primary key lookups
**Characteristics:**
- Constant lookup time O(1) for exact matches
- Cannot support range queries or sorting
- Memory-efficient for equality operations
**Example:**
```sql
-- Hash index for exact lookups (PostgreSQL)
CREATE INDEX idx_users_id_hash ON users USING HASH (user_id);
```
### Partial Indexes
**Best For:**
- Filtering on subset of data
- Reducing index size and maintenance overhead
- Query patterns that consistently use specific filters
**Example:**
```sql
-- Index only active users
CREATE INDEX idx_active_users_email
ON users (email)
WHERE status = 'active';
-- Index recent orders only; PostgreSQL requires an immutable
-- predicate, so use a literal cutoff and rebuild it periodically
CREATE INDEX idx_recent_orders
ON orders (customer_id, created_at)
WHERE created_at > DATE '2024-01-01';
-- Index non-null values only
CREATE INDEX idx_customers_phone
ON customers (phone_number)
WHERE phone_number IS NOT NULL;
```
### Covering Indexes
**Best For:**
- Eliminating table lookups for SELECT queries
- Frequently accessed column combinations
- Read-heavy workloads
**Example:**
```sql
-- Covering index with INCLUDE clause (SQL Server/PostgreSQL)
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id, order_date)
INCLUDE (order_total, status);
-- Query can be satisfied entirely from index:
-- SELECT order_total, status FROM orders
-- WHERE customer_id = 123 AND order_date > '2024-01-01';
```
### Functional/Expression Indexes
**Best For:**
- Queries on transformed column values
- Case-insensitive searches
- Complex calculations
**Example:**
```sql
-- Case-insensitive email searches
CREATE INDEX idx_users_email_lower
ON users (LOWER(email));
-- Date part extraction
CREATE INDEX idx_orders_month
ON orders (EXTRACT(MONTH FROM order_date));
-- JSON field indexing
CREATE INDEX idx_users_preferences_theme
ON users ((preferences->>'theme'));
```
## Composite Index Design Patterns
### Column Ordering Strategy
**Rule: Most Selective Equality Columns First, Range Columns Last**
```sql
-- Query: WHERE status = 'active' AND city = 'New York' AND age > 25
-- Assume: status has 3 values, city has 100 values, age has 80 values
-- GOOD: most selective equality column first, range column (age) last
CREATE INDEX idx_users_city_status_age ON users (city, status, age);
-- BAD: least selective column first, range column in the middle
CREATE INDEX idx_users_status_age_city ON users (status, age, city);
```
**Selectivity Calculation:**
```sql
-- Estimate selectivity for each column as the distinct-value ratio
SELECT 'status' AS column_name,
       COUNT(DISTINCT status)::float / COUNT(*) AS selectivity
FROM users
UNION ALL
SELECT 'city',
       COUNT(DISTINCT city)::float / COUNT(*)
FROM users
UNION ALL
SELECT 'age',
       COUNT(DISTINCT age)::float / COUNT(*)
FROM users;
```
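The same distinct-over-total ratio can be computed outside the database. This sketch uses a hypothetical, evenly distributed data set (3 statuses × 100 cities × 40 ages) purely to illustrate the comparison:

```python
# Hypothetical cross-product data: 3 statuses x 100 cities x 40 ages = 12,000 rows
rows = [
    {"status": s, "city": c, "age": a}
    for s in ("active", "inactive", "banned")
    for c in [f"city_{i}" for i in range(100)]
    for a in range(20, 60)
]

def selectivity(rows, column):
    """Distinct values / total rows -- higher means more selective."""
    return len({r[column] for r in rows}) / len(rows)

for col in ("status", "city", "age"):
    print(f"{col}: {selectivity(rows, col):.5f}")
# city scores highest of the three, so it leads the composite index
```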
### Query Pattern Matching
**Pattern 1: Equality + Range**
```sql
-- Query: WHERE customer_id = 123 AND order_date BETWEEN '2024-01-01' AND '2024-03-31'
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
```
**Pattern 2: Multiple Equality Conditions**
```sql
-- Query: WHERE status = 'active' AND category = 'premium' AND region = 'US'
CREATE INDEX idx_users_status_category_region ON users (status, category, region);
```
**Pattern 3: Equality + Sorting**
```sql
-- Query: WHERE category = 'electronics' ORDER BY price DESC, created_at DESC
CREATE INDEX idx_products_category_price_date ON products (category, price DESC, created_at DESC);
```
### Prefix Optimization
**Efficient Prefix Usage:**
```sql
-- Index supports all these queries efficiently:
CREATE INDEX idx_users_lastname_firstname_email ON users (last_name, first_name, email);
-- ✓ Uses index: WHERE last_name = 'Smith'
-- ✓ Uses index: WHERE last_name = 'Smith' AND first_name = 'John'
-- ✓ Uses index: WHERE last_name = 'Smith' AND first_name = 'John' AND email = 'john@...'
-- ✗ Cannot use index: WHERE first_name = 'John'
-- ✗ Cannot use index: WHERE email = 'john@...'
```
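The leftmost-prefix rule is easy to confirm empirically. This sketch uses Python's `sqlite3` as an illustrative stand-in (other engines apply the same rule) and adds a `bio` column outside the index so a covering scan doesn't mask the effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, email TEXT, bio TEXT)""")
conn.execute("""CREATE INDEX idx_users_lastname_firstname_email
    ON users (last_name, first_name, email)""")

def plan(where_clause):
    """Return the flattened EXPLAIN QUERY PLAN detail for a query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT bio FROM users WHERE " + where_clause
    ).fetchall()
    return " ".join(r[-1] for r in rows)

print(plan("last_name = 'Smith'"))   # SEARCH ... USING INDEX (leftmost prefix)
print(plan("first_name = 'John'"))   # SCAN -- index unusable without the prefix
```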
## Performance Optimization Patterns
### Index Intersection vs Composite Indexes
**Scenario: Multiple single-column indexes**
```sql
CREATE INDEX idx_users_age ON users (age);
CREATE INDEX idx_users_city ON users (city);
CREATE INDEX idx_users_status ON users (status);
-- Query: WHERE age > 25 AND city = 'NYC' AND status = 'active'
-- Database may use index intersection (combining multiple indexes)
-- Performance varies by database engine and data distribution
```
**Better: Purpose-built composite index**
```sql
-- More efficient for the specific query pattern
CREATE INDEX idx_users_city_status_age ON users (city, status, age);
```
### Index Size vs Performance Trade-off
**Wide Indexes (Many Columns):**
```sql
-- Pros: Covers many query patterns, excellent for covering queries
-- Cons: Large index size, slower writes, more memory usage
CREATE INDEX idx_orders_comprehensive
ON orders (customer_id, order_date, status, total_amount, shipping_method, created_at)
INCLUDE (order_notes, billing_address);
```
**Narrow Indexes (Few Columns):**
```sql
-- Pros: Smaller size, faster writes, less memory
-- Cons: May not cover all query patterns
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
CREATE INDEX idx_orders_status ON orders (status);
```
### Maintenance Optimization
**Regular Index Analysis:**
```sql
-- PostgreSQL: Check index usage statistics
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE idx_scan = 0 -- Potentially unused indexes
ORDER BY schemaname, tablename;
-- Check index size
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) as index_size
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexname::regclass) DESC;
```
## Common Anti-Patterns
### 1. Over-Indexing
**Problem:**
```sql
-- Too many similar indexes
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
CREATE INDEX idx_orders_customer_date_status ON orders (customer_id, order_date, status);
```
**Solution:**
```sql
-- One well-designed composite index can often replace several:
-- its prefixes also serve customer_id-only and customer_id + order_date queries
CREATE INDEX idx_orders_customer_date_status ON orders (customer_id, order_date, status);
-- Drop redundant indexes: idx_orders_customer, idx_orders_customer_date
-- (benchmark customer_id + status queries before dropping idx_orders_customer_status,
-- since status is no longer a directly searchable second key column)
```
### 2. Wrong Column Order
**Problem:**
```sql
-- Query: WHERE active = true AND user_type = 'premium' AND city = 'Chicago'
-- Bad order: boolean first (lowest selectivity)
CREATE INDEX idx_users_active_type_city ON users (active, user_type, city);
```
**Solution:**
```sql
-- Good order: most selective first
CREATE INDEX idx_users_city_type_active ON users (city, user_type, active);
```
### 3. Ignoring Query Patterns
**Problem:**
```sql
-- Index doesn't match common query patterns
CREATE INDEX idx_products_name ON products (product_name);
-- But queries are: WHERE category = 'electronics' AND price BETWEEN 100 AND 500
-- Index is not helpful for these queries
```
**Solution:**
```sql
-- Match actual query patterns
CREATE INDEX idx_products_category_price ON products (category, price);
```
### 4. Function in WHERE Without Functional Index
**Problem:**
```sql
-- Query uses function but no functional index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Regular index on email won't help
```
**Solution:**
```sql
-- Create functional index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
```
## Advanced Patterns
### Multi-Column Statistics
**When Columns Are Correlated:**
```sql
-- If city and state are highly correlated, create extended statistics
CREATE STATISTICS stats_address_correlation ON city, state FROM addresses;
ANALYZE addresses;
-- Helps query planner make better decisions for:
-- WHERE city = 'New York' AND state = 'NY'
```
### Conditional Indexes for Data Lifecycle
**Pattern: Different indexes for different data ages**
```sql
-- Note: PostgreSQL requires IMMUTABLE partial index predicates, so
-- CURRENT_DATE cannot be used; use literal cutoff dates (illustrative
-- values below) and rebuild the indexes on a schedule
-- Hot data (recent orders) - optimized for OLTP
CREATE INDEX idx_orders_hot_customer_date
ON orders (customer_id, order_date DESC)
WHERE order_date > DATE '2024-02-10';
-- Warm data (older orders) - optimized for analytics
CREATE INDEX idx_orders_warm_date_total
ON orders (order_date, total_amount)
WHERE order_date <= DATE '2024-02-10'
  AND order_date > DATE '2023-03-12';
-- Cold data (archived orders) - minimal indexing
CREATE INDEX idx_orders_cold_date
ON orders (order_date)
WHERE order_date <= DATE '2023-03-12';
```
### Index-Only Scan Optimization
**Design indexes to avoid table access:**
```sql
-- Query: SELECT order_id, total_amount, status FROM orders WHERE customer_id = ?
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id)
INCLUDE (order_id, total_amount, status);
-- Or as composite index (if database doesn't support INCLUDE)
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id, order_id, total_amount, status);
```
## Index Monitoring and Maintenance
### Performance Monitoring Queries
**Find slow queries that might benefit from indexes:**
```sql
-- PostgreSQL: Find queries with high cost
SELECT
query,
calls,
total_exec_time,  -- named total_time before PostgreSQL 13
mean_exec_time,   -- named mean_time before PostgreSQL 13
rows
FROM pg_stat_statements
WHERE mean_exec_time > 1000 -- Queries averaging > 1 second (times in ms)
ORDER BY mean_exec_time DESC;
```
**Identify missing indexes:**
```sql
-- Look for sequential scans on large tables
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
n_tup_ins + n_tup_upd + n_tup_del as write_activity
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND seq_tup_read > 100000 -- Large sequential scans
AND (idx_scan = 0 OR seq_scan > idx_scan * 2)
ORDER BY seq_tup_read DESC;
```
### Index Maintenance Schedule
**Regular Maintenance Tasks:**
```sql
-- Rebuild fragmented indexes (SQL Server)
ALTER INDEX ALL ON orders REBUILD;
-- Update statistics (PostgreSQL)
ANALYZE orders;
-- Check for unused indexes monthly
SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;
```
## Conclusion
Effective index strategy requires:
1. **Understanding Query Patterns**: Analyze actual application queries, not theoretical scenarios
2. **Measuring Performance**: Use query execution plans and timing to validate index effectiveness
3. **Balancing Trade-offs**: More indexes improve reads but slow writes and increase storage
4. **Regular Maintenance**: Monitor index usage and performance, remove unused indexes
5. **Iterative Improvement**: Start with essential indexes, add and optimize based on real usage
The goal is not to index every possible query pattern, but to create a focused set of indexes that provide maximum benefit for your application's specific workload while minimizing maintenance overhead.

# Database Normalization Guide
## Overview
Database normalization is the process of organizing data to minimize redundancy and dependency issues. It involves decomposing tables to eliminate data anomalies and improve data integrity.
## Normal Forms
### First Normal Form (1NF)
**Requirements:**
- Each column contains atomic (indivisible) values
- Each column contains values of the same type
- Each column has a unique name
- The order of data storage doesn't matter
**Violations and Solutions:**
**Problem: Multiple values in single column**
```sql
-- BAD: Multiple phone numbers in one column
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
phones VARCHAR(500) -- "555-1234, 555-5678, 555-9012"
);
-- GOOD: Separate table for multiple phones
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE customer_phones (
id INT PRIMARY KEY,
customer_id INT REFERENCES customers(id),
phone VARCHAR(20),
phone_type VARCHAR(10) -- 'mobile', 'home', 'work'
);
```
**Problem: Repeating groups**
```sql
-- BAD: Repeating column patterns
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
item1_name VARCHAR(100),
item1_qty INT,
item1_price DECIMAL(8,2),
item2_name VARCHAR(100),
item2_qty INT,
item2_price DECIMAL(8,2),
item3_name VARCHAR(100),
item3_qty INT,
item3_price DECIMAL(8,2)
);
-- GOOD: Separate table for order items
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT REFERENCES orders(order_id),
item_name VARCHAR(100),
quantity INT,
unit_price DECIMAL(8,2)
);
```
### Second Normal Form (2NF)
**Requirements:**
- Must be in 1NF
- All non-key attributes must be fully functionally dependent on the primary key
- No partial dependencies (applies only to tables with composite primary keys)
**Violations and Solutions:**
**Problem: Partial dependency on composite key**
```sql
-- BAD: Student course enrollment with partial dependencies
CREATE TABLE student_courses (
student_id INT,
course_id INT,
student_name VARCHAR(100), -- Depends only on student_id
student_major VARCHAR(50), -- Depends only on student_id
course_title VARCHAR(200), -- Depends only on course_id
course_credits INT, -- Depends only on course_id
grade CHAR(2), -- Depends on both student_id AND course_id
PRIMARY KEY (student_id, course_id)
);
-- GOOD: Separate tables eliminate partial dependencies
CREATE TABLE students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100),
student_major VARCHAR(50)
);
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_title VARCHAR(200),
course_credits INT
);
CREATE TABLE enrollments (
student_id INT,
course_id INT,
grade CHAR(2),
enrollment_date DATE,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
```
### Third Normal Form (3NF)
**Requirements:**
- Must be in 2NF
- No transitive dependencies (non-key attributes should not depend on other non-key attributes)
- All non-key attributes must depend directly on the primary key
**Violations and Solutions:**
**Problem: Transitive dependency**
```sql
-- BAD: Employee table with transitive dependency
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
department_name VARCHAR(100), -- Depends on department_id, not employee_id
department_location VARCHAR(100), -- Transitive dependency through department_id
department_budget DECIMAL(10,2), -- Transitive dependency through department_id
salary DECIMAL(8,2)
);
-- GOOD: Separate department information
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100),
department_location VARCHAR(100),
department_budget DECIMAL(10,2)
);
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
salary DECIMAL(8,2),
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
```
### Boyce-Codd Normal Form (BCNF)
**Requirements:**
- Must be in 3NF
- Every determinant must be a candidate key
- Stricter than 3NF - handles cases where 3NF doesn't eliminate all anomalies
**Violations and Solutions:**
**Problem: Determinant that's not a candidate key**
```sql
-- BAD: Student advisor relationship with BCNF violation
-- Assumption: Each student has one advisor per subject,
-- each advisor teaches only one subject, but can advise multiple students
CREATE TABLE student_advisor (
student_id INT,
subject VARCHAR(50),
advisor_id INT,
PRIMARY KEY (student_id, subject)
);
-- Problem: advisor_id determines subject, but advisor_id is not a candidate key
-- GOOD: Separate the functional dependencies
CREATE TABLE advisors (
advisor_id INT PRIMARY KEY,
subject VARCHAR(50)
);
CREATE TABLE student_advisor_assignments (
student_id INT,
advisor_id INT,
PRIMARY KEY (student_id, advisor_id),
FOREIGN KEY (advisor_id) REFERENCES advisors(advisor_id)
);
```
## Denormalization Strategies
### When to Denormalize
1. **Performance Requirements**: When query performance is more critical than storage efficiency
2. **Read-Heavy Workloads**: When data is read much more frequently than it's updated
3. **Reporting Systems**: When complex joins negatively impact reporting performance
4. **Caching Strategies**: When pre-computed values eliminate expensive calculations
### Common Denormalization Patterns
**1. Redundant Storage for Performance**
```sql
-- Store frequently accessed calculated values
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_total DECIMAL(10,2), -- Denormalized: sum of order_items.total
item_count INT, -- Denormalized: count of order_items
created_at TIMESTAMP
);
CREATE TABLE order_items (
item_id INT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,
unit_price DECIMAL(8,2),
total DECIMAL(10,2) -- quantity * unit_price (denormalized)
);
```
**2. Materialized Aggregates**
```sql
-- Pre-computed summary tables for reporting
CREATE TABLE monthly_sales_summary (
year_month VARCHAR(7), -- '2024-03'
product_category VARCHAR(50),
total_sales DECIMAL(12,2),
total_units INT,
avg_order_value DECIMAL(8,2),
unique_customers INT,
updated_at TIMESTAMP
);
```
**3. Historical Data Snapshots**
```sql
-- Store historical state to avoid complex temporal queries
CREATE TABLE customer_status_history (
id INT PRIMARY KEY,
customer_id INT,
status VARCHAR(20),
tier VARCHAR(10),
total_lifetime_value DECIMAL(12,2), -- Snapshot at this point in time
snapshot_date DATE
);
```
## Trade-offs Analysis
### Normalization Benefits
- **Data Integrity**: Reduced risk of inconsistent data
- **Storage Efficiency**: Less data duplication
- **Update Efficiency**: Changes need to be made in only one place
- **Flexibility**: Easier to modify schema as requirements change
### Normalization Costs
- **Query Complexity**: More joins required for data retrieval
- **Performance Impact**: Joins can be expensive on large datasets
- **Development Complexity**: More complex data access patterns
### Denormalization Benefits
- **Query Performance**: Fewer joins, faster queries
- **Simplified Queries**: Direct access to related data
- **Read Optimization**: Optimized for data retrieval patterns
- **Reduced Load**: Less database processing for common operations
### Denormalization Costs
- **Data Redundancy**: Increased storage requirements
- **Update Complexity**: Multiple places may need updates
- **Consistency Risk**: Higher risk of data inconsistencies
- **Maintenance Overhead**: Additional code to maintain derived values
## Best Practices
### 1. Start with Full Normalization
- Begin with a fully normalized design
- Identify performance bottlenecks through testing
- Selectively denormalize based on actual performance needs
### 2. Use Triggers for Consistency
```sql
-- PostgreSQL: row triggers call a function; the TG_OP branch handles
-- DELETE, where NEW is not defined
CREATE FUNCTION refresh_order_total() RETURNS trigger AS $$
DECLARE
    target_order INT;
BEGIN
    IF TG_OP = 'DELETE' THEN
        target_order := OLD.order_id;
    ELSE
        target_order := NEW.order_id;
    END IF;
    UPDATE orders
    SET order_total = COALESCE((
        SELECT SUM(quantity * unit_price)
        FROM order_items
        WHERE order_id = target_order
    ), 0)
    WHERE order_id = target_order;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_order_total
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW EXECUTE FUNCTION refresh_order_total();
```
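A runnable miniature of the trigger-maintained total, using Python's `sqlite3` (SQLite trigger bodies are inline SQL rather than PostgreSQL-style functions; only the insert case is shown for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total REAL NOT NULL DEFAULT 0);
CREATE TABLE order_items (
    item_id INTEGER PRIMARY KEY, order_id INT, quantity INT, unit_price REAL);
-- Keep the denormalized orders.order_total in sync on every item insert
CREATE TRIGGER order_items_after_insert AFTER INSERT ON order_items
BEGIN
    UPDATE orders
    SET order_total = (SELECT COALESCE(SUM(quantity * unit_price), 0)
                       FROM order_items WHERE order_id = NEW.order_id)
    WHERE order_id = NEW.order_id;
END;
""")
conn.execute("INSERT INTO orders (order_id) VALUES (1)")
conn.executemany(
    "INSERT INTO order_items (order_id, quantity, unit_price) VALUES (?, ?, ?)",
    [(1, 2, 10.0), (1, 1, 5.0)],
)
total = conn.execute("SELECT order_total FROM orders WHERE order_id = 1").fetchone()[0]
print(total)  # 25.0 (2 * 10.0 + 1 * 5.0)
```

A production version would also cover UPDATE and DELETE so the total never drifts.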
### 3. Consider Materialized Views
```sql
-- Materialized view for complex aggregations
CREATE MATERIALIZED VIEW customer_summary AS
SELECT
c.customer_id,
c.customer_name,
COUNT(o.order_id) as order_count,
SUM(o.order_total) as lifetime_value,
AVG(o.order_total) as avg_order_value,
MAX(o.created_at) as last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;
```
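Engines without materialized views can emulate the refresh with `CREATE TABLE AS`. A small sketch in Python's `sqlite3`, with hypothetical sample data and a trimmed column list, shows the rebuilt summary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT, order_total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 40.0), (11, 1, 60.0), (12, 2, 25.0);
""")
# SQLite has no MATERIALIZED VIEW; a scheduled drop-and-rebuild plays its role
conn.executescript("""
DROP TABLE IF EXISTS customer_summary;
CREATE TABLE customer_summary AS
SELECT c.customer_id,
       c.customer_name,
       COUNT(o.order_id)  AS order_count,
       SUM(o.order_total) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;
""")
print(conn.execute(
    "SELECT customer_name, order_count, lifetime_value "
    "FROM customer_summary ORDER BY customer_id"
).fetchall())  # [('Ada', 2, 100.0), ('Grace', 1, 25.0)]
```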
### 4. Document Denormalization Decisions
- Clearly document why denormalization was chosen
- Specify which data is derived and how it's maintained
- Include performance benchmarks that justify the decision
### 5. Monitor and Validate
- Implement validation checks for denormalized data
- Regular audits to ensure data consistency
- Performance monitoring to validate denormalization benefits
## Common Anti-Patterns
### 1. Premature Denormalization
Starting with denormalized design without understanding actual performance requirements.
### 2. Over-Normalization
Creating too many small tables that require excessive joins for simple queries.
### 3. Inconsistent Approach
Mixing normalized and denormalized patterns without clear strategy.
### 4. Ignoring Maintenance
Denormalizing without proper mechanisms to maintain data consistency.
## Conclusion
Normalization and denormalization are both valuable tools in database design. The key is understanding when to apply each approach:
- **Use normalization** for transactional systems where data integrity is paramount
- **Consider denormalization** for analytical systems or when performance testing reveals bottlenecks
- **Apply selectively** based on actual usage patterns and performance requirements
- **Maintain consistency** through proper design patterns and validation mechanisms
The goal is not to achieve perfect normalization or denormalization, but to create a design that best serves your application's specific needs while maintaining data quality and system performance.