# Database Designer - POWERFUL Tier Skill

A database design and analysis toolkit that provides schema analysis, index optimization, and migration generation for modern database systems.

## Features

### 🔍 Schema Analyzer
- **Normalization Analysis**: Automated detection of 1NF through BCNF violations
- **Data Type Optimization**: Identifies antipatterns and inappropriate types
- **Constraint Analysis**: Finds missing foreign keys, unique constraints, and checks
- **ERD Generation**: Creates Mermaid diagrams from DDL or JSON schema
- **Naming Convention Validation**: Checks for consistent naming patterns

### ⚡ Index Optimizer
- **Missing Index Detection**: Identifies indexes needed for query patterns
- **Composite Index Design**: Optimizes column ordering within composite indexes
- **Redundancy Analysis**: Finds duplicate and overlapping indexes
- **Performance Modeling**: Estimates selectivity and query performance impact
- **Covering Index Recommendations**: Suggests indexes that eliminate table lookups

### 🚀 Migration Generator
- **Zero-Downtime Migrations**: Implements expand-contract patterns
- **Schema Evolution**: Handles column changes, table renames, constraint updates
- **Data Migration Scripts**: Automated data transformation and validation
- **Rollback Planning**: Generates rollback statements for all changes
- **Execution Orchestration**: Dependency-aware migration ordering

## Quick Start

### Prerequisites
- Python 3.7+ (no external dependencies required)
- Database schema in SQL DDL format or JSON
- Query patterns (for index optimization)

### Installation
```bash
# Clone or download the database-designer skill
cd engineering/database-designer/

# Make scripts executable
chmod +x *.py
```

## Usage Examples

### Schema Analysis

**Analyze SQL DDL file:**
```bash
python schema_analyzer.py --input assets/sample_schema.sql --output-format text
```

**Generate ERD diagram:**
```bash
python schema_analyzer.py --input assets/sample_schema.sql --generate-erd --output analysis.txt
```

**JSON schema analysis:**
```bash
python schema_analyzer.py --input assets/sample_schema.json --output-format json --output results.json
```

### Index Optimization

**Basic index analysis:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json
```

**High-priority recommendations only:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json --min-priority 2
```

**JSON output with existing index analysis:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json --format json --analyze-existing
```

### Migration Generation

**Generate migration between schemas:**
```bash
python migration_generator.py --current assets/current_schema.json --target assets/target_schema.json
```

**Zero-downtime migration:**
```bash
python migration_generator.py --current current.json --target target.json --zero-downtime --format sql
```

**Include validation queries:**
```bash
python migration_generator.py --current current.json --target target.json --include-validations --output migration_plan.txt
```

## Tool Documentation

### Schema Analyzer

**Input Formats:**
- SQL DDL files (.sql)
- JSON schema definitions (.json)

**Key Capabilities:**
- Detects 1NF violations (non-atomic values, repeating groups)
- Identifies 2NF issues (partial dependencies in composite keys)
- Finds 3NF problems (transitive dependencies)
- Checks BCNF compliance (determinant key requirements)
- Validates data types (VARCHAR(255) antipattern, inappropriate types)
- Flags missing constraints (NOT NULL, UNIQUE, CHECK, foreign keys)
- Checks naming convention adherence

**Sample Command:**
```bash
python schema_analyzer.py \
  --input sample_schema.sql \
  --generate-erd \
  --output-format text \
  --output analysis.txt
```

**Output:**
- Text or JSON analysis report
- Mermaid ERD diagram
- Prioritized recommendations
- SQL statements for improvements

### Index Optimizer

**Input Requirements:**
- Schema definition (JSON format)
- Query patterns with frequency and selectivity data

**Analysis Features:**
- Selectivity estimation based on column patterns
- Composite index column ordering optimization
- Covering index recommendations for SELECT queries
- Foreign key index validation
- Redundancy detection (duplicates, overlaps, unused indexes)
- Performance impact modeling

**Sample Command:**
```bash
python index_optimizer.py \
  --schema schema.json \
  --queries query_patterns.json \
  --format text \
  --min-priority 3 \
  --output recommendations.txt
```

**Output:**
- Prioritized index recommendations
- CREATE INDEX statements
- Drop statements for redundant indexes
- Performance impact analysis
- Storage size estimates

### Migration Generator

**Input Requirements:**
- Current schema (JSON format)
- Target schema (JSON format)

**Migration Strategies:**
- Standard migrations with ALTER statements
- Zero-downtime expand-contract patterns
- Data migration and transformation scripts
- Constraint management (add/drop in correct order)
- Index management with timing estimates

**Sample Command:**
```bash
python migration_generator.py \
  --current current_schema.json \
  --target target_schema.json \
  --zero-downtime \
  --include-validations \
  --format text
```

**Output:**
- Step-by-step migration plan
- Forward and rollback SQL statements
- Risk assessment for each step
- Validation queries
- Execution time estimates

## File Structure

```
database-designer/
├── README.md                                # This file
├── SKILL.md                                 # Comprehensive database design guide
├── schema_analyzer.py                       # Schema analysis tool
├── index_optimizer.py                       # Index optimization tool
├── migration_generator.py                   # Migration generation tool
├── references/                              # Reference documentation
│   ├── normalization_guide.md               # Normalization principles and patterns
│   ├── index_strategy_patterns.md           # Index design and optimization guide
│   └── database_selection_decision_tree.md  # Database technology selection
├── assets/                                  # Sample files and test data
│   ├── sample_schema.sql                    # Sample DDL with various issues
│   ├── sample_schema.json                   # JSON schema definition
│   └── sample_query_patterns.json           # Query patterns for index analysis
└── expected_outputs/                        # Example tool outputs
    ├── schema_analysis_sample.txt           # Sample schema analysis report
    ├── index_optimization_sample.txt        # Sample index recommendations
    └── migration_sample.txt                 # Sample migration plan
```

## JSON Schema Format

The tools use a standardized JSON format for schema definitions:

```json
{
  "tables": {
    "table_name": {
      "columns": {
        "column_name": {
          "type": "VARCHAR(255)",
          "nullable": true,
          "unique": false,
          "foreign_key": "other_table.column",
          "default": "default_value",
          "cardinality_estimate": 1000
        }
      },
      "primary_key": ["id"],
      "unique_constraints": [["email"], ["username"]],
      "check_constraints": {
        "chk_positive_price": "price > 0"
      },
      "indexes": [
        {
          "name": "idx_table_column",
          "columns": ["column_name"],
          "unique": false,
          "partial_condition": "status = 'active'"
        }
      ]
    }
  }
}
```

## Query Patterns Format

For index optimization, provide query patterns in this format:

```json
{
  "queries": [
    {
      "id": "user_lookup",
      "type": "SELECT",
      "table": "users",
      "where_conditions": [
        {
          "column": "email",
          "operator": "=",
          "selectivity": 0.95
        }
      ],
      "join_conditions": [
        {
          "local_column": "user_id",
          "foreign_table": "orders",
          "foreign_column": "id",
          "join_type": "INNER"
        }
      ],
      "order_by": [
        {"column": "created_at", "direction": "DESC"}
      ],
      "frequency": 1000,
      "avg_execution_time_ms": 5.2
    }
  ]
}
```

## Best Practices

### Schema Analysis
1. **Start with DDL**: Use actual CREATE TABLE statements when possible
2. **Include Constraints**: Capture all existing constraints and indexes
3. **Consider History**: Some denormalization may be intentional for performance
4. **Validate Results**: Review recommendations against business requirements

### Index Optimization
1. **Real Query Patterns**: Use actual application queries, not theoretical ones
2. **Include Frequency**: Recommendations are prioritized by query frequency, so provide it
3. **Monitor Performance**: Validate recommendations with actual performance testing
4. **Gradual Implementation**: Add indexes incrementally and monitor impact

### Migration Planning
1. **Test Migrations**: Always test on non-production environments first
2. **Backup First**: Ensure complete backups before running migrations
3. **Monitor Progress**: Watch for locks and performance impacts during execution
4. **Rollback Ready**: Have rollback procedures tested and ready

## Advanced Usage

### Custom Selectivity Estimation

The index optimizer uses pattern-based selectivity estimation. You can improve accuracy by providing cardinality estimates in your schema JSON. For example, for a `status` column with only 5 distinct values:

```json
{
  "columns": {
    "status": {
      "type": "VARCHAR(20)",
      "cardinality_estimate": 5
    }
  }
}
```

### Zero-Downtime Migration Strategy

For production systems, use the zero-downtime flag to generate expand-contract migrations:

1. **Expand Phase**: Add new columns/tables without constraints
2. **Dual Write**: Application writes to both old and new structures
3. **Backfill**: Populate new structures with existing data
4. **Contract Phase**: Remove old structures after validation

### Integration with CI/CD

Integrate these tools into your deployment pipeline:

```bash
# Schema validation in CI: fail the job unless total_issues is 0
# (jq -e exits non-zero when the expression evaluates to false or null)
python schema_analyzer.py --input schema.sql --output-format json | \
  jq -e '.constraint_analysis.total_issues == 0' > /dev/null || exit 1

# Generate migrations automatically
python migration_generator.py \
  --current prod_schema.json \
  --target new_schema.json \
  --zero-downtime \
  --output migration.sql
```

## Troubleshooting

### Common Issues

**"No tables found in input file"**
- Ensure SQL DDL uses standard CREATE TABLE syntax
- Check for syntax errors in DDL
- Verify file encoding (UTF-8 recommended)

**"Invalid JSON schema"**
- Validate JSON syntax with a JSON validator
- Ensure all required fields are present
- Check that foreign key references use "table.column" format

**"Analysis shows no issues but problems exist"**
- The tools use heuristic analysis, so review recommendations carefully
- Some design decisions may be intentional (denormalization for performance)
- Consider domain-specific requirements not captured by general rules

### Performance Tips

**Large Schemas:**
- Use `--output-format json` for machine processing
- Consider analyzing subsets of tables for very large schemas
- Provide cardinality estimates for better index recommendations

**Complex Queries:**
- Include actual execution times in query patterns
- Provide realistic frequency estimates
- Consider seasonal or usage pattern variations

## Contributing

This is a self-contained skill with no external dependencies. To extend functionality:

1. Follow the existing code patterns
2. Keep the Python standard-library-only requirement
3. Add test cases for new features
4. Update documentation and examples

## License

This database designer skill is part of the claude-skills collection and follows the same licensing terms.
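
As a companion to the "Invalid JSON schema" troubleshooting item, the sketch below shows one way to pre-check foreign key references in a schema JSON before running the tools. It assumes the layout shown under JSON Schema Format (`tables` → `columns` → optional `foreign_key`). The helper name `find_bad_foreign_keys`, the identifier pattern, and the extra check that the referenced table and column exist are illustrative assumptions, not part of the skill's scripts.

```python
import re

# Assumption: table and column names are plain identifiers (letters, digits, underscores).
FK_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*")


def find_bad_foreign_keys(schema):
    """Return (table, column, reference, reason) tuples for suspicious foreign keys.

    Hypothetical helper for illustration; the real tools may validate differently.
    """
    tables = schema.get("tables", {})
    problems = []
    for table_name, table in tables.items():
        for column_name, column in table.get("columns", {}).items():
            ref = column.get("foreign_key")
            if ref is None:
                continue  # column has no foreign key
            if not isinstance(ref, str) or not FK_PATTERN.fullmatch(ref):
                problems.append((table_name, column_name, ref, "not in table.column format"))
                continue
            # The pattern guarantees exactly one dot, so this split is safe.
            ref_table, ref_column = ref.split(".")
            target = tables.get(ref_table)
            if target is None:
                problems.append((table_name, column_name, ref, "unknown table"))
            elif ref_column not in target.get("columns", {}):
                problems.append((table_name, column_name, ref, "unknown column"))
    return problems


# Made-up schema for demonstration only.
example_schema = {
    "tables": {
        "users": {"columns": {"id": {"type": "INTEGER"}}},
        "orders": {
            "columns": {
                "id": {"type": "INTEGER"},
                "user_id": {"type": "INTEGER", "foreign_key": "users.id"},
                "coupon_id": {"type": "INTEGER", "foreign_key": "coupons"},
                "shop_id": {"type": "INTEGER", "foreign_key": "shops.id"},
            }
        },
    }
}

for problem in find_bad_foreign_keys(example_schema):
    print(problem)
```

In this example, `orders.coupon_id` is reported because `coupons` lacks a column part, and `orders.shop_id` is reported because no `shops` table is defined. `orders.user_id` passes both checks.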