Files
2026-03-12 15:17:52 +07:00

388 lines
12 KiB
Markdown

# Database Designer - POWERFUL Tier Skill
A comprehensive database design and analysis toolkit that provides expert-level schema analysis, index optimization, and migration generation capabilities for modern database systems.
## Features
### 🔍 Schema Analyzer
- **Normalization Analysis**: Automated detection of 1NF through BCNF violations
- **Data Type Optimization**: Identifies antipatterns and inappropriate types
- **Constraint Analysis**: Finds missing foreign keys, unique constraints, and checks
- **ERD Generation**: Creates Mermaid diagrams from DDL or JSON schema
- **Naming Convention Validation**: Ensures consistent naming patterns
### ⚡ Index Optimizer
- **Missing Index Detection**: Identifies indexes needed for query patterns
- **Composite Index Design**: Optimizes column ordering for maximum efficiency
- **Redundancy Analysis**: Finds duplicate and overlapping indexes
- **Performance Modeling**: Estimates selectivity and query performance impact
- **Covering Index Recommendations**: Eliminates table lookups
### 🚀 Migration Generator
- **Zero-Downtime Migrations**: Implements expand-contract patterns
- **Schema Evolution**: Handles column changes, table renames, constraint updates
- **Data Migration Scripts**: Automated data transformation and validation
- **Rollback Planning**: Complete reversal capabilities for all changes
- **Execution Orchestration**: Dependency-aware migration ordering
## Quick Start
### Prerequisites
- Python 3.7+ (no external dependencies required)
- Database schema in SQL DDL format or JSON
- Query patterns (for index optimization)
### Installation
```bash
# Clone or download the database-designer skill
cd engineering/database-designer/
# Make scripts executable
chmod +x *.py
```
## Usage Examples
### Schema Analysis
**Analyze SQL DDL file:**
```bash
python schema_analyzer.py --input assets/sample_schema.sql --output-format text
```
**Generate ERD diagram:**
```bash
python schema_analyzer.py --input assets/sample_schema.sql --generate-erd --output analysis.txt
```
**JSON schema analysis:**
```bash
python schema_analyzer.py --input assets/sample_schema.json --output-format json --output results.json
```
### Index Optimization
**Basic index analysis:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json
```
**High-priority recommendations only:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json --min-priority 2
```
**JSON output with existing index analysis:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json --format json --analyze-existing
```
### Migration Generation
**Generate migration between schemas:**
```bash
python migration_generator.py --current assets/current_schema.json --target assets/target_schema.json
```
**Zero-downtime migration:**
```bash
python migration_generator.py --current current.json --target target.json --zero-downtime --format sql
```
**Include validation queries:**
```bash
python migration_generator.py --current current.json --target target.json --include-validations --output migration_plan.txt
```
## Tool Documentation
### Schema Analyzer
**Input Formats:**
- SQL DDL files (.sql)
- JSON schema definitions (.json)
**Key Capabilities:**
- Detects 1NF violations (non-atomic values, repeating groups)
- Identifies 2NF issues (partial dependencies in composite keys)
- Finds 3NF problems (transitive dependencies)
- Checks BCNF compliance (determinant key requirements)
- Validates data types (VARCHAR(255) antipattern, inappropriate types)
- Missing constraints (NOT NULL, UNIQUE, CHECK, foreign keys)
- Naming convention adherence
**Sample Command:**
```bash
python schema_analyzer.py \
--input sample_schema.sql \
--generate-erd \
--output-format text \
--output analysis.txt
```
**Output:**
- Comprehensive text or JSON analysis report
- Mermaid ERD diagram
- Prioritized recommendations
- SQL statements for improvements
### Index Optimizer
**Input Requirements:**
- Schema definition (JSON format)
- Query patterns with frequency and selectivity data
**Analysis Features:**
- Selectivity estimation based on column patterns
- Composite index column ordering optimization
- Covering index recommendations for SELECT queries
- Foreign key index validation
- Redundancy detection (duplicates, overlaps, unused indexes)
- Performance impact modeling
**Sample Command:**
```bash
python index_optimizer.py \
--schema schema.json \
--queries query_patterns.json \
--format text \
--min-priority 3 \
--output recommendations.txt
```
**Output:**
- Prioritized index recommendations
- CREATE INDEX statements
- Drop statements for redundant indexes
- Performance impact analysis
- Storage size estimates
### Migration Generator
**Input Requirements:**
- Current schema (JSON format)
- Target schema (JSON format)
**Migration Strategies:**
- Standard migrations with ALTER statements
- Zero-downtime expand-contract patterns
- Data migration and transformation scripts
- Constraint management (add/drop in correct order)
- Index management with timing estimates
**Sample Command:**
```bash
python migration_generator.py \
--current current_schema.json \
--target target_schema.json \
--zero-downtime \
--include-validations \
--format text
```
**Output:**
- Step-by-step migration plan
- Forward and rollback SQL statements
- Risk assessment for each step
- Validation queries
- Execution time estimates
## File Structure
```
database-designer/
├── README.md # This file
├── SKILL.md # Comprehensive database design guide
├── schema_analyzer.py # Schema analysis tool
├── index_optimizer.py # Index optimization tool
├── migration_generator.py # Migration generation tool
├── references/ # Reference documentation
│ ├── normalization_guide.md # Normalization principles and patterns
│ ├── index_strategy_patterns.md # Index design and optimization guide
│ └── database_selection_decision_tree.md # Database technology selection
├── assets/ # Sample files and test data
│ ├── sample_schema.sql # Sample DDL with various issues
│ ├── sample_schema.json # JSON schema definition
│ └── sample_query_patterns.json # Query patterns for index analysis
└── expected_outputs/ # Example tool outputs
├── schema_analysis_sample.txt # Sample schema analysis report
├── index_optimization_sample.txt # Sample index recommendations
└── migration_sample.txt # Sample migration plan
```
## JSON Schema Format
The tools use a standardized JSON format for schema definitions:
```json
{
"tables": {
"table_name": {
"columns": {
"column_name": {
"type": "VARCHAR(255)",
"nullable": true,
"unique": false,
"foreign_key": "other_table.column",
"default": "default_value",
"cardinality_estimate": 1000
}
},
"primary_key": ["id"],
"unique_constraints": [["email"], ["username"]],
"check_constraints": {
"chk_positive_price": "price > 0"
},
"indexes": [
{
"name": "idx_table_column",
"columns": ["column_name"],
"unique": false,
"partial_condition": "status = 'active'"
}
]
}
}
}
```
## Query Patterns Format
For index optimization, provide query patterns in this format:
```json
{
"queries": [
{
"id": "user_lookup",
"type": "SELECT",
"table": "users",
"where_conditions": [
{
"column": "email",
"operator": "=",
"selectivity": 0.95
}
],
"join_conditions": [
{
"local_column": "user_id",
"foreign_table": "orders",
"foreign_column": "id",
"join_type": "INNER"
}
],
"order_by": [
{"column": "created_at", "direction": "DESC"}
],
"frequency": 1000,
"avg_execution_time_ms": 5.2
}
]
}
```
## Best Practices
### Schema Analysis
1. **Start with DDL**: Use actual CREATE TABLE statements when possible
2. **Include Constraints**: Capture all existing constraints and indexes
3. **Consider History**: Some denormalization may be intentional for performance
4. **Validate Results**: Review recommendations against business requirements
### Index Optimization
1. **Real Query Patterns**: Use actual application queries, not theoretical ones
2. **Include Frequency**: Query frequency is crucial for prioritization
3. **Monitor Performance**: Validate recommendations with actual performance testing
4. **Gradual Implementation**: Add indexes incrementally and monitor impact
### Migration Planning
1. **Test Migrations**: Always test on non-production environments first
2. **Backup First**: Ensure complete backups before running migrations
3. **Monitor Progress**: Watch for locks and performance impacts during execution
4. **Rollback Ready**: Have rollback procedures tested and ready
## Advanced Usage
### Custom Selectivity Estimation
The index optimizer uses pattern-based selectivity estimation. You can improve accuracy by providing cardinality estimates in your schema JSON:
```json
{
"columns": {
"status": {
"type": "VARCHAR(20)",
"cardinality_estimate": 5 # Only 5 distinct values
}
}
}
```
### Zero-Downtime Migration Strategy
For production systems, use the zero-downtime flag to generate expand-contract migrations:
1. **Expand Phase**: Add new columns/tables without constraints
2. **Dual Write**: Application writes to both old and new structures
3. **Backfill**: Populate new structures with existing data
4. **Contract Phase**: Remove old structures after validation
### Integration with CI/CD
Integrate these tools into your deployment pipeline:
```bash
# Schema validation in CI
python schema_analyzer.py --input schema.sql --output-format json | \
jq '.constraint_analysis.total_issues' | \
test $(cat) -eq 0 || exit 1
# Generate migrations automatically
python migration_generator.py \
--current prod_schema.json \
--target new_schema.json \
--zero-downtime \
--output migration.sql
```
## Troubleshooting
### Common Issues
**"No tables found in input file"**
- Ensure SQL DDL uses standard CREATE TABLE syntax
- Check for syntax errors in DDL
- Verify file encoding (UTF-8 recommended)
**"Invalid JSON schema"**
- Validate JSON syntax with a JSON validator
- Ensure all required fields are present
- Check that foreign key references use "table.column" format
**"Analysis shows no issues but problems exist"**
- Tools use heuristic analysis - review recommendations carefully
- Some design decisions may be intentional (denormalization for performance)
- Consider domain-specific requirements not captured by general rules
### Performance Tips
**Large Schemas:**
- Use `--output-format json` for machine processing
- Consider analyzing subsets of tables for very large schemas
- Provide cardinality estimates for better index recommendations
**Complex Queries:**
- Include actual execution times in query patterns
- Provide realistic frequency estimates
- Consider seasonal or usage pattern variations
## Contributing
This is a self-contained skill with no external dependencies. To extend functionality:
1. Follow the existing code patterns
2. Maintain Python standard library only requirement
3. Add comprehensive test cases for new features
4. Update documentation and examples
## License
This database designer skill is part of the claude-skills collection and follows the same licensing terms.