# Database Designer - POWERFUL Tier Skill

A comprehensive database design and analysis toolkit that provides expert-level schema analysis, index optimization, and migration generation capabilities for modern database systems.

## Features

### 🔍 Schema Analyzer
- **Normalization Analysis**: Automated detection of 1NF through BCNF violations
- **Data Type Optimization**: Identifies antipatterns and inappropriate types
- **Constraint Analysis**: Finds missing foreign keys, unique constraints, and checks
- **ERD Generation**: Creates Mermaid diagrams from DDL or JSON schema
- **Naming Convention Validation**: Ensures consistent naming patterns

### ⚡ Index Optimizer
- **Missing Index Detection**: Identifies indexes needed for query patterns
- **Composite Index Design**: Optimizes column ordering for maximum efficiency
- **Redundancy Analysis**: Finds duplicate and overlapping indexes
- **Performance Modeling**: Estimates selectivity and query performance impact
- **Covering Index Recommendations**: Eliminates table lookups

### 🚀 Migration Generator
- **Zero-Downtime Migrations**: Implements expand-contract patterns
- **Schema Evolution**: Handles column changes, table renames, constraint updates
- **Data Migration Scripts**: Automated data transformation and validation
- **Rollback Planning**: Complete reversal capabilities for all changes
- **Execution Orchestration**: Dependency-aware migration ordering

## Quick Start

### Prerequisites
- Python 3.7+ (no external dependencies required)
- Database schema in SQL DDL format or JSON
- Query patterns (for index optimization)

### Installation
```bash
# Clone or download the database-designer skill
cd engineering/database-designer/

# Make scripts executable
chmod +x *.py
```

## Usage Examples

### Schema Analysis

**Analyze SQL DDL file:**
```bash
python schema_analyzer.py --input assets/sample_schema.sql --output-format text
```

**Generate ERD diagram:**
```bash
python schema_analyzer.py --input assets/sample_schema.sql --generate-erd --output analysis.txt
```

**JSON schema analysis:**
```bash
python schema_analyzer.py --input assets/sample_schema.json --output-format json --output results.json
```

### Index Optimization

**Basic index analysis:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json
```

**High-priority recommendations only:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json --min-priority 2
```

**JSON output with existing index analysis:**
```bash
python index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json --format json --analyze-existing
```

### Migration Generation

**Generate migration between schemas:**
```bash
python migration_generator.py --current assets/current_schema.json --target assets/target_schema.json
```

**Zero-downtime migration:**
```bash
python migration_generator.py --current current.json --target target.json --zero-downtime --format sql
```

**Include validation queries:**
```bash
python migration_generator.py --current current.json --target target.json --include-validations --output migration_plan.txt
```

## Tool Documentation

### Schema Analyzer

**Input Formats:**
- SQL DDL files (.sql)
- JSON schema definitions (.json)

**Key Capabilities:**
- Detects 1NF violations (non-atomic values, repeating groups)
- Identifies 2NF issues (partial dependencies in composite keys)
- Finds 3NF problems (transitive dependencies)
- Checks BCNF compliance (determinant key requirements)
- Validates data types (VARCHAR(255) antipattern, inappropriate types)
- Flags missing constraints (NOT NULL, UNIQUE, CHECK, foreign keys)
- Checks naming convention adherence

**Sample Command:**
```bash
python schema_analyzer.py \
  --input sample_schema.sql \
  --generate-erd \
  --output-format text \
  --output analysis.txt
```

**Output:**
- Comprehensive text or JSON analysis report
- Mermaid ERD diagram
- Prioritized recommendations
- SQL statements for improvements

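To give a feel for what ERD generation involves, here is a minimal sketch that turns a schema dict into Mermaid `erDiagram` text. It assumes the JSON schema layout described in this README and is not the analyzer's actual implementation: `schema_to_mermaid` is a hypothetical helper, and drawing every foreign key as a one-to-many relationship is a simplifying assumption.

```python
def schema_to_mermaid(schema):
    """Render a schema dict as Mermaid erDiagram text (illustrative sketch only)."""
    lines = ["erDiagram"]
    relations = []
    for table, spec in schema["tables"].items():
        lines.append(f"    {table} {{")
        for column, col in spec["columns"].items():
            # Keep only the base type name, e.g. "VARCHAR(255)" -> "varchar".
            col_type = col["type"].split("(")[0].lower()
            lines.append(f"        {col_type} {column}")
            if col.get("foreign_key"):
                parent = col["foreign_key"].split(".")[0]
                # Assumption: every foreign key is shown as one-to-many.
                relations.append(f"    {parent} ||--o{{ {table} : {column}")
        lines.append("    }")
    return "\n".join(lines + relations)


example_schema = {
    "tables": {
        "users": {"columns": {"id": {"type": "INTEGER"}}},
        "orders": {
            "columns": {
                "id": {"type": "INTEGER"},
                "user_id": {"type": "INTEGER", "foreign_key": "users.id"},
            }
        },
    }
}
print(schema_to_mermaid(example_schema))
```

The output can be pasted into any Mermaid renderer. A fuller version would also need to mark primary keys and handle composite foreign keys.
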
### Index Optimizer

**Input Requirements:**
- Schema definition (JSON format)
- Query patterns with frequency and selectivity data

**Analysis Features:**
- Selectivity estimation based on column patterns
- Composite index column ordering optimization
- Covering index recommendations for SELECT queries
- Foreign key index validation
- Redundancy detection (duplicates, overlaps, unused indexes)
- Performance impact modeling

**Sample Command:**
```bash
python index_optimizer.py \
  --schema schema.json \
  --queries query_patterns.json \
  --format text \
  --min-priority 3 \
  --output recommendations.txt
```

**Output:**
- Prioritized index recommendations
- CREATE INDEX statements
- Drop statements for redundant indexes
- Performance impact analysis
- Storage size estimates

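The redundancy detection mentioned above can be illustrated with a short sketch. It is one plausible rule, not necessarily the one `index_optimizer.py` applies: an index is reported as redundant when its column list is identical to, or a leading prefix of, another index's column list. The function name `find_redundant_indexes` is hypothetical, and leaving unique indexes alone (because they also enforce a constraint) and comparing only indexes with the same `partial_condition` are assumptions.

```python
def find_redundant_indexes(indexes):
    """Return (redundant_index, covering_index) name pairs.

    Sketch of a prefix rule: index A is redundant if its columns are a
    leading prefix of (or identical to) index B's columns.
    """
    redundant = []
    for i, a in enumerate(indexes):
        if a.get("unique"):
            # Assumption: unique indexes enforce a constraint, so keep them.
            continue
        for j, b in enumerate(indexes):
            if i == j or a.get("partial_condition") != b.get("partial_condition"):
                continue
            a_cols, b_cols = a["columns"], b["columns"]
            is_prefix = b_cols[: len(a_cols)] == a_cols
            # For exact duplicates, report only the later definition.
            if is_prefix and (len(a_cols) < len(b_cols) or i > j):
                redundant.append((a["name"], b["name"]))
                break
    return redundant


example_indexes = [
    {"name": "idx_orders_user", "columns": ["user_id"]},
    {"name": "idx_orders_user_created", "columns": ["user_id", "created_at"]},
    {"name": "idx_orders_created", "columns": ["created_at"]},
    {"name": "idx_orders_user_dup", "columns": ["user_id"]},
]
for name, covered_by in find_redundant_indexes(example_indexes):
    print(f"{name} is covered by {covered_by}")
```

Note that `idx_orders_created` is not flagged: `created_at` is the second column of the composite index, not a leading prefix, so the composite index cannot generally serve lookups on `created_at` alone.
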
### Migration Generator

**Input Requirements:**
- Current schema (JSON format)
- Target schema (JSON format)

**Migration Strategies:**
- Standard migrations with ALTER statements
- Zero-downtime expand-contract patterns
- Data migration and transformation scripts
- Constraint management (add/drop in correct order)
- Index management with timing estimates

**Sample Command:**
```bash
python migration_generator.py \
  --current current_schema.json \
  --target target_schema.json \
  --zero-downtime \
  --include-validations \
  --format text
```

**Output:**
- Step-by-step migration plan
- Forward and rollback SQL statements
- Risk assessment for each step
- Validation queries
- Execution time estimates

## File Structure

```
database-designer/
├── README.md                                   # This file
├── SKILL.md                                    # Comprehensive database design guide
├── schema_analyzer.py                          # Schema analysis tool
├── index_optimizer.py                          # Index optimization tool
├── migration_generator.py                      # Migration generation tool
├── references/                                 # Reference documentation
│   ├── normalization_guide.md                  # Normalization principles and patterns
│   ├── index_strategy_patterns.md              # Index design and optimization guide
│   └── database_selection_decision_tree.md     # Database technology selection
├── assets/                                     # Sample files and test data
│   ├── sample_schema.sql                       # Sample DDL with various issues
│   ├── sample_schema.json                      # JSON schema definition
│   └── sample_query_patterns.json              # Query patterns for index analysis
└── expected_outputs/                           # Example tool outputs
    ├── schema_analysis_sample.txt              # Sample schema analysis report
    ├── index_optimization_sample.txt           # Sample index recommendations
    └── migration_sample.txt                    # Sample migration plan
```

## JSON Schema Format

The tools use a standardized JSON format for schema definitions:

```json
{
  "tables": {
    "table_name": {
      "columns": {
        "column_name": {
          "type": "VARCHAR(255)",
          "nullable": true,
          "unique": false,
          "foreign_key": "other_table.column",
          "default": "default_value",
          "cardinality_estimate": 1000
        }
      },
      "primary_key": ["id"],
      "unique_constraints": [["email"], ["username"]],
      "check_constraints": {
        "chk_positive_price": "price > 0"
      },
      "indexes": [
        {
          "name": "idx_table_column",
          "columns": ["column_name"],
          "unique": false,
          "partial_condition": "status = 'active'"
        }
      ]
    }
  }
}
```

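Before handing a schema file to the tools, it can help to check that every `foreign_key` value follows the `table.column` form and points at something that exists. The following is a minimal pre-flight sketch under that assumption; `check_foreign_keys` is a hypothetical helper and is not part of the shipped scripts.

```python
def check_foreign_keys(schema):
    """Return human-readable problems found in foreign_key references."""
    problems = []
    tables = schema.get("tables", {})
    for table, spec in tables.items():
        for column, col in spec.get("columns", {}).items():
            ref = col.get("foreign_key")
            if ref is None:
                continue
            parts = ref.split(".")
            if len(parts) != 2 or not all(parts):
                problems.append(f"{table}.{column}: '{ref}' is not in table.column form")
                continue
            ref_table, ref_column = parts
            # The referenced table and column must both be defined in the schema.
            if ref_column not in tables.get(ref_table, {}).get("columns", {}):
                problems.append(f"{table}.{column}: '{ref}' does not exist in the schema")
    return problems


example_schema = {
    "tables": {
        "users": {"columns": {"id": {"type": "INTEGER"}}},
        "orders": {
            "columns": {
                "user_id": {"type": "INTEGER", "foreign_key": "users.id"},
                "product_id": {"type": "INTEGER", "foreign_key": "products"},
                "coupon_id": {"type": "INTEGER", "foreign_key": "coupons.id"},
            }
        },
    }
}
for problem in check_foreign_keys(example_schema):
    print(problem)
```

In this example `orders.user_id` passes, `orders.product_id` is reported for its malformed reference, and `orders.coupon_id` is reported because no `coupons` table is defined.
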
## Query Patterns Format

For index optimization, provide query patterns in this format:

```json
{
  "queries": [
    {
      "id": "user_lookup",
      "type": "SELECT",
      "table": "users",
      "where_conditions": [
        {
          "column": "email",
          "operator": "=",
          "selectivity": 0.95
        }
      ],
      "join_conditions": [
        {
          "local_column": "user_id",
          "foreign_table": "orders",
          "foreign_column": "id",
          "join_type": "INNER"
        }
      ],
      "order_by": [
        {"column": "created_at", "direction": "DESC"}
      ],
      "frequency": 1000,
      "avg_execution_time_ms": 5.2
    }
  ]
}
```

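One way to see why `frequency` and `avg_execution_time_ms` both matter is to multiply them into a rough load figure per query. The sketch below does that for a query patterns document in the format above. It is an illustration only: `rank_queries_by_load` is a hypothetical helper, the README does not state the time window that `frequency` covers, and the optimizer's real priority scoring may weigh these fields differently.

```python
def rank_queries_by_load(patterns):
    """Return (query_id, estimated_load_ms) pairs, heaviest first.

    Load is approximated as frequency x average execution time.
    """
    loads = [
        (q["id"], q.get("frequency", 0) * q.get("avg_execution_time_ms", 0.0))
        for q in patterns["queries"]
    ]
    return sorted(loads, key=lambda item: item[1], reverse=True)


example_patterns = {
    "queries": [
        {"id": "user_lookup", "frequency": 1000, "avg_execution_time_ms": 5.2},
        {"id": "monthly_report", "frequency": 10, "avg_execution_time_ms": 900.0},
    ]
}
for query_id, load in rank_queries_by_load(example_patterns):
    print(f"{query_id}: ~{load:.0f} ms total")
```

Here the rarely run report outranks the frequent lookup, which is the kind of result that is easy to miss when looking at frequency alone.
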
## Best Practices

### Schema Analysis
1. **Start with DDL**: Use actual CREATE TABLE statements when possible
2. **Include Constraints**: Capture all existing constraints and indexes
3. **Consider History**: Some denormalization may be intentional for performance
4. **Validate Results**: Review recommendations against business requirements

### Index Optimization
1. **Real Query Patterns**: Use actual application queries, not theoretical ones
2. **Include Frequency**: Query frequency is crucial for prioritization
3. **Monitor Performance**: Validate recommendations with actual performance testing
4. **Gradual Implementation**: Add indexes incrementally and monitor impact

### Migration Planning
1. **Test Migrations**: Always test on non-production environments first
2. **Backup First**: Ensure complete backups before running migrations
3. **Monitor Progress**: Watch for locks and performance impacts during execution
4. **Rollback Ready**: Have rollback procedures tested and ready

## Advanced Usage

### Custom Selectivity Estimation
The index optimizer uses pattern-based selectivity estimation. You can improve accuracy by providing cardinality estimates in your schema JSON. For example, a `status` column with only 5 distinct values:

```json
{
  "columns": {
    "status": {
      "type": "VARCHAR(20)",
      "cardinality_estimate": 5
    }
  }
}
```

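To show how a cardinality estimate could feed into selectivity, here is a small sketch. It assumes selectivity means distinct values divided by total rows, so values near 1 are highly selective; that reading fits the `0.95` given for an email lookup in the query patterns example, but the README does not confirm it. The `row_count` parameter and both function names are hypothetical, and ordering composite index columns from most to least selective is a common heuristic for equality predicates rather than the optimizer's documented rule.

```python
def estimate_selectivity(cardinality_estimate, row_count):
    """Approximate selectivity as distinct values / total rows, capped at 1.0."""
    if cardinality_estimate <= 0 or row_count <= 0:
        raise ValueError("cardinality_estimate and row_count must be positive")
    return min(1.0, cardinality_estimate / row_count)


def order_by_selectivity(columns, row_count):
    """Order column names from most to least selective."""
    return sorted(
        columns,
        key=lambda name: estimate_selectivity(
            columns[name]["cardinality_estimate"], row_count
        ),
        reverse=True,
    )


example_columns = {
    "status": {"type": "VARCHAR(20)", "cardinality_estimate": 5},
    "email": {"type": "VARCHAR(255)", "cardinality_estimate": 95000},
}
print(order_by_selectivity(example_columns, row_count=100_000))
```

With 100,000 rows, `email` comes out at 0.95 and `status` at 0.00005, so `email` would lead a composite index under this heuristic.
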
### Zero-Downtime Migration Strategy
For production systems, use the zero-downtime flag to generate expand-contract migrations:

1. **Expand Phase**: Add new columns/tables without constraints
2. **Dual Write**: Application writes to both old and new structures
3. **Backfill**: Populate new structures with existing data
4. **Contract Phase**: Remove old structures after validation

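The four phases above can be sketched for the simple case of renaming a column. This is an illustration under stated assumptions, not the output of `migration_generator.py`: `expand_contract_rename` is a hypothetical helper, the SQL is PostgreSQL-flavoured (`IS DISTINCT FROM`), and a real backfill on a large table would normally run in batches.

```python
def expand_contract_rename(table, old, new, col_type):
    """Return (phase, statement) steps for renaming a column without downtime."""
    return [
        # Expand: add the new column with no constraints so existing writes keep working.
        ("expand", f"ALTER TABLE {table} ADD COLUMN {new} {col_type};"),
        # Dual write happens in application code, so it is only a marker here.
        ("dual_write", f"-- application writes to both {table}.{old} and {table}.{new}"),
        # Backfill: copy existing data into the new column.
        ("backfill", f"UPDATE {table} SET {new} = {old} WHERE {new} IS NULL;"),
        # Validation: this count should be 0 before contracting.
        ("validate", f"SELECT COUNT(*) FROM {table} WHERE {new} IS DISTINCT FROM {old};"),
        # Contract: drop the old column once validation passes.
        ("contract", f"ALTER TABLE {table} DROP COLUMN {old};"),
    ]


for phase, statement in expand_contract_rename("users", "fullname", "full_name", "VARCHAR(100)"):
    print(f"[{phase}] {statement}")
```

Keeping validation as its own step before the contract phase means the destructive `DROP COLUMN` only runs after the two columns are known to agree.
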
### Integration with CI/CD
Integrate these tools into your deployment pipeline:

```bash
# Schema validation in CI: fail the job unless the report shows zero constraint issues
python schema_analyzer.py --input schema.sql --output-format json | \
  jq -e '.constraint_analysis.total_issues == 0' > /dev/null || exit 1

# Generate migrations automatically
python migration_generator.py \
  --current prod_schema.json \
  --target new_schema.json \
  --zero-downtime \
  --output migration.sql
```

## Troubleshooting

### Common Issues

**"No tables found in input file"**
- Ensure SQL DDL uses standard CREATE TABLE syntax
- Check for syntax errors in DDL
- Verify file encoding (UTF-8 recommended)

**"Invalid JSON schema"**
- Validate JSON syntax with a JSON validator
- Ensure all required fields are present
- Check that foreign key references use "table.column" format

**"Analysis shows no issues but problems exist"**
- The tools rely on heuristic analysis, so review the results carefully
- Some design decisions may be intentional (denormalization for performance)
- Consider domain-specific requirements not captured by general rules

### Performance Tips

**Large Schemas:**
- Use `--output-format json` for machine processing
- Consider analyzing subsets of tables for very large schemas
- Provide cardinality estimates for better index recommendations

**Complex Queries:**
- Include actual execution times in query patterns
- Provide realistic frequency estimates
- Consider seasonal or usage pattern variations

## Contributing

This is a self-contained skill with no external dependencies. To extend functionality:

1. Follow the existing code patterns
2. Use only the Python standard library
3. Add comprehensive test cases for new features
4. Update documentation and examples

## License

This database designer skill is part of the claude-skills collection and follows the same licensing terms.