---
name: "agent-designer"
description: "Agent Designer - Multi-Agent System Architecture"
---

# Agent Designer - Multi-Agent System Architecture

**Tier:** POWERFUL
**Category:** Engineering
**Tags:** AI agents, architecture, system design, orchestration, multi-agent systems

## Overview

Agent Designer is a comprehensive toolkit for designing, architecting, and evaluating multi-agent systems. It provides structured approaches to agent architecture patterns, tool design principles, communication strategies, and performance evaluation frameworks for building robust, scalable AI agent systems.

## Core Capabilities

### 1. Agent Architecture Patterns

#### Single Agent Pattern
- **Use Case:** Simple, focused tasks with clear boundaries
- **Pros:** Minimal complexity, easy debugging, predictable behavior
- **Cons:** Limited scalability, single point of failure
- **Implementation:** Direct user-agent interaction with comprehensive tool access

#### Supervisor Pattern
- **Use Case:** Hierarchical task decomposition with centralized control
- **Architecture:** One supervisor agent coordinating multiple specialist agents
- **Pros:** Clear command structure, centralized decision making
- **Cons:** Supervisor bottleneck, complex coordination logic
- **Implementation:** Supervisor receives tasks, delegates to specialists, aggregates results
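
A minimal sketch of the supervisor loop follows. It assumes specialists are plain Python callables and that the task arrives as `role: subtask` items separated by `;`. `Supervisor`, `decompose`, and that plan format are hypothetical. In practice the decomposition step is usually a model call rather than string parsing.

```python
from typing import Callable

# A specialist is modeled as a plain function: subtask text in, result text out.
Specialist = Callable[[str], str]


class Supervisor:
    """Receives a task, delegates subtasks to specialists, and aggregates the results."""

    def __init__(self, specialists: dict[str, Specialist]) -> None:
        self.specialists = specialists

    def decompose(self, task: str) -> list[tuple[str, str]]:
        # Hypothetical plan format: "role: subtask" items separated by ";".
        # A production supervisor would usually ask a model to produce this plan.
        plan = []
        for item in task.split(";"):
            role, _, subtask = item.partition(":")
            plan.append((role.strip(), subtask.strip()))
        return plan

    def run(self, task: str) -> list[tuple[str, str]]:
        results = []
        for role, subtask in self.decompose(task):
            specialist = self.specialists.get(role)
            if specialist is None:
                # No specialist for this role: record the gap instead of failing the whole task.
                results.append((role, f"unassigned: {subtask}"))
            else:
                results.append((role, specialist(subtask)))
        return results


supervisor = Supervisor({
    "research": lambda t: f"notes on {t}",
    "code": lambda t: f"patch for {t}",
})
print(supervisor.run("research: retry policies; code: add backoff; deploy: ship it"))
```

Delegation here is sequential, so the supervisor bottleneck is visible directly: every subtask flows through `run`. Dispatching specialists concurrently is a natural next step.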

#### Swarm Pattern
- **Use Case:** Distributed problem solving with peer-to-peer collaboration
- **Architecture:** Multiple autonomous agents with shared objectives
- **Pros:** High parallelism, fault tolerance, emergent collective behavior
- **Cons:** Complex coordination, potential conflicts, harder to predict
- **Implementation:** Agent discovery, consensus mechanisms, distributed task allocation

#### Hierarchical Pattern
- **Use Case:** Complex systems with multiple organizational layers
- **Architecture:** Tree structure with managers and workers at different levels
- **Pros:** Natural organizational mapping, clear responsibilities
- **Cons:** Communication overhead, potential bottlenecks at each level
- **Implementation:** Multi-level delegation with feedback loops

#### Pipeline Pattern
- **Use Case:** Sequential processing with specialized stages
- **Architecture:** Agents arranged in a processing pipeline
- **Pros:** Clear data flow, specialized optimization per stage
- **Cons:** Sequential bottlenecks, rigid processing order
- **Implementation:** Message queues between stages, state handoffs
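
The sketch below is a single-process stand-in for a queue-based pipeline. It assumes each stage is a function that takes and returns a state dict; `run_pipeline`, `normalize`, and `classify` are hypothetical. In a deployed system each stage would typically be its own worker consuming from a broker queue, with stages running concurrently rather than one after another.

```python
import queue
from typing import Any, Callable

State = dict[str, Any]
Stage = Callable[[State], State]


def run_pipeline(stages: list[Stage], items: list[State]) -> list[State]:
    """Push items through the stages in order, with one queue between each pair of stages."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    for item in items:
        queues[0].put(item)
    for i, stage in enumerate(stages):
        # Drain stage i's input queue and hand each resulting state to stage i + 1.
        while not queues[i].empty():
            state = queues[i].get()
            queues[i + 1].put(stage(dict(state)))  # shallow copy: a stage can't mutate upstream state
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results


def normalize(state: State) -> State:
    state["text"] = state["raw"].strip()
    return state


def classify(state: State) -> State:
    state["label"] = "question" if state["text"].endswith("?") else "statement"
    return state


out = run_pipeline([normalize, classify], [{"raw": "  How do I retry? "}, {"raw": "Deploy done. "}])
print([s["label"] for s in out])  # ['question', 'statement']
```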

### 2. Agent Role Definition

#### Role Specification Framework
- **Identity:** Name, purpose statement, core competencies
- **Responsibilities:** Primary tasks, decision boundaries, success criteria
- **Capabilities:** Required tools, knowledge domains, processing limits
- **Interfaces:** Input/output formats, communication protocols
- **Constraints:** Security boundaries, resource limits, operational guidelines
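
One way to make the framework concrete is a typed record per role, grouped by the five headings above. The field names, the `max_steps` default, and the `can_use` check are illustrative assumptions, not a fixed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    # Identity
    name: str
    purpose: str
    # Responsibilities
    primary_tasks: tuple[str, ...]
    success_criteria: tuple[str, ...]
    # Capabilities
    tools: frozenset[str]
    max_steps: int = 20  # processing limit (hypothetical default)
    # Interfaces
    input_format: str = "text"
    output_format: str = "json"
    # Constraints
    forbidden_actions: frozenset[str] = frozenset()

    def can_use(self, tool: str) -> bool:
        """Allow only tools that are granted and not explicitly forbidden."""
        return tool in self.tools and tool not in self.forbidden_actions


reviewer = AgentRole(
    name="code-reviewer",
    purpose="Review pull requests for defects",
    primary_tasks=("review diffs",),
    success_criteria=("every finding cites a line",),
    tools=frozenset({"read_file", "post_comment", "merge_pr"}),
    forbidden_actions=frozenset({"merge_pr"}),
)
print(reviewer.can_use("read_file"), reviewer.can_use("merge_pr"))  # True False
```

Freezing the dataclass keeps a role definition from being changed at runtime by the agent it describes.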

#### Common Agent Archetypes

**Coordinator Agent**
- Orchestrates multi-agent workflows
- Makes high-level decisions and resource allocation
- Monitors system health and performance
- Handles escalations and conflict resolution

**Specialist Agent**
- Deep expertise in a specific domain (code, data, research)
- Optimized tools and knowledge for specialized tasks
- High-quality output within a narrow scope
- Clear handoff protocols for out-of-scope requests

**Interface Agent**
- Handles external interactions (users, APIs, systems)
- Protocol translation and format conversion
- Authentication and authorization management
- User experience optimization

**Monitor Agent**
- System health monitoring and alerting
- Performance metrics collection and analysis
- Anomaly detection and reporting
- Compliance and audit trail maintenance

### 3. Tool Design Principles

#### Schema Design
- **Input Validation:** Strong typing, required vs optional parameters
- **Output Consistency:** Standardized response formats, error handling
- **Documentation:** Clear descriptions, usage examples, edge cases
- **Versioning:** Backward compatibility, migration paths
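
A sketch of schema-driven input validation with a standardized success/error envelope follows. The `search_docs` tool, its parameters, and the error codes are hypothetical. Real tool schemas are usually written in JSON Schema; Python types stand in for JSON types here to keep the example short.

```python
from typing import Any

# Hypothetical schema for a "search_docs" tool (JSON-Schema-like, with Python types).
SEARCH_DOCS_SCHEMA = {
    "name": "search_docs",
    "version": "1.1.0",  # bump the minor version for backward-compatible additions
    "description": "Full-text search over internal docs. Returns at most `limit` hits.",
    "parameters": {
        "query": {"type": str, "required": True},
        "limit": {"type": int, "required": False, "default": 5},
    },
}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> dict[str, Any]:
    """Return {"ok": True, "args": ...} or {"ok": False, "error": {"code": ..., "detail": ...}}."""
    params = schema["parameters"]
    unknown = set(args) - set(params)
    if unknown:
        return {"ok": False, "error": {"code": "unknown_parameter", "detail": sorted(unknown)}}
    clean: dict[str, Any] = {}
    for name, spec in params.items():
        if name not in args:
            if spec["required"]:
                return {"ok": False, "error": {"code": "missing_parameter", "detail": name}}
            clean[name] = spec.get("default")
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, spec["type"]) or (spec["type"] is int and isinstance(value, bool)):
            return {"ok": False, "error": {"code": "type_error", "detail": name}}
        clean[name] = value
    return {"ok": True, "args": clean}
```

Returning a structured error instead of raising lets the calling agent read the `code` field and correct its arguments on the next attempt.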

#### Error Handling Patterns
- **Graceful Degradation:** Partial functionality when dependencies fail
- **Retry Logic:** Exponential backoff, circuit breakers, max attempts
- **Error Propagation:** Structured error responses, error classification
- **Recovery Strategies:** Fallback methods, alternative approaches

#### Idempotency Requirements
- **Safe Operations:** Read operations with no side effects
- **Idempotent Writes:** Same operation can be safely repeated
- **State Management:** Version tracking, conflict resolution
- **Atomicity:** All-or-nothing operation completion
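
Idempotent writes are commonly implemented with a caller-supplied idempotency key: the first call performs the write and stores its result, and repeats return the stored result. The `PaymentTool` below is a hypothetical in-memory sketch. A real store would need an atomic check-and-insert (for example, a unique constraint on the key) so that concurrent retries cannot both perform the write.

```python
import uuid
from typing import Any


class PaymentTool:
    """Hypothetical write tool made idempotent with client-supplied idempotency keys."""

    def __init__(self) -> None:
        self.balance = 0
        self._results: dict[str, dict[str, Any]] = {}  # idempotency key -> first result

    def charge(self, amount: int, idempotency_key: str) -> dict[str, Any]:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance += amount
        result = {"charge_id": str(uuid.uuid4()), "amount": amount, "balance": self.balance}
        self._results[idempotency_key] = result
        return result


tool = PaymentTool()
first = tool.charge(100, idempotency_key="task-42-step-3")
retried = tool.charge(100, idempotency_key="task-42-step-3")  # e.g. an agent retry after a timeout
print(first == retried, tool.balance)  # True 100
```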

### 4. Communication Patterns

#### Message Passing
- **Asynchronous Messaging:** Decoupled agents, message queues
- **Message Format:** Structured payloads with metadata
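- **Delivery Guarantees:** At-most-once, at-least-once, or exactly-once semantics (exactly-once is usually achieved as at-least-once delivery plus idempotent processing)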
- **Routing:** Direct messaging, publish-subscribe, broadcast
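
The sketch below pairs a structured message envelope with an in-process publish-subscribe router. The metadata field names (`message_id`, `correlation_id`, `sent_at`) and the `Bus` class are illustrative assumptions. This bus delivers synchronously and offers no delivery guarantees; those come from the message broker in a real deployment.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Message:
    topic: str
    sender: str
    payload: dict[str, Any]
    # Metadata (illustrative field names, not a standard).
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    correlation_id: Optional[str] = None  # ties a reply back to its originating request
    sent_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class Bus:
    """In-process publish-subscribe router; a real system would use a message broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Message], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, message: Message) -> int:
        # Fan out to every subscriber of the topic and report how many received it.
        handlers = self._subscribers.get(message.topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = Bus()
received: list[Message] = []
bus.subscribe("task.completed", received.append)
bus.subscribe("task.completed", lambda m: print(f"audit: {m.sender} finished task {m.payload['task_id']}"))
delivered = bus.publish(Message(topic="task.completed", sender="coder", payload={"task_id": 7}))
```

Direct messaging is the special case of a topic with exactly one subscriber; broadcast is a topic every agent subscribes to.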

#### Shared State
- **State Stores:** Centralized data repositories
- **Consistency Models:** Strong, eventual, weak consistency
- **Access Patterns:** Read-heavy, write-heavy, mixed workloads
- **Conflict Resolution:** Last-writer-wins, merge strategies

#### Event-Driven Architecture
- **Event Sourcing:** Immutable event logs, state reconstruction
- **Event Types:** Domain events, system events, integration events
- **Event Processing:** Real-time, batch, stream processing
- **Event Schema:** Versioned event formats, backward compatibility
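
Event sourcing stores an append-only log of events and rebuilds current state by replaying them through a pure reducer. The event names, fields, and the v1-to-v2 rename below are hypothetical. They illustrate how a reducer can keep reading older event versions after a schema change.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    kind: str
    data: dict[str, Any]
    version: int = 1  # schema version of this event type


def apply(state: dict[str, Any], event: Event) -> dict[str, Any]:
    """Pure reducer: returns a new state and never mutates the old one."""
    new = dict(state)
    if event.kind == "TaskCreated":
        new.update(id=event.data["id"], status="open", assignee=None)
    elif event.kind == "TaskAssigned":
        # Hypothetical schema change: v1 used "agent", v2 renamed it to "assignee".
        new["assignee"] = event.data["assignee"] if event.version >= 2 else event.data["agent"]
    elif event.kind == "TaskCompleted":
        new["status"] = "done"
    # Unknown event kinds are ignored so older readers tolerate newer events.
    return new


def rebuild(events: list[Event]) -> dict[str, Any]:
    state: dict[str, Any] = {}
    for event in events:
        state = apply(state, event)
    return state


log = [
    Event("TaskCreated", {"id": 1}),
    Event("TaskAssigned", {"agent": "coder"}),
    Event("TaskAssigned", {"assignee": "reviewer"}, version=2),
    Event("TaskCompleted", {}),
]
print(rebuild(log))      # {'id': 1, 'status': 'done', 'assignee': 'reviewer'}
print(rebuild(log[:2]))  # {'id': 1, 'status': 'open', 'assignee': 'coder'}
```

Replaying a prefix of the log reconstructs the state at any earlier point, which is useful for auditing what an agent knew when it acted.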

### 5. Guardrails and Safety

#### Input Validation
- **Schema Enforcement:** Required fields, type checking, format validation
- **Content Filtering:** Harmful content detection, PII scrubbing
- **Rate Limiting:** Request throttling, resource quotas
- **Authentication:** Identity verification, authorization checks
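
For request throttling, a token bucket is one common algorithm: it permits short bursts up to a capacity and a steady refill rate after that. The class below is a minimal single-threaded sketch. The injectable `clock` parameter is an assumption added so the limiter can be tested without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_per_sec` requests."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so behavior can be tested deterministically
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Keeping one bucket per agent or per user (for example, in a dict keyed by ID) turns this into a per-caller resource quota; `cost` can weight expensive tool calls more heavily.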

#### Output Filtering
- **Content Moderation:** Harmful content removal, quality checks
- **Consistency Validation:** Logic checks, constraint verification
- **Formatting:** Standardized output formats, clean presentation
- **Audit Logging:** Decision trails, compliance records

#### Human-in-the-Loop
- **Approval Workflows:** Critical decision checkpoints
- **Escalation Triggers:** Confidence thresholds, risk assessment
- **Override Mechanisms:** Human judgment takes precedence over agent decisions
- **Feedback Loops:** Human corrections improve system behavior
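
A sketch of escalation routing follows. Actions on a critical list or assessed as high risk always go to a human; others escalate when confidence falls below a threshold. The threshold value, the action names, and the `ProposedAction` fields are hypothetical. A model's self-reported confidence may be poorly calibrated, so thresholds like this are usually tuned against observed outcomes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedAction:
    name: str
    confidence: float  # confidence score in [0, 1], e.g. self-reported by the model
    risk: str          # "low", "medium", or "high" from a separate risk assessment


# Hypothetical policy values; tune per deployment.
CONFIDENCE_THRESHOLD = 0.8
ALWAYS_REVIEW = {"delete_data", "send_payment"}


def route(action: ProposedAction) -> str:
    """Return "auto" to execute directly or "human" to queue the action for approval."""
    if action.name in ALWAYS_REVIEW or action.risk == "high":
        return "human"  # critical decision checkpoint
    if action.confidence < CONFIDENCE_THRESHOLD:
        return "human"  # low-confidence escalation
    return "auto"


def resolve(action: ProposedAction, human_decision: Optional[bool]) -> bool:
    """A human decision, when present, overrides routing; otherwise only "auto" actions proceed."""
    if human_decision is not None:
        return human_decision
    return route(action) == "auto"
```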

### 6. Evaluation Frameworks

#### Task Completion Metrics
- **Success Rate:** Percentage of tasks completed successfully
- **Partial Completion:** Progress measurement for complex tasks
- **Task Classification:** Success criteria by task type
- **Failure Analysis:** Root cause identification and categorization

#### Quality Assessment
- **Output Quality:** Accuracy, relevance, completeness measures
- **Consistency:** Response variability across similar inputs
- **Coherence:** Logical flow and internal consistency
- **User Satisfaction:** Feedback scores, usage patterns

#### Cost Analysis
- **Token Usage:** Input/output token consumption per task
- **API Costs:** External service usage and charges
- **Compute Resources:** CPU, memory, storage utilization
- **Cost per Success:** Total cost divided by the number of successfully completed tasks

#### Latency Distribution
- **Response Time:** End-to-end task completion time
- **Processing Stages:** Bottleneck identification per stage
- **Queue Times:** Wait times in processing pipelines
- **Resource Contention:** Impact of concurrent operations
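
Several of the metrics above can be computed from a per-run record. The sketch below covers success rate, partial completion, cost per successful task completion, and latency percentiles. The `TaskRun` fields are assumptions about what a harness would log. `statistics.quantiles` needs at least two runs.

```python
import statistics
from dataclasses import dataclass


@dataclass
class TaskRun:
    succeeded: bool
    progress: float   # fraction of subgoals met, 0.0 to 1.0
    cost_usd: float   # token and API charges for this run
    latency_s: float  # end-to-end wall-clock time


def summarize(runs: list[TaskRun]) -> dict[str, float]:
    successes = sum(r.succeeded for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    # quantiles(n=100) returns 99 cut points: index 49 is p50 and index 94 is p95.
    cuts = statistics.quantiles([r.latency_s for r in runs], n=100, method="inclusive")
    return {
        "success_rate": successes / len(runs),
        "mean_progress": statistics.fmean(r.progress for r in runs),
        # Failed runs still cost money, so total cost is divided by successes only.
        "cost_per_success": total_cost / successes if successes else float("inf"),
        "p50_latency_s": cuts[49],
        "p95_latency_s": cuts[94],
    }
```

Reporting p95 alongside p50 matters because agent latency distributions tend to have long tails (retries, long tool calls) that a mean would hide.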

### 7. Orchestration Strategies

#### Centralized Orchestration
- **Workflow Engine:** Central coordinator manages all agents
- **State Management:** Centralized workflow state tracking
- **Decision Logic:** Complex routing and branching rules
- **Monitoring:** Comprehensive visibility into all operations

#### Decentralized Orchestration
- **Peer-to-Peer:** Agents coordinate directly with each other
- **Service Discovery:** Dynamic agent registration and lookup
- **Consensus Protocols:** Distributed decision making
- **Fault Tolerance:** No single point of failure

#### Hybrid Approaches
- **Domain Boundaries:** Centralized within domains, federated across domains
- **Hierarchical Coordination:** Multiple orchestration levels
- **Context-Dependent:** Strategy selection based on task type
- **Load Balancing:** Distribute coordination responsibility

### 8. Memory Patterns

#### Short-Term Memory
- **Context Windows:** Working memory for current tasks
- **Session State:** Temporary data for ongoing interactions
- **Cache Management:** Performance optimization strategies
- **Memory Pressure:** Handling capacity constraints
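
A common way to handle context-window pressure is to keep the system prompt and as many of the most recent turns as fit a token budget. The sketch below assumes a crude word-count token estimate (`estimate_tokens` is hypothetical). A real system should count with the target model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Rough stand-in: one token per whitespace-separated word.
    return len(text.split())


def fit_context(system: Turn, history: list[Turn], budget: int) -> list[Turn]:
    """Keep the system prompt plus the most recent turns that fit within `budget` tokens."""
    used = estimate_tokens(system.text)  # the system prompt is always kept
    kept: list[Turn] = []
    for turn in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(turn.text)
        if used + cost > budget:
            break  # older turns are dropped (or could be summarized into long-term memory)
        kept.append(turn)
        used += cost
    return [system] + kept[::-1]  # restore chronological order
```

Stopping at the first turn that does not fit, rather than skipping it and continuing, keeps the retained history contiguous so the agent never sees a reply without the message it answered.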

#### Long-Term Memory
- **Persistent Storage:** Durable data across sessions
- **Knowledge Base:** Accumulated domain knowledge
- **Experience Replay:** Learning from past interactions
- **Memory Consolidation:** Transferring information from short-term to long-term memory

#### Shared Memory
- **Collaborative Knowledge:** Shared learning across agents
- **Synchronization:** Consistency maintenance strategies
- **Access Control:** Permission-based memory access
- **Memory Partitioning:** Isolation between agent groups

### 9. Scaling Considerations

#### Horizontal Scaling
- **Agent Replication:** Multiple instances of same agent type
- **Load Distribution:** Request routing across agent instances
- **Resource Pooling:** Shared compute and storage resources
- **Geographic Distribution:** Multi-region deployments

#### Vertical Scaling
- **Capability Enhancement:** More powerful individual agents
- **Tool Expansion:** Broader tool access per agent
- **Context Expansion:** Larger working memory capacity
- **Processing Power:** Higher throughput per agent

#### Performance Optimization
- **Caching Strategies:** Response caching, tool result caching
- **Parallel Processing:** Concurrent task execution
- **Resource Optimization:** Efficient resource utilization
- **Bottleneck Elimination:** Systematic performance tuning

### 10. Failure Handling

#### Retry Mechanisms
- **Exponential Backoff:** Increasing delays between retries
- **Jitter:** Random delay variation to prevent thundering herd
- **Maximum Attempts:** Bounded retry behavior
- **Retry Conditions:** Transient vs permanent failure classification
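
The four retry elements above combine into one small helper: capped exponential backoff, "full jitter" (a random delay between zero and the backoff), a bounded attempt count, and retrying only errors classified as transient. `TransientError` is a hypothetical marker class. Mapping real failures to it (timeouts, HTTP 429 or 503) is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Marks failures worth retrying; any other exception is treated as permanent."""


def retry(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.5,
          max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # bounded retries: surface the last transient error
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out clients that failed at the same moment.
            sleep(random.uniform(0, backoff))
    raise AssertionError("unreachable")
```

Permanent errors propagate on the first failure because only `TransientError` is caught. The injectable `sleep` makes the helper testable without real delays.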

#### Fallback Strategies
- **Graceful Degradation:** Reduced functionality when systems fail
- **Alternative Approaches:** Different methods for same goals
- **Default Responses:** Safe fallback behaviors
- **User Communication:** Clear failure messaging

#### Circuit Breakers
- **Failure Detection:** Monitoring failure rates and response times
- **State Management:** Open, closed, half-open circuit states
- **Recovery Testing:** Gradual return to normal operation
- **Cascading Failure Prevention:** Protecting upstream systems
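
A minimal circuit breaker with the closed, open, and half-open states follows. After `failure_threshold` consecutive failures it opens and fails fast. Once `reset_timeout` has elapsed, it lets a trial call through in half-open, and the outcome of that call closes or reopens the circuit. This is a single-threaded sketch with an injectable clock; under concurrency the state changes would need a lock, and half-open would usually admit only a limited number of trial calls.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0  # consecutive failures; reset on any success
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; dependency marked unhealthy")
            self.state = "half_open"  # timeout elapsed: allow a trial request
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Success closes the circuit and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial in half-open reopens immediately.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```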

## Implementation Guidelines

### Architecture Decision Process
1. **Requirements Analysis:** Understand system goals, constraints, scale
2. **Pattern Selection:** Choose appropriate architecture pattern
3. **Agent Design:** Define roles, responsibilities, interfaces
4. **Tool Architecture:** Design tool schemas and error handling
5. **Communication Design:** Select message patterns and protocols
6. **Safety Implementation:** Build guardrails and validation
7. **Evaluation Planning:** Define success metrics and monitoring
8. **Deployment Strategy:** Plan scaling and failure handling

### Quality Assurance
- **Testing Strategy:** Unit, integration, and system testing approaches
- **Monitoring:** Real-time system health and performance tracking
- **Documentation:** Architecture documentation and runbooks
- **Security Review:** Threat modeling and security assessments

### Continuous Improvement
- **Performance Monitoring:** Ongoing system performance analysis
- **User Feedback:** Incorporating user experience improvements
- **A/B Testing:** Controlled experiments for system improvements
- **Knowledge Base Updates:** Continuous learning and adaptation

This skill provides the foundation for designing robust, scalable multi-agent systems that can handle complex tasks while maintaining safety, reliability, and performance at scale.