---
name: "performance-profiler"
description: "Performance Profiler"
---

# Performance Profiler

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Performance Engineering

---

## Overview

Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU, memory, and I/O bottlenecks; generates flamegraphs; analyzes bundle sizes; optimizes database queries; detects memory leaks; and runs load tests with k6 and Artillery. Always measures before and after.

## Core Capabilities

- **CPU profiling** — flamegraphs for Node.js, py-spy for Python, pprof for Go
- **Memory profiling** — heap snapshots, leak detection, GC pressure
- **Bundle analysis** — webpack-bundle-analyzer, Next.js bundle analyzer
- **Database optimization** — EXPLAIN ANALYZE, slow query log, N+1 detection
- **Load testing** — k6 scripts, Artillery scenarios, ramp-up patterns
- **Before/after measurement** — establish baseline, profile, optimize, verify

---

## When to Use

- App is slow and you don't know where the bottleneck is
- P99 latency exceeds SLA before a release
- Memory usage grows over time (suspected leak)
- Bundle size increased after adding dependencies
- Preparing for a traffic spike (load test before launch)
- Database queries taking >100ms

---

## Golden Rule: Measure First

```bash
# Establish baseline BEFORE any optimization
# Record: P50, P95, P99 latency | RPS | error rate | memory usage

# Wrong: "I think the N+1 query is slow, let me fix it"
# Right: Profile → confirm bottleneck → fix → measure again → verify improvement
```

---

## Node.js Profiling

→ See references/profiling-recipes.md for details

## Before/After Measurement Template

```markdown
## Performance Optimization: [What You Fixed]

**Date:** 2026-03-01
**Engineer:** @username
**Ticket:** PROJ-123

### Problem
[1-2 sentences: what was slow, how it was observed]

### Root Cause
[What the profiler revealed]

### Baseline (Before)
| Metric | Value |
|--------|-------|
| P50 latency | 480ms |
| P95 latency | 1,240ms |
| P99 latency | 3,100ms |
| RPS @ 50 VUs | 42 |
| Error rate | 0.8% |
| DB queries/req | 23 (N+1) |

Profiler evidence: [link to flamegraph or screenshot]

### Fix Applied
[What changed — code diff or description]

### After
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| P50 latency | 480ms | 48ms | -90% |
| P95 latency | 1,240ms | 120ms | -90% |
| P99 latency | 3,100ms | 280ms | -91% |
| RPS @ 50 VUs | 42 | 380 | +805% |
| Error rate | 0.8% | 0% | -100% |
| DB queries/req | 23 | 1 | -96% |

### Verification
Load test run: [link to k6 output]
```

---

## Optimization Checklist

### Quick wins (check these first)

```
Database
□ Missing indexes on WHERE/ORDER BY columns
□ N+1 queries (check query count per request)
□ Loading all columns when only 2-3 needed (SELECT *)
□ No LIMIT on unbounded queries
□ Missing connection pool (creating a new connection per request)

Node.js
□ Sync I/O (fs.readFileSync) in hot path
□ JSON.parse/stringify of large objects in a hot loop
□ Missing caching for expensive computations
□ No compression (gzip/brotli) on responses
□ Dependencies loaded in request handler (move to module level)

Bundle
□ Moment.js → dayjs/date-fns
□ Lodash (full) → lodash/function imports
□ Static imports of heavy components → dynamic imports
□ Images not optimized / not using next/image
□ No code splitting on routes

API
□ No pagination on list endpoints
□ No response caching (Cache-Control headers)
□ Serial awaits that could be parallel (Promise.all)
□ Fetching related data in a loop instead of a JOIN
```

---

## Common Pitfalls

- **Optimizing without measuring** — you'll optimize the wrong thing
- **Testing in development** — profile against production-like data volumes
- **Ignoring P99** — P50 can look fine while P99 is catastrophic
- **Premature optimization** — fix correctness first, then performance
- **Not re-measuring** — always verify the fix actually improved things
- **Load testing production** — use staging with production-size data

---

## Best Practices

1. **Baseline first, always** — record metrics before touching anything
2. **One change at a time** — isolate the variable to confirm causation
3. **Profile with realistic data** — 10 rows in dev vs. millions in prod hit different bottlenecks
4. **Set performance budgets** — e.g. `p(95) < 200ms` in k6 CI thresholds
5. **Monitor continuously** — add Datadog/Prometheus metrics for key paths
6. **Cache invalidation strategy** — cache aggressively, invalidate precisely
7. **Document the win** — before/after numbers in the PR description motivate the team
# performance-profiler reference

## Node.js Profiling

### CPU Flamegraph

```bash
# Method 1: clinic.js (best for development)
npm install -g clinic

# CPU flamegraph
clinic flame -- node dist/server.js

# Heap profiler
clinic heapprofiler -- node dist/server.js

# Bubble diagram (async delays / event loop blocking)
clinic bubbleprof -- node dist/server.js

# Load with autocannon while profiling
autocannon -c 50 -d 30 http://localhost:3000/api/tasks &
clinic flame -- node dist/server.js
```

```bash
# Method 2: Node.js built-in profiler
node --prof dist/server.js
# After running some load:
node --prof-process isolate-*.log | head -100
```

```bash
# Method 3: V8 CPU profiler via inspector
node --inspect dist/server.js
# Open chrome://inspect → your process → Profiler → Record
```

### Heap Snapshot / Memory Leak Detection

```javascript
// Add to your server for on-demand heap snapshots
import v8 from 'v8'

// Endpoint: POST /debug/heap-snapshot (protect with auth!)
app.post('/debug/heap-snapshot', (req, res) => {
  const filename = `heap-${Date.now()}.heapsnapshot`
  const snapshot = v8.writeHeapSnapshot(filename)
  res.json({ snapshot })
})
```

```bash
# Take snapshots over time and compare in Chrome DevTools
curl -X POST http://localhost:3000/debug/heap-snapshot
# Apply ~5 minutes of load, then take a second snapshot
curl -X POST http://localhost:3000/debug/heap-snapshot
# Open both snapshots in Chrome → Memory → Compare
```
### Detect Event Loop Blocking

```javascript
// Add blocked-at to detect synchronous blocking
import blocked from 'blocked-at'

blocked((time, stack) => {
  console.warn(`Event loop blocked for ${time}ms`)
  console.warn(stack.join('\n'))
}, { threshold: 100 }) // Alert if blocked > 100ms
```

### Node.js Memory Profiling Script

```javascript
// scripts/memory-profile.mjs
// Run: node --expose-gc scripts/memory-profile.mjs

function formatBytes(bytes) {
  return (bytes / 1024 / 1024).toFixed(2) + ' MB'
}

function measureMemory(label) {
  const mem = process.memoryUsage()
  console.log(`\n[${label}]`)
  console.log(`  RSS:        ${formatBytes(mem.rss)}`)
  console.log(`  Heap Used:  ${formatBytes(mem.heapUsed)}`)
  console.log(`  Heap Total: ${formatBytes(mem.heapTotal)}`)
  console.log(`  External:   ${formatBytes(mem.external)}`)
  return mem
}

const baseline = measureMemory('Baseline')

// Placeholder — replace with the operation you actually want to measure
const someOperation = async () => Array.from({ length: 1000 }, (_, i) => `item-${i}`)

for (let i = 0; i < 1000; i++) {
  await someOperation()
}

const after = measureMemory('After 1000 operations')

console.log(`\n[Delta]`)
console.log(`  Heap Used: +${formatBytes(after.heapUsed - baseline.heapUsed)}`)

// If the heap keeps growing across GC cycles, you have a leak
global.gc?.() // Requires the --expose-gc flag
const afterGC = measureMemory('After GC')
if (afterGC.heapUsed > baseline.heapUsed * 1.1) {
  console.warn('⚠️ Possible memory leak detected (>10% growth after GC)')
}
```

---

## Python Profiling

### CPU Profiling with py-spy

```bash
# Install
pip install py-spy

# Profile a running process (no code changes needed)
py-spy top --pid $(pgrep -f "uvicorn")

# Generate flamegraph SVG
py-spy record -o flamegraph.svg --pid $(pgrep -f "uvicorn") --duration 30

# Profile from the start
py-spy record -o flamegraph.svg -- python -m uvicorn app.main:app

# Open flamegraph.svg in browser — look for wide bars = hot code paths
```

### cProfile for function-level profiling

```python
# scripts/profile_endpoint.py
import cProfile
import pstats
import io
from app.services.task_service import TaskService

def run():
    service = TaskService()
    for _ in range(100):
        service.list_tasks(user_id="user_1", page=1, limit=20)

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Print top 20 functions by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(20)
print(stream.getvalue())
```

### Memory profiling with memory_profiler

```python
# pip install memory-profiler
from memory_profiler import profile

@profile
def my_function():
    # Function to profile
    data = load_large_dataset()
    result = process(data)
    return result
```

```bash
# Run with line-by-line memory tracking
python -m memory_profiler scripts/profile_function.py

# Output:
# Line #    Mem usage    Increment   Line Contents
# ================================================
#     10     45.3 MiB     45.3 MiB   def my_function():
#     11     78.1 MiB     32.8 MiB       data = load_large_dataset()
#     12    156.2 MiB     78.1 MiB       result = process(data)
```

---

## Go Profiling with pprof

```go
// main.go — add pprof endpoints
package main

import (
	"log"
	"net/http"

	_ "net/http/pprof"
)

func main() {
	// pprof endpoints at /debug/pprof/
	go func() {
		log.Println(http.ListenAndServe(":6060", nil))
	}()
	// ... rest of your app
}
```

```bash
# CPU profile (30s)
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"

# Memory profile
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

# Goroutine leak detection
curl "http://localhost:6060/debug/pprof/goroutine?debug=1"

# In the pprof UI: "Flame Graph" view → find the widest bars
```

---

## Bundle Size Analysis

### Next.js Bundle Analyzer

```bash
# Install
pnpm add -D @next/bundle-analyzer
```

```javascript
// next.config.js
const withBundleAnalyzer = require('@next/bundle-analyzer')({
  enabled: process.env.ANALYZE === 'true',
})
module.exports = withBundleAnalyzer({})
```

```bash
# Run analyzer
ANALYZE=true pnpm build
# Opens browser with treemap of bundle
```

### What to look for

```bash
# Find the largest chunks
pnpm build 2>&1 | grep -E "^\s+(λ|○|●)" | sort -k4 -rh | head -20

# Check if a specific package is too large
# Visit: https://bundlephobia.com/package/moment@2.29.4
# moment: 67.9kB gzipped → replace with date-fns (13.8kB) or dayjs (6.9kB)

# Find duplicate packages
pnpm dedupe --check

# Visualize what's in a chunk
npx source-map-explorer .next/static/chunks/*.js
```

### Common bundle wins

```typescript
import dynamic from 'next/dynamic'

// Before: import entire lodash
import _ from 'lodash' // 71kB

// After: import only what you need
import debounce from 'lodash/debounce' // 2kB

// Before: moment.js
import moment from 'moment' // 67kB

// After: dayjs
import dayjs from 'dayjs' // 7kB

// Before: static import (always in bundle)
import HeavyChart from '@/components/HeavyChart'

// After: dynamic import (loaded on demand)
const LazyChart = dynamic(() => import('@/components/HeavyChart'), {
  loading: () => <Skeleton />,
})
```

---

## Database Query Optimization

### Find slow queries

```sql
-- PostgreSQL: enable pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 20 slowest queries
SELECT
  round(mean_exec_time::numeric, 2) AS mean_ms,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 80) AS query
FROM pg_stat_statements
WHERE calls > 10
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Reset stats
SELECT pg_stat_statements_reset();
```

```bash
# MySQL slow query log
mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 0.1;"
tail -f /var/log/mysql/slow-query.log
```

### EXPLAIN ANALYZE

```sql
-- Always use EXPLAIN (ANALYZE, BUFFERS) for real timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT t.*, u.name AS assignee_name
FROM tasks t
LEFT JOIN users u ON u.id = t.assignee_id
WHERE t.project_id = 'proj_123'
  AND t.deleted_at IS NULL
ORDER BY t.created_at DESC
LIMIT 20;

-- Look for:
--   Seq Scan on a large table → needs an index
--   Nested Loop with high row counts → N+1, consider a JOIN or batch fetch
--   Sort → can an index satisfy the ordering?
--   Hash Join → fine for moderate sizes
```

### Detect N+1 Queries

```typescript
// Drizzle: enable query logging in dev
import { drizzle } from 'drizzle-orm/node-postgres'

const db = drizzle(pool, { logger: true })

// Or count queries with a custom logger
let queryCount = 0
const countingDb = drizzle(pool, {
  logger: { logQuery: () => { queryCount++ } },
})

// In tests:
queryCount = 0
const tasks = await getTasksWithAssignees(projectId)
expect(queryCount).toBe(1) // Fail if it's 21 (1 + 20 N+1s)
```

```python
# Django: detect N+1 with django-silk or nplusone
MIDDLEWARE = ['nplusone.ext.django.middleware.NPlusOneMiddleware']
NPLUSONE_RAISE = True  # Raise an exception on N+1 in tests
```

### Fix N+1 — Before/After

```typescript
// Before: N+1 (1 query for tasks + N queries for assignees)
const tasks = await db.select().from(tasksTable)
for (const task of tasks) {
  task.assignee = await db.select().from(usersTable)
    .where(eq(usersTable.id, task.assigneeId))
    .then(r => r[0])
}

// After: 1 query with a JOIN
const tasksWithAssignees = await db
  .select({
    id: tasksTable.id,
    title: tasksTable.title,
    assigneeName: usersTable.name,
    assigneeEmail: usersTable.email,
  })
  .from(tasksTable)
  .leftJoin(usersTable, eq(usersTable.id, tasksTable.assigneeId))
  .where(eq(tasksTable.projectId, projectId))
```

---

## Load Testing with k6

```javascript
// tests/load/api-load-test.js
import http from 'k6/http'
import { check, sleep } from 'k6'
import { Rate, Trend } from 'k6/metrics'

const errorRate = new Rate('errors')
const taskListDuration = new Trend('task_list_duration')

export const options = {
  stages: [
    { duration: '30s', target: 10 },  // Ramp up to 10 VUs
    { duration: '1m', target: 50 },   // Ramp to 50 VUs
    { duration: '2m', target: 50 },   // Sustain 50 VUs
    { duration: '30s', target: 100 }, // Spike to 100 VUs
    { duration: '1m', target: 50 },   // Back to 50
    { duration: '30s', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // 95% < 500ms, 99% < 1s
    errors: ['rate<0.01'],                          // Error rate < 1%
    task_list_duration: ['p(95)<200'],              // Task list specifically < 200ms
  },
}

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000'

export function setup() {
  // Get auth token once
  const loginRes = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
    email: 'loadtest@example.com',
    password: 'loadtest123',
  }), { headers: { 'Content-Type': 'application/json' } })

  return { token: loginRes.json('token') }
}

export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  }

  // Scenario 1: List tasks
  const start = Date.now()
  const listRes = http.get(`${BASE_URL}/api/tasks?limit=20`, { headers })
  taskListDuration.add(Date.now() - start)

  check(listRes, {
    'list tasks: status 200': (r) => r.status === 200,
    'list tasks: has items': (r) => r.json('items') !== undefined,
  }) || errorRate.add(1)

  sleep(0.5)

  // Scenario 2: Create task
  const createRes = http.post(
    `${BASE_URL}/api/tasks`,
    JSON.stringify({ title: `Load test task ${Date.now()}`, priority: 'medium' }),
    { headers }
  )

  check(createRes, {
    'create task: status 201': (r) => r.status === 201,
  }) || errorRate.add(1)

  sleep(1)
}

export function teardown(data) {
  // Cleanup: delete load test tasks
}
```

```bash
# Run load test
k6 run tests/load/api-load-test.js \
  --env BASE_URL=https://staging.myapp.com

# With Grafana output
k6 run --out influxdb=http://localhost:8086/k6 tests/load/api-load-test.js
```

---