Cost & Performance
Knowing which model to use, and how to minimize token usage, separates efficient Claude users from those burning through their budgets.
Model Selection Guide
The Three Models
| Model | Speed | Quality | Cost | Use When |
|---|---|---|---|---|
| Haiku | Fastest | Good | Lowest | Simple tasks, high volume |
| Sonnet | Medium | Great | Medium | Default for most work |
| Opus | Slowest | Best | Highest | Complex reasoning, architecture |
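As a rough sketch, the "Use When" column could be encoded as a small helper. The function name `pick_model` and its two boolean parameters are hypothetical, chosen only for illustration; the model identifiers are the ones passed to `/model` elsewhere in this guide.

```python
def pick_model(is_simple: bool, needs_deep_reasoning: bool) -> str:
    """Return a model identifier for a task.

    Illustrative helper only; it is not part of any Claude tooling.
    """
    if is_simple:
        return "claude-haiku-4-5"   # formatting, boilerplate, high-volume work
    if needs_deep_reasoning:
        return "claude-opus-4-5"    # architecture, complex debugging
    return "claude-sonnet-4-5"      # default for most work


print(pick_model(is_simple=True, needs_deep_reasoning=False))
print(pick_model(is_simple=False, needs_deep_reasoning=True))
print(pick_model(is_simple=False, needs_deep_reasoning=False))
```

Checking "simple" first means a task that is both mechanical and hard to classify falls to the cheapest model; swap the order if you would rather err toward quality.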
Decision Framework
    Is this task...

    Simple/mechanical?
    ├─ Yes → Haiku
    │        Examples: formatting, simple refactors, boilerplate
    │
    └─ No → Does it require deep reasoning?
            ├─ Yes → Opus
            │        Examples: architecture, complex debugging, security review
            │
            └─ No → Sonnet
                     Examples: feature implementation, code review, most tasks

Switching Models
    # Check current model
    > /model

    # Switch to Haiku for simple work
    > /model claude-haiku-4-5

    # Switch to Sonnet for standard work
    > /model claude-sonnet-4-5

    # Switch to Opus for complex work
    > /model claude-opus-4-5

What Costs Tokens
Understanding token consumption helps you optimize:
| Action | Token Cost | Notes |
|---|---|---|
| Your messages | Low | ~1 token per 4 chars |
| Claude’s responses | Medium-High | Verbose by default |
| File reads | High | Full file content |
| MCP tool responses | Variable | Can be massive |
| Context accumulation | Compounds | Grows every message |
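The "~1 token per 4 chars" rule of thumb can be turned into a quick estimator. This is a minimal sketch under that assumption only: real tokenizers vary with language and content, and `estimate_tokens` and `CHARS_PER_TOKEN` are hypothetical names, not part of any Claude API.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic from the table above; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "add JWT authentication"
file_contents = "x = 1\n" * 500  # stand-in for a 3,000-character source file

print(estimate_tokens(prompt))         # 22 characters -> 6
print(estimate_tokens(file_contents))  # 3000 characters -> 750
```

Even this crude estimate shows why a single file read usually outweighs the prompt that triggered it.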
The Hidden Costs
- Every message includes full context - If each turn adds a similar amount of text, message #50 sends roughly 50x the input tokens of message #1
- File reads stay in context - Reading 10 files = all 10 files in every subsequent request
- MCP responses can be huge - Documentation tools often return 10k+ tokens
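The compounding effect can be worked through with simple arithmetic. The sketch below assumes every turn adds the same number of tokens and that the whole history is resent with each request; it ignores prompt caching, file reads, and system prompts, so treat the numbers as illustrative. `TOKENS_PER_TURN` and both function names are made up for this example.

```python
def request_input_tokens(message_number: int, tokens_per_turn: int) -> int:
    """Input tokens sent with the Nth message when all earlier turns are resent."""
    return message_number * tokens_per_turn


def session_input_tokens(total_messages: int, tokens_per_turn: int) -> int:
    """Total input tokens across a session: t * (1 + 2 + ... + n)."""
    return tokens_per_turn * total_messages * (total_messages + 1) // 2


TOKENS_PER_TURN = 500  # assumed average size of one prompt plus its reply

first = request_input_tokens(1, TOKENS_PER_TURN)
fiftieth = request_input_tokens(50, TOKENS_PER_TURN)

print(fiftieth // first)                              # 50
print(session_input_tokens(50, TOKENS_PER_TURN))      # 637500
print(5 * session_input_tokens(10, TOKENS_PER_TURN))  # 137500
```

Under these assumptions, five 10-message sessions send roughly a fifth of the input tokens of one 50-message session, because total cost grows with the square of the session length.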
Cost Optimization Strategies
1. Start Fresh Frequently

    # Bad: 100-message session
    # Each message costs more as context grows

    # Good: Multiple focused sessions
    claude "implement the user model"
    exit
    claude "implement the API routes"
    exit
    claude "write the tests"

2. Use Subagents for Heavy Lifting
    # Keep your context clean
    > use a subagent to research authentication libraries
    > just give me a summary of the recommendation

    # Subagent burns tokens, your context stays lean

3. Compress Aggressively
    # Before starting a new phase
    > /compact

    # If context is heavy
    > /clear
    > here's where we are: [summary]

4. Be Specific to Reduce Iterations
    # Bad: Vague (causes back-and-forth)
    > add authentication

    # Good: Specific (one-shot)
    > add JWT authentication
    > - use python-jose
    > - access tokens expire in 15 mins
    > - refresh tokens expire in 7 days
    > - store refresh tokens in Redis

5. Model Switching Mid-Session
    # Start with Haiku for exploration
    > /model claude-haiku-4-5
    > explain the auth module

    # Switch to Sonnet for implementation
    > /model claude-sonnet-4-5
    > implement the password reset flow

Speed Optimization
Factors Affecting Speed
| Factor | Impact | Fix |
|---|---|---|
| Model choice | High | Use Haiku when possible |
| Context size | High | /compact frequently |
| MCP servers | Medium | Disable unused servers |
| Network | Variable | Check VPN/connection |
Speed vs Quality Tradeoffs
    # Maximum speed (simple tasks only)
    > /model claude-haiku-4-5
    > [task]

    # Balanced (default for most work)
    > /model claude-sonnet-4-5
    > [task]

    # Maximum quality (complex tasks)
    > /model claude-opus-4-5
    > [task]

Monitoring Usage
Track Spending
Use ccusage to monitor your usage:
    # Check overall usage
    npx ccusage

    # Check last 7 days
    npx ccusage --days 7

    # Check this month
    npx ccusage --month

Estimate Before Large Tasks
    # Before a big task
    > how many files will this touch?
    > estimate: how complex is this?

    # Then choose the model accordingly

Anti-Patterns
❌ Opus for Everything

    # Wasteful
    > /model claude-opus-4-5
    > fix this typo

❌ Never Compacting
    # 200-message session with 50 file reads
    # Every message now resends all of that context and costs far more than it would in a fresh session

❌ Massive MCP Responses
    # Pulling entire docs into context
    > get all React documentation
    # 50k tokens just entered your context

❌ Redundant File Reads
    # Reading the same file multiple times
    > read src/main.py
    > [work]
    > read src/main.py again
    # File is already in context!