
Cost & Performance

Understanding when to use which model—and how to minimize token usage—separates efficient Claude users from those burning through budgets.

| Model | Speed | Quality | Cost | Use When |
| --- | --- | --- | --- | --- |
| Haiku | Fastest | Good | Lowest | Simple tasks, high volume |
| Sonnet | Medium | Great | Medium | Default for most work |
| Opus | Slowest | Best | Highest | Complex reasoning, architecture |

Is this task...
Simple/mechanical?
├─ Yes → Haiku
│        Examples: formatting, simple refactors, boilerplate
└─ No → Does it require deep reasoning?
         ├─ Yes → Opus
         │        Examples: architecture, complex debugging, security review
         └─ No → Sonnet
                  Examples: feature implementation, code review, most tasks
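
The decision tree above can be written as a small helper. This is only a sketch: `pick_model` and its two boolean parameters are hypothetical names introduced for illustration, and the returned strings are just the tier names from the table, not anything Claude Code reads.

```python
def pick_model(is_simple: bool, needs_deep_reasoning: bool = False) -> str:
    """Return a model tier by walking the decision tree above."""
    # Simple or mechanical work goes to the cheapest, fastest tier first.
    if is_simple:
        return "haiku"
    # Non-trivial work that needs deep reasoning goes to the strongest tier.
    if needs_deep_reasoning:
        return "opus"
    # Everything else falls through to the default tier.
    return "sonnet"


if __name__ == "__main__":
    print(pick_model(is_simple=True))                               # formatting, boilerplate
    print(pick_model(is_simple=False, needs_deep_reasoning=True))   # architecture, security review
    print(pick_model(is_simple=False))                              # feature work, code review
```

Checking "simple" before "deep reasoning" mirrors the order of the tree: a task is only considered for Opus once it has been ruled out as mechanical.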
# Check current model
> /model
# Switch to Haiku for simple work
> /model claude-haiku-4-5
# Switch to Sonnet for standard work
> /model claude-sonnet-4-5
# Switch to Opus for complex work
> /model claude-opus-4-5

Understanding token consumption helps you optimize:

| Action | Token Cost | Notes |
| --- | --- | --- |
| Your messages | Low | ~1 token per 4 chars |
| Claude’s responses | Medium-High | Verbose by default |
| File reads | High | Full file content |
| MCP tool responses | Variable | Can be massive |
| Context accumulation | Compounds | Grows every message |
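
The "~1 token per 4 chars" rule of thumb in the table is easy to turn into a rough estimator. The sketch below assumes that ratio holds, which is only approximate for English prose and can be well off for code or other languages; `estimate_tokens` and `CHARS_PER_TOKEN` are illustrative names, not part of any Claude tooling.

```python
import math

# Rule of thumb from the table above; real tokenizers vary by content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a piece of text from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    prompt = "add JWT authentication using python-jose"
    print("prompt tokens (approx):", estimate_tokens(prompt))

    # A file read puts the whole file into context, so size matters far more
    # than the length of the message that triggered the read.
    fake_file = "x = 1\n" * 2000  # stand-in for a 2000-line source file
    print("file tokens (approx):", estimate_tokens(fake_file))
```

Running an estimate like this on a file before asking Claude to read it gives a quick sense of how much context the read will consume.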
  1. Every message includes full context - With similar-sized turns, message #50 carries roughly 50x the input tokens of message #1
  2. File reads stay in context - Reading 10 files = all 10 files in every subsequent request
  3. MCP responses can be huge - Documentation tools often return 10k+ tokens
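
To see how these three facts compound, here is a minimal sketch of session cost under a deliberately simple model: every request resends all earlier turns, each turn adds the same number of tokens, and nothing is cached or compacted. The function name and the 500-tokens-per-turn figure are assumptions made up for illustration, not measurements.

```python
def session_input_tokens(messages: int, tokens_per_turn: int = 500) -> int:
    """Total input tokens sent over one session.

    Assumes request k carries k turns of history (including the new message),
    so the total is tokens_per_turn * (1 + 2 + ... + messages).
    """
    return tokens_per_turn * messages * (messages + 1) // 2


if __name__ == "__main__":
    one_long = session_input_tokens(90)          # one 90-message session
    three_short = 3 * session_input_tokens(30)   # three 30-message sessions

    print("one long session:    ", one_long)
    print("three short sessions:", three_short)
    print("ratio:", round(one_long / three_short, 2))
```

Because the total grows with the square of the message count, the same 90 messages cost markedly less when split into three fresh sessions than when sent in one. Real numbers will differ, since turns are not uniform and file reads or MCP responses add large one-off jumps to the history.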
# Bad: 100-message session
# Each message costs more as context grows
# Good: Multiple focused sessions
claude "implement the user model"
exit
claude "implement the API routes"
exit
claude "write the tests"

# Keep your context clean
> use a subagent to research authentication libraries
> just give me a summary of the recommendation
# Subagent burns tokens, your context stays lean

# Before starting a new phase
> /compact
# If context is heavy
> /clear
> here's where we are: [summary]

# Bad: Vague (causes back-and-forth)
> add authentication
# Good: Specific (one-shot)
> add JWT authentication
> - use python-jose
> - access tokens expire in 15 mins
> - refresh tokens expire in 7 days
> - store refresh tokens in Redis

# Start with Haiku for exploration
> /model claude-haiku-4-5
> explain the auth module
# Switch to Sonnet for implementation
> /model claude-sonnet-4-5
> implement the password reset flow

| Factor | Impact | Fix |
| --- | --- | --- |
| Model choice | High | Use Haiku when possible |
| Context size | High | /compact frequently |
| MCP servers | Medium | Disable unused servers |
| Network | Variable | Check VPN/connection |

# Maximum speed (simple tasks only)
> /model claude-haiku-4-5
> [task]

# Balanced (default for most work)
> /model claude-sonnet-4-5
> [task]

# Maximum quality (complex tasks)
> /model claude-opus-4-5
> [task]

Use ccusage to monitor your usage:

# Check overall usage (daily report by default)
npx ccusage
# Check usage since a given date (YYYYMMDD), e.g. seven days ago
npx ccusage daily --since YYYYMMDD
# Check usage aggregated by month
npx ccusage monthly

# Before a big task
> how many files will this touch?
> estimate: how complex is this?
# Then decide model appropriately

# Wasteful
> /model claude-opus-4-5
> fix this typo

# 200-message session with 50 file reads
# Every message now resends that entire history and costs far more than it should

# Pulling entire docs into context
> get all React documentation
# 50k tokens just entered your context

# Reading the same file multiple times
> read src/main.py
> [work]
> read src/main.py again
# File is already in context!