Article · 12 minute read

What does an AI agent actually cost to run?

LLM APIs, hosting, messaging, vector databases, and the line item everyone forgets. Real numbers you can put in a spreadsheet.

Benjam Indrenius

Published 2026-04-26

The short answer

Most business operators overestimate model cost and underestimate everything else. A typical AI conversation costs under a nickel. A single SMS in some countries costs more than the model call. And the person maintaining the system costs more than the hosting, the API, and the messaging combined.

What a conversation actually costs on each model

A typical business chat: 6,000 input tokens, 1,500 output tokens. That covers a multi-turn support Q&A, internal ops lookup, or scheduling conversation. Prices are from current provider pricing pages (Claude).

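| Model | Input $/1M | Output $/1M | Per chat |
| --- | --- | --- | --- |
| GPT-4o-mini | $0.15 | $0.60 | $0.002 |
| Gemini Flash | $0.30 | $2.50 | $0.006 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.014 |
| GPT-4o | $2.50 | $10.00 | $0.030 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.041 |
| Claude Opus 4 | $15.00 | $75.00 | $0.203 |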

1,000 conversations on GPT-4o-mini: $1.80. On Claude Haiku: $13.50. On GPT-4o: $30. The model is rarely the biggest line item. The channel and the human time usually are.
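
To put these numbers in a spreadsheet or a script, the per-chat figure is just tokens times price. The sketch below assumes the 6,000-in / 1,500-out conversation described above and the listed prices; the function and dictionary names are illustrative, not from any provider SDK.

```python
# USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "GPT-4o-mini": (0.15, 0.60),
    "Gemini Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
}


def chat_cost(model, input_tokens=6_000, output_tokens=1_500):
    """USD cost of one conversation at the listed per-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model in PRICES:
    per_chat = chat_cost(model)
    print(f"{model:<18} ${per_chat:.4f} per chat   ${per_chat * 1000:.2f} per 1,000 chats")
```

Swap in your own token counts if your conversations run longer or shorter than the typical case.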

Messaging costs: SMS vs WhatsApp vs email

A 5-message conversation (3 outbound, 2 inbound) on Twilio, by country. This is where the geography of your customers starts to matter more than your model choice.

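| Channel | Cost per conversation | Basis |
| --- | --- | --- |
| US SMS | $0.04 | 5 messages via Twilio |
| UK SMS | $0.18 | 5 messages via Twilio |
| Finland SMS | $0.27 | 5 messages via Twilio |
| Germany SMS | $0.35 | 5 messages via Twilio |
| WhatsApp service | $0.025 | 5 messages via Twilio |
| Email (SES) | $0.0004 | 4-email thread |
| Email (Resend) | $0.002 | 4-email thread |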

In Europe, WhatsApp service conversations are 7-14x cheaper than SMS. Email on SES is roughly 60x cheaper than WhatsApp and 450-875x cheaper than European SMS. If your customers will accept WhatsApp, the savings are large.
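
The channel ratios can be recomputed from the per-conversation figures above. This is a minimal sketch; the dictionary and function names are illustrative, and note that the email rows are 4-email threads while the SMS and WhatsApp rows are 5-message conversations.

```python
# USD per conversation, copied from the figures above.
PER_CONVERSATION = {
    "US SMS": 0.04,
    "UK SMS": 0.18,
    "Finland SMS": 0.27,
    "Germany SMS": 0.35,
    "WhatsApp service": 0.025,
    "Email (SES)": 0.0004,
    "Email (Resend)": 0.002,
}


def times_cheaper(cheap, expensive):
    """How many times cheaper `cheap` is than `expensive`."""
    return PER_CONVERSATION[expensive] / PER_CONVERSATION[cheap]


for channel in ("UK SMS", "Finland SMS", "Germany SMS"):
    ratio = times_cheaper("WhatsApp service", channel)
    print(f"WhatsApp vs {channel}: {ratio:.1f}x cheaper")

ses_ratio = times_cheaper("Email (SES)", "WhatsApp service")
print(f"SES email vs WhatsApp: {ses_ratio:.1f}x cheaper")
```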

Hosting: cheaper than you think

App runtime for an agent that calls external APIs. Not including the LLM, messaging, or database.

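| Provider | Basic | Medium | High volume |
| --- | --- | --- | --- |
| Hetzner | $5 | $16 | $38 |
| DigitalOcean | $6 | $24 | $48 |
| Fly.io | $6 | $23 | $46 |
| Render | $7 | $25 | $85 |
| Railway | $30 | $80 | $160 |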

A normal business agent runs on a $5-25/month VPS if the intelligence lives in external APIs. What gets expensive is managed databases, caches, observability, and multiple workers.

Three real scenarios

Solo operator, SMS notification agent

500 US notifications/mo, 50 replies, GPT-4o-mini, Hetzner, 2 hrs/mo maintenance

| Item | Monthly cost |
| --- | --- |
| Hosting | $5 |
| LLM API | $0.21 |
| Twilio SMS + number | $5.72 |
| Backups + domain | $1.67 |
| Human maintenance (2 hrs @ $90) | $180 |
| Total | $193/mo |

Machine bill: $12.60. The rest is your time.
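
The split between machine bill and human time for this solo-operator scenario can be reproduced in a few lines. A minimal sketch using the line items listed above; the variable names are illustrative.

```python
HOURLY_RATE = 90  # loaded rate used throughout the article, USD/hour

# Monthly line items for the solo SMS notification agent, in USD.
line_items = {
    "Hosting": 5.00,
    "LLM API": 0.21,
    "Twilio SMS + number": 5.72,
    "Backups + domain": 1.67,
    "Human maintenance": 2 * HOURLY_RATE,  # 2 hours per month
}

total = sum(line_items.values())
human = line_items["Human maintenance"]
machine_bill = total - human
human_share = human / total

print(f"Machine bill: ${machine_bill:.2f}")
print(f"Total:        ${total:.2f}")
print(f"Human share:  {human_share:.0%}")
```

Maintenance hours are the input worth varying first: each extra hour adds more than seven times the entire machine bill.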

Small team, Slack internal agent

8 users, 500 chats/mo, 80% mini + 20% GPT-4o, Render, Supabase, Sentry, 4 hrs/mo

| Item | Monthly cost |
| --- | --- |
| Slack Pro (8 seats) | $70 |
| Render runtime | $25 |
| Supabase Pro | $25 |
| Mixed model API | $3.72 |
| Sentry Team | $26 |
| Human maintenance (4 hrs @ $90) | $360 |
| Total | $511/mo |

The model bill is $3.72. Slack seats cost 19x more than the AI.

Service business, customer-facing WhatsApp + SMS

500 conversations/mo (350 WhatsApp, 150 US SMS), Claude Sonnet 4.5, Fly.io, 8 hrs/mo

| Item | Monthly cost |
| --- | --- |
| Fly.io runtime | $23 |
| Supabase Pro | $25 |
| Claude Sonnet 4.5 API | $27 |
| WhatsApp + SMS messaging | $17.32 |
| Sentry Team | $26 |
| Human maintenance (8 hrs @ $90) | $720 |
| Total | $840/mo |

500+ conversations/month and the model bill is still only $27. The supervision time dominates.

Four ways to cut costs that actually work

Route by complexity

Send 80% of requests to GPT-4o-mini ($0.002/chat) and 20% to GPT-4o ($0.03/chat). 1,000 conversations: $7.44 instead of $30. The same trick works with Anthropic: 80% Haiku + 20% Sonnet costs 53% less than sending everything to Sonnet.
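
The blended cost is a weighted average of the two per-chat prices. The sketch below uses the unrounded per-chat costs for a 6,000-in / 1,500-out conversation; the function name and the fixed 80/20 split are illustrative assumptions, and real routing would need some classifier to decide which requests count as complex.

```python
def blended_cost(share_cheap, cheap_per_chat, strong_per_chat, conversations=1_000):
    """USD cost when `share_cheap` of conversations go to the cheaper model."""
    per_chat = share_cheap * cheap_per_chat + (1 - share_cheap) * strong_per_chat
    return conversations * per_chat


# Unrounded per-chat costs in USD (6,000 input + 1,500 output tokens).
GPT_4O_MINI, GPT_4O = 0.0018, 0.030
HAIKU, SONNET = 0.0135, 0.0405

openai_mix = blended_cost(0.8, GPT_4O_MINI, GPT_4O)
anthropic_mix = blended_cost(0.8, HAIKU, SONNET)
anthropic_all_sonnet = blended_cost(0.0, HAIKU, SONNET)
anthropic_saving = 1 - anthropic_mix / anthropic_all_sonnet

print(f"80/20 mini + GPT-4o:   ${openai_mix:.2f} per 1,000 chats")
print(f"80/20 Haiku + Sonnet:  ${anthropic_mix:.2f} per 1,000 chats")
print(f"Saving vs all Sonnet:  {anthropic_saving:.0%}")
```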

Cache your prompts

Anthropic's cache-read pricing on Sonnet 4.5: $0.30/MTok vs $3/MTok standard. A repeated 8,000-token system prompt drops from $0.024 to $0.0024 per request. Over 10,000 requests a month, that saves $216 on a single prompt block.
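
The saving follows directly from the two per-token prices. This sketch assumes every request is a cache hit and ignores the cost of writing the prompt to the cache in the first place, so treat the result as an upper bound; the names are illustrative.

```python
STANDARD_PER_MTOK = 3.00    # Sonnet 4.5 standard input price, USD per 1M tokens
CACHE_READ_PER_MTOK = 0.30  # Sonnet 4.5 cache-read price, USD per 1M tokens


def prompt_cost(tokens, price_per_mtok):
    """USD cost of sending `tokens` input tokens at the given price."""
    return tokens * price_per_mtok / 1_000_000


def monthly_cache_saving(prompt_tokens, requests_per_month):
    """USD saved per month if every request reads the prompt from cache."""
    uncached = prompt_cost(prompt_tokens, STANDARD_PER_MTOK)
    cached = prompt_cost(prompt_tokens, CACHE_READ_PER_MTOK)
    return (uncached - cached) * requests_per_month


saving = monthly_cache_saving(prompt_tokens=8_000, requests_per_month=10_000)
print(f"Uncached prompt: ${prompt_cost(8_000, STANDARD_PER_MTOK):.4f} per request")
print(f"Cached prompt:   ${prompt_cost(8_000, CACHE_READ_PER_MTOK):.4f} per request")
print(f"Monthly saving:  ${saving:.2f}")
```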

Batch non-urgent work

Google offers 50% off through the Gemini Batch API. Overnight classification, document cleanup, summarization, enrichment jobs. If it doesn't need a real-time answer, batch it.

Pick the cheapest viable channel

1,000 US SMS notifications: $8.30. 1,000 SES emails: $0.10. In Germany, that same 1,000 SMS would cost $112. If your customers accept WhatsApp or email for non-urgent messages, the savings stack up fast.
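
The same comparison works for any volume. In this sketch the per-message prices are backed out of the figures above ($8.30, $112, and $0.10 per 1,000 messages); the names are illustrative and the prices are worth checking against current pricing pages before budgeting.

```python
# USD per outbound message, derived from the per-1,000 figures above.
PER_MESSAGE = {
    "US SMS": 0.0083,
    "Germany SMS": 0.112,
    "SES email": 0.0001,
}


def send_cost(channel, messages):
    """USD cost of sending `messages` notifications on one channel."""
    return messages * PER_MESSAGE[channel]


volume = 1_000
for channel in PER_MESSAGE:
    print(f"{channel:<12} ${send_cost(channel, volume):.2f} per {volume:,} notifications")
```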

Frequently asked questions

How much does it cost to run an AI agent per conversation?

A typical business conversation (6,000 input tokens, 1,500 output) costs $0.002 on GPT-4o-mini, $0.03 on GPT-4o, $0.014 on Claude Haiku 4.5, and $0.04 on Claude Sonnet 4.5. The model is rarely the biggest line item. Messaging and human maintenance usually cost more.

What is the biggest hidden cost of running an AI agent?

Human maintenance time. At a $90/hour loaded engineering rate, 4 hours per month of upkeep costs $360. That often exceeds the combined bill for hosting, APIs, and messaging. For reference, the Bureau of Labor Statistics puts median US software developer pay at $133,080 a year (May 2024), which is about $64 an hour over a 2,080-hour year before benefits and overhead.

Is it cheaper to self-host an LLM or use an API?

API is cheaper for low volume. Self-hosted Llama on a Hetzner GPU server costs roughly $1.20 to $2.40 per million output tokens depending on throughput, but the GPU idles when traffic is low. At under 10,000 conversations per month, API pricing usually wins because you pay per token, not per hour.
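
The idle-GPU effect is easier to see as arithmetic: a server bills by the month, so the effective price per token depends on how many tokens you actually serve. The sketch below is illustrative only. The $200/month server price is a hypothetical placeholder, not a quoted Hetzner price, and the calculation ignores whether one server could actually handle the highest volume shown.

```python
HYPOTHETICAL_SERVER_USD = 200   # placeholder monthly GPU server cost, not a real quote
OUTPUT_TOKENS_PER_CHAT = 1_500  # typical conversation from this article


def effective_cost_per_mtok(monthly_server_usd, output_tokens_per_month):
    """USD per 1M output tokens when the server cost is spread over real traffic."""
    return monthly_server_usd / (output_tokens_per_month / 1_000_000)


for conversations in (1_000, 10_000, 100_000):
    tokens = conversations * OUTPUT_TOKENS_PER_CHAT
    cost = effective_cost_per_mtok(HYPOTHETICAL_SERVER_USD, tokens)
    print(f"{conversations:>7,} chats/mo: ${cost:.2f} per 1M output tokens")
```

Under that assumed price, the effective per-token cost at 10,000 conversations a month sits far above the $1.20 to $2.40 range, which only applies when the GPU is kept busy.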

How much does SMS cost for lead notifications?

US SMS on Twilio: $0.0083 per segment. A 5-message conversation (3 out, 2 in): about $0.04. UK SMS: $0.18 per conversation. Germany: $0.35. Finland: $0.27. WhatsApp service messages through Twilio cost about $0.005 each, making WhatsApp cheaper than SMS in most of Europe.

Should I build my own AI agent or buy a SaaS product?

Buy first when the workflow is standard and time-to-value matters. A solo SMS notification agent costs roughly $1,440 to build and $180/month to maintain. A customer-facing multi-channel agent: $10,800 to build, $720/month to maintain. The model API itself is often the smallest line item.

How can I reduce AI agent costs?

Route 80% of requests to a cheap model (GPT-4o-mini or Claude Haiku) and 20% to a strong model. This cuts model costs 50-75%. Use prompt caching (Anthropic cache reads are 90% cheaper). Batch non-urgent work (Google offers 50% batch discount). Replace SMS with email or WhatsApp where possible.
