AI Agent API Cost Calculator

Upload your agent code for auto-detection, or fill in the form manually. Context accumulation, tool calls, and prompt caching are modeled in real time.


Why agents cost more than chat: Every step re-sends the full conversation history as input tokens. A 4-step agent with moderate tool use typically costs 5–10× as much as a single one-shot call. Context accumulation is the hidden cost most developers miss.

Frequently asked questions

Why is my AI agent API cost so much higher than expected?

Context accumulation is the main culprit. Every agent step re-sends the full conversation history (system prompt + all tool definitions + all previous messages + tool results) as input tokens. By step 4, you might be paying for 2,000+ tokens of input when the original user message was only 50 words. The calculator models this growth and shows it step by step.

What is context accumulation and why does it matter?

Context accumulation is the pattern where each LLM call in an agent loop re-sends the growing conversation history. The formula is: input_tokens(step_n) = base_context + (n-1) × (output_per_step + tool_result_tokens). Input tokens per step therefore grow linearly with the step number, and the total input cost summed over a run grows roughly quadratically with step count. When each step adds about as many tokens as the base context, step 6 costs 3× what step 2 costs.
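The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation. The function names and the sample numbers (a 1,000-token base context, 150 output tokens and 350 tool-result tokens per step) are assumptions chosen for the example.

```python
def input_tokens(step, base_context, output_per_step, tool_result_tokens):
    """Input tokens sent at a given 1-indexed step, per the formula above."""
    return base_context + (step - 1) * (output_per_step + tool_result_tokens)


def total_input_tokens(steps, base_context, output_per_step, tool_result_tokens):
    """Sum of input tokens over every step of one agent run."""
    return sum(
        input_tokens(n, base_context, output_per_step, tool_result_tokens)
        for n in range(1, steps + 1)
    )


# Hypothetical numbers, for illustration only.
BASE, OUT, TOOL = 1000, 150, 350

per_step = [input_tokens(n, BASE, OUT, TOOL) for n in range(1, 7)]
print(per_step)                                # [1000, 1500, 2000, 2500, 3000, 3500]
print(total_input_tokens(6, BASE, OUT, TOOL))  # 13500
```

Each step's input grows by a fixed amount, so the total over N steps is N × base_context + (output_per_step + tool_result_tokens) × N(N-1)/2. The N(N-1)/2 term is where the quadratic growth in total cost comes from.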

How much does prompt caching save for AI agents?

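With Anthropic's Claude, tokens read from the cache are billed at 0.1× the normal input price (a 90% discount), while writing the prefix to the cache on the first call is billed at a premium over the normal input price. For agents with a 200-word system prompt and 5 tool definitions, caching that static portion across all steps can cut total agent cost by roughly 40–70%; the exact figure depends on how large the static portion is relative to the accumulated history. Enable the caching toggle to see your exact savings.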
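To make the caching arithmetic concrete, here is a rough sketch of the input-side cost of one agent run with and without a cached static prefix. It is an illustration under stated assumptions, not the calculator's own model: the function name, the $3 per million input tokens price, and the 1.25× cache-write multiplier are assumed values (check your provider's current pricing), and output tokens are ignored. Only the 0.1× read multiplier comes from the text above.

```python
def agent_input_cost(steps, static_tokens, growth_per_step, price_per_mtok,
                     cached=False, read_mult=0.1, write_mult=1.25):
    """Input-side cost in dollars for one agent run.

    static_tokens:   system prompt + tool definitions (the cacheable prefix)
    growth_per_step: tokens each completed step adds to the history
    read_mult:       cache-read price multiplier (0.1x, as stated above)
    write_mult:      cache-write multiplier paid on step 1 (assumed value)
    """
    total = 0.0
    for n in range(1, steps + 1):
        dynamic = (n - 1) * growth_per_step  # accumulated history, never cached here
        if not cached:
            static_mult = 1.0
        elif n == 1:
            static_mult = write_mult         # first call writes the prefix to cache
        else:
            static_mult = read_mult          # later calls read it at a discount
        total += (static_tokens * static_mult + dynamic) * price_per_mtok / 1e6
    return total


# Hypothetical run: 6 steps, 1,000 static tokens, 300 new tokens per step.
uncached = agent_input_cost(6, 1000, 300, 3.0)
cached = agent_input_cost(6, 1000, 300, 3.0, cached=True)
saving = 1 - cached / uncached
print(f"uncached ${uncached:.5f}, cached ${cached:.5f}, saving {saving:.0%}")
```

Under these assumed numbers the saving on input cost is about 40%. The result shifts with the ratio of static to accumulated tokens: a larger static prefix or fewer tokens added per step pushes the saving up, and a long, tool-heavy history pulls it down.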

How many tokens do tool schemas use?

A typical tool definition (name + description + 3–5 parameters) takes roughly 80–150 tokens in JSON schema format. An agent with 5 tools adds around 400–750 tokens of static context overhead to every single step. With prompt caching, this overhead is billed at the discounted cached rate from step 2 onwards, which makes it close to free.
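The overhead can be approximated without a tokenizer. The sketch below is illustrative only: the `get_weather` tool is hypothetical, and the 4-characters-per-token ratio is a rough rule of thumb for English text, so treat the estimate as a ballpark and use your provider's token counter for real numbers.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_tokens(obj):
    """Very rough token estimate for a JSON-serializable object."""
    return len(json.dumps(obj)) // CHARS_PER_TOKEN


def schema_overhead(tool_count, tokens_per_tool, steps):
    """Total tool-schema tokens re-sent over a whole run, without caching."""
    return tool_count * tokens_per_tool * steps


# Hypothetical tool definition with three parameters.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
            "days": {"type": "integer", "description": "Forecast length in days"},
        },
        "required": ["city"],
    },
}

print(estimate_tokens(get_weather))  # rough per-tool estimate

# 5 tools at 80 to 150 tokens each, re-sent on each of 6 steps.
low = schema_overhead(5, 80, 6)
high = schema_overhead(5, 150, 6)
print(low, high)  # 2400 4500
```

Over a 6-step run, 5 tools at the per-tool range quoted above add up to 2,400–4,500 input tokens spent on schemas alone when nothing is cached.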