Question 1

Why is my AI agent API cost so much higher than expected?

Accepted Answer

In an agent loop, every step re-sends the entire conversation history as input tokens. By step 4, you are paying to send the original system prompt, all tool definitions, the user message, AND the outputs and tool results of all previous steps, again. This context accumulation is the main hidden cost. Because the input grows at every step and output tokens are billed on top, a 4-step agent with moderate tool use often costs roughly 5–10× as much as a single one-shot call.
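To see where a multiplier in that range can come from, here is a minimal sketch that prices an N-step loop against a one-shot call. The function name, the token counts and the $3 / $15 per million input/output token prices are illustrative assumptions, not measurements; substitute your own model's prices and your own traces.

```python
def agent_cost(base_context: int, output_per_step: int, tool_result_tokens: int,
               steps: int, input_price: float, output_price: float) -> float:
    """Dollar cost of a (hypothetical) agent loop that re-sends its full history.

    Prices are in dollars per million tokens. steps=1 models a one-shot call.
    """
    input_tokens = 0
    for k in range(1, steps + 1):
        # Step k re-sends the base context plus every earlier step's
        # output and tool result.
        input_tokens += base_context + (k - 1) * (output_per_step + tool_result_tokens)
    output_tokens = steps * output_per_step
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Illustrative numbers only: 3,000-token base context, 300 output tokens and
# 700 tool-result tokens per step, assumed prices of $3 / $15 per million tokens.
one_shot = agent_cost(3000, 300, 700, steps=1, input_price=3.0, output_price=15.0)
four_step = agent_cost(3000, 300, 700, steps=4, input_price=3.0, output_price=15.0)
print(f"one-shot ${one_shot:.4f}, 4-step ${four_step:.4f}, ratio {four_step / one_shot:.1f}x")
```

With these assumptions the 4-step loop sends 18,000 input tokens instead of 3,000 and ends up costing a little over 5× the one-shot call; larger tool results or more steps push the ratio higher.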

Question 2

What is context accumulation in AI agents?

Accepted Answer

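Context accumulation means that each LLM call in an agent loop re-sends the full conversation history (system prompt + tool definitions + all previous messages + tool results) as input tokens. The input grows linearly with each step: input at step N = baseContext + (N-1) × (outputPerStep + toolResultTokens). Because every step pays again for everything before it, the total input across N steps grows quadratically: N × baseContext + N(N-1)/2 × (outputPerStep + toolResultTokens). This makes agents inherently more expensive than simple chat calls.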
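The per-step formula and its running total can be checked numerically. This is a small sketch with hypothetical function names; the closed form for the total, N × baseContext + d × N(N-1)/2 with d = outputPerStep + toolResultTokens, is just the sum of the per-step formula (an arithmetic series).

```python
def step_input_tokens(n: int, base_context: int, output_per_step: int,
                      tool_result_tokens: int) -> int:
    """Input tokens sent at step n (1-indexed)."""
    return base_context + (n - 1) * (output_per_step + tool_result_tokens)


def total_input_tokens(steps: int, base_context: int, output_per_step: int,
                       tool_result_tokens: int) -> int:
    """Closed-form sum of step_input_tokens over steps 1..N (quadratic in N)."""
    d = output_per_step + tool_result_tokens
    # steps * (steps - 1) is always even, so integer division is exact.
    return steps * base_context + d * steps * (steps - 1) // 2


# Example: 2,000-token base context, 300 output + 700 tool-result tokens per step.
per_step = [step_input_tokens(n, 2000, 300, 700) for n in range(1, 6)]
print(per_step)
print(sum(per_step), total_input_tokens(5, 2000, 300, 700))
```

Each step adds a constant 1,000 tokens to the input, but the cumulative input over five steps is 20,000 tokens, ten times the 2,000-token base context.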

Question 3

How much does prompt caching save for AI agents?

Accepted Answer

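With Anthropic's Claude, prompt caching bills cache reads of the static portion of your context (system prompt + tool definitions) at 0.1× the normal input price, a 90% discount. Writing the cache costs more than a normal input (1.25× the base input price for the default 5-minute cache), so the first step is slightly more expensive and every later step within the cache lifetime is much cheaper. Output tokens are not discounted. For agents with a large system prompt and many tool definitions, caching across steps can cut total agent cost by roughly 40–70%, depending on the number of steps and the ratio of static to dynamic context.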
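A minimal sketch of the input-cost arithmetic, assuming only the static prefix (system prompt + tool definitions) is cached, the cache is written once at step 1 at 1.25× and read at 0.1× on every later step, and no step falls outside the cache lifetime. The function and parameter names are hypothetical. Real agents often also cache the growing conversation prefix, which changes the numbers.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # 5-minute cache write, relative to base input price
CACHE_READ_MULTIPLIER = 0.10   # cache read, relative to base input price


def input_cost_units(static_tokens: int, dynamic_base: int, growth_per_step: int,
                     steps: int, cached: bool) -> float:
    """Input cost of an N-step loop in base-price token units.

    static_tokens: system prompt + tool definitions (the cacheable prefix)
    dynamic_base: user message tokens at step 1
    growth_per_step: output + tool-result tokens appended after each step
    Multiply the result by the per-token input price to get dollars.
    """
    total = 0.0
    for k in range(1, steps + 1):
        # The uncached, growing part of the context is always billed at 1.0x.
        dynamic = dynamic_base + (k - 1) * growth_per_step
        if not cached:
            static_cost = float(static_tokens)
        elif k == 1:
            static_cost = static_tokens * CACHE_WRITE_MULTIPLIER  # cache write
        else:
            static_cost = static_tokens * CACHE_READ_MULTIPLIER   # cache hit
        total += static_cost + dynamic
    return total


# Illustrative: 6,000 static tokens, 200-token user message, 800 tokens added per step.
plain = input_cost_units(6000, 200, 800, steps=5, cached=False)
with_cache = input_cost_units(6000, 200, 800, steps=5, cached=True)
print(plain, with_cache, f"input savings {1 - with_cache / plain:.0%}")
```

In this example the input bill drops from 39,000 to 18,900 token units, about 52%. A smaller static prefix, fewer steps, or a long dynamic tail all shrink the saving.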

Question 4

How many tokens do tool schemas use?

Accepted Answer

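A typical tool definition with a name, description and 3–5 parameters takes roughly 80–150 tokens in JSON Schema format. An agent with 5 tools therefore adds about 400–750 tokens of static tool-schema context to every step. With prompt caching enabled, this overhead is billed at the cache-read rate (0.1×) from step 2 onwards, which makes it nearly negligible. Caching only applies once the cached prefix reaches a model-specific minimum length (1,024 tokens or more), so tool schemas of this size usually need to be cached together with a sizeable system prompt.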
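For a quick sanity check of your own schemas, the sketch below estimates tokens from the compact JSON length using a rough 4-characters-per-token rule of thumb. That ratio is an assumption, not a real tokenizer; for exact numbers use your provider's tokenizer or token-counting endpoint. The `get_weather` tool is hypothetical, written in the name / description / input_schema shape that Anthropic's tool definitions use.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic only, NOT a real tokenizer


def estimate_tokens(obj) -> int:
    """Very rough token estimate from the compact JSON serialization."""
    text = json.dumps(obj, separators=(",", ":"))
    return max(1, len(text) // CHARS_PER_TOKEN)


# Hypothetical tool with three parameters.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. Paris"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
            "include_forecast": {"type": "boolean",
                                 "description": "Also return a 3-day forecast"},
        },
        "required": ["city"],
    },
}

tools = [weather_tool] * 5  # pretend the agent has five similarly sized tools
per_tool = estimate_tokens(weather_tool)
print(per_tool, sum(estimate_tokens(t) for t in tools))
```

Multiplying the total by the number of steps gives the schema overhead you pay without caching, since the schemas are re-sent on every call.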
