Question 1

How much does it cost to embed 1 million documents?

Accepted Answer

With text-embedding-3-small ($0.02/1M tokens), embedding 1 million documents of about 375 tokens each (roughly 280 English words, at the usual rule of thumb of 0.75 words per token) costs about $7.50: 375M tokens × $0.02 per 1M tokens. With text-embedding-3-large ($0.13/1M tokens), the same corpus costs about $48.75. Embedding is a one-time cost per document; you pay again only when a document changes and has to be re-embedded.
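The arithmetic behind those figures can be reproduced with a few lines of Python. This is a minimal sketch: the function name `embedding_cost` is made up for illustration, and the prices and the 375-token average are the ones quoted in this answer, not values fetched from any API.

```python
def embedding_cost(num_docs: int, tokens_per_doc: float, price_per_million_tokens: float) -> float:
    """Return the one-time cost in USD of embedding a corpus.

    Hypothetical helper: cost = total tokens / 1M * price per 1M tokens.
    """
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens


# Figures from the answer above: 1M documents, ~375 tokens each.
small = embedding_cost(1_000_000, 375, 0.02)
large = embedding_cost(1_000_000, 375, 0.13)

print(f"text-embedding-3-small: ${small:.2f}")  # $7.50
print(f"text-embedding-3-large: ${large:.2f}")  # $48.75
```

Swap in your own document count and average token length to estimate a different corpus.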

Question 2

Which embedding model is most cost-effective?

Accepted Answer

text-embedding-3-small ($0.02/1M tokens) is the best default for most RAG applications: it is cheap, fast, and performs well on English text. Upgrade to text-embedding-3-large only if retrieval quality is measurably poor. Gemini Embedding 2 and Mistral Embed are priced similarly to text-embedding-3-small, at $0.02/1M tokens.

Question 3

What is the difference between text-embedding-3-small and text-embedding-3-large?

Accepted Answer

text-embedding-3-large produces higher-dimensional vectors (3072 vs. 1536 dimensions) and gives better semantic accuracy on complex retrieval tasks. It costs 6.5× more per token ($0.13 vs. $0.02 per 1M tokens). For most use cases, small is sufficient. Use large when you need high-accuracy retrieval on diverse, technical, or multilingual content.
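To put the two models side by side, the sketch below computes the price ratio and a rough index size for each. The prices and dimensions come from this answer; the 4-bytes-per-value figure assumes vectors are stored as uncompressed float32, which is an assumption about your vector store rather than something this page states. The helper names are hypothetical.

```python
# Prices (USD per 1M tokens) and vector sizes quoted in the answer above.
PRICE_PER_MILLION = {"text-embedding-3-small": 0.02, "text-embedding-3-large": 0.13}
DIMENSIONS = {"text-embedding-3-small": 1536, "text-embedding-3-large": 3072}

BYTES_PER_VALUE = 4  # assumption: uncompressed float32 storage


def price_ratio() -> float:
    """How many times more the large model costs per token than the small one."""
    return PRICE_PER_MILLION["text-embedding-3-large"] / PRICE_PER_MILLION["text-embedding-3-small"]


def raw_index_size_gb(model: str, num_vectors: int) -> float:
    """Raw vector storage in GB (10^9 bytes), ignoring index overhead and metadata."""
    return DIMENSIONS[model] * BYTES_PER_VALUE * num_vectors / 1e9


print(f"price ratio: {price_ratio():.1f}x")
for model in DIMENSIONS:
    print(f"{model}: {raw_index_size_gb(model, 1_000_000):.3f} GB for 1M vectors")
```

Under these assumptions, the larger model doubles the raw storage per vector in addition to the higher per-token price.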

Question 4

Is it cheaper to use embeddings or pass the full document as context?

Accepted Answer

Embeddings are a one-time cost: you pay once to vectorize your corpus, and retrieval afterward is essentially free (just a vector similarity search). Passing the full document as context costs input tokens on every LLM call, so the total scales with your query volume. For more than a few hundred queries per month, a RAG architecture with embeddings is almost always cheaper. Use our File Analyzer to estimate the per-query cost of context-passing vs. the one-time embedding cost.
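A rough break-even point can be estimated as follows. This is a sketch under stated assumptions, not the calculator's own method: the function name is hypothetical, and the LLM input price ($1.00/1M tokens), the 50,000-token full context, and the 2,000 tokens of retrieved chunks are illustrative numbers, not figures from this page. It also assumes the RAG path still pays for the retrieved chunks it sends to the LLM.

```python
import math
from typing import Optional


def break_even_queries(
    embed_cost_usd: float,
    full_context_tokens: int,
    retrieved_tokens: int,
    llm_price_per_million: float,
) -> Optional[int]:
    """Number of queries after which embedding plus retrieval beats full-context passing.

    Returns None if retrieval does not send fewer tokens than full context,
    in which case the embedding cost is never recovered.
    """
    saving_per_query = (full_context_tokens - retrieved_tokens) * llm_price_per_million / 1_000_000
    if saving_per_query <= 0:
        return None
    return math.ceil(embed_cost_usd / saving_per_query)


# Illustrative inputs (all assumptions except the $7.50 embedding cost from Question 1).
n = break_even_queries(
    embed_cost_usd=7.50,
    full_context_tokens=50_000,
    retrieved_tokens=2_000,
    llm_price_per_million=1.00,
)
print(f"RAG becomes cheaper after about {n} queries")  # 157 with these inputs
```

With a pricier LLM or a larger full context, the break-even point arrives sooner; with small documents, it arrives later.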

Embedding Cost Calculator

Frequently asked questions