LLM Cost Calculator

Paste your prompt, set expected output length, and instantly compare token counts and API costs across GPT-5, Claude Opus 4.7, Gemini 3, Llama 4, DeepSeek, Grok, and Mistral — all in your browser.

What it does

All major 2026 models

GPT-5, GPT-5 mini, o3, o4-mini, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5, Gemini 3 Pro, Gemini 3 Flash, Llama 4 Scout, DeepSeek V3, DeepSeek R1, Grok 4, Mistral Large 2, and more — 16 models in one table.

Cached input & batch pricing

Toggle cached input discounts (up to 90% off on Anthropic) or batch API discounts (~50% off on OpenAI and Anthropic) to see realistic costs for production workloads.

Reasoning token costs

For o3, o4-mini, and Claude extended thinking, add a reasoning token count to see the true cost of chain-of-thought inference.

Volume scaling

Set request volume from 1 to 10M/month. The table updates to show monthly spend so you can project costs before signing any enterprise agreement.
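The projection itself is simple multiplication; this sketch (function name and shape are illustrative, not the tool's actual code) shows the math the table applies:

```javascript
// Illustrative sketch of volume scaling: monthly spend is just
// per-request cost times request volume.
function monthlySpend(perRequestUSD, requestsPerMonth) {
  return perRequestUSD * requestsPerMonth;
}

// e.g. $0.003 per request at 100K requests/month ≈ $300/month
monthlySpend(0.003, 100_000);
```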

Privacy-first design

All tokenization runs locally in your browser using character-based approximations. Your prompts are never sent to any server.

How to use LLM Cost Calculator

  1. Paste or type your prompt

    Enter the text you want to tokenize into the input area. You can paste from clipboard, upload a .txt / .md / .json / .csv file, or load a sample prompt.

  2. Set expected output length

    Choose a fixed output token count (100, 500, 1K, 4K, or custom) or set output as a ratio of input. Output tokens cost 3–10× more than input — getting this right matters.

  3. Review the comparison table

    The table shows input tokens, output tokens, and cost per request for every model. Sort by any column. Filter to specific providers. The cheapest model is highlighted.

  4. Tune advanced options

    Open Advanced options to apply cached input discounts, batch API pricing, reasoning token costs, or volume scaling. The table updates live.

  5. Share or export

    Click Share link to copy a URL with your current settings (not your text). Copy the results as a Markdown table or CSV for documentation.

When to use this

Picking a provider before starting a project

A developer pasting their system prompt + few-shot examples to see whether GPT-5 or Claude Haiku 4.5 is cheaper for their specific use case.

Estimating monthly LLM spend for a product launch

A founder setting volume to 100K requests/month and comparing Gemini 3 Flash vs DeepSeek V3 to decide which fits their runway.

Justifying a provider switch to engineering leadership

An engineer sharing a pre-filled link showing that switching from Claude Opus 4.7 to Claude Sonnet 4.6 saves $X per million requests.

Evaluating caching ROI before implementation

A team enabling the cached input discount to see whether the 10× cache discount on Anthropic justifies the engineering effort.

Technical details

Tokenization method: Character-based BPE approximation per provider (±3–8% accuracy)
Data source: Official provider pricing pages, verified April 2026
Models covered: 16 models across OpenAI, Anthropic, Google, Meta, DeepSeek, xAI, Mistral
Privacy: Zero server-side processing — all computation runs in the browser
URL encoding: Settings encoded to URL params (text is never included)
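A settings-only share link can be built along these lines. This is a hypothetical sketch — the parameter names are illustrative, not the tool's actual URL schema — but it shows how settings travel in the query string while the prompt text stays local:

```javascript
// Illustrative sketch: encode calculator settings (never the prompt text)
// into URL query parameters, matching the privacy guarantee above.
function settingsToQuery(settings) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(settings)) {
    params.set(key, String(value));
  }
  return params.toString();
}

settingsToQuery({ inputTokens: 1200, outputTokens: 500, cached: true });
// "inputTokens=1200&outputTokens=500&cached=true"
```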

Why output tokens cost 3–10× more than input

LLM inference has two distinct computational phases: prefill and decode. During prefill, the model processes your entire input in parallel — a relatively cheap operation that scales with context length but benefits from hardware parallelism. During decode, the model generates each output token one at a time in an autoregressive loop, which is both slower and more memory-intensive per token.

This asymmetry directly drives pricing. For GPT-5, input is $2.50/M tokens while output is $10.00/M — a 4× multiplier. For Claude Opus 4.7, the ratio is 5× ($15 input vs $75 output). The practical implication: at GPT-5 rates, a 500-token answer to a 1,000-token prompt costs twice as much in output ($0.0050) as the entire prompt does in input ($0.0025). Most cost estimation mistakes come from assuming equal input/output costs.
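The split is easy to check with the GPT-5 rates quoted above ($2.50/M input, $10.00/M output). A minimal sketch (not the tool's actual code):

```javascript
// Per-request cost at GPT-5's quoted rates: $2.50/M input, $10.00/M output.
function requestCostUSD(inputTokens, outputTokens, inPerM = 2.5, outPerM = 10.0) {
  return (inputTokens * inPerM + outputTokens * outPerM) / 1_000_000;
}

// 1,000-token prompt, 500-token answer: input $0.0025 + output $0.0050.
requestCostUSD(1000, 500); // ≈ $0.0075 — the shorter output dominates
```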

When prompt caching actually saves money

Prompt caching stores the KV cache for a prefix of your context window. On subsequent requests with the same prefix, the model skips re-computing that prefix and reads the cached activations instead. Anthropic charges a 25% premium to write a cache entry ($18.75/M tokens for Claude Opus 4.7) and only $1.50/M to read it — a 10× discount compared to uncached input at $15/M.

The math only works if your cache hit rate is high. For an application with a fixed system prompt sent on every request, your cache hit rate approaches 100% and caching is a no-brainer. For conversational apps where context grows with each turn, the hit rate decreases as conversations diverge. Run the numbers: cache write cost × expected writes + cache read cost × expected reads vs standard input cost × total requests. Break-even is typically reached at just 2–3 repeated requests with the same large context.
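The break-even check described above can be sketched as follows. The rates are parameters, so plug in your provider's actual base, cache-write, and cache-read prices; the values in the example (a 1.25× write premium, a 10× read discount on a $15/M base) are illustrative:

```javascript
// Break-even sketch: does caching a fixed prefix beat sending it uncached?
// Assumes one cache write on the first request, cache reads thereafter.
function cachingSavesMoney(prefixTokens, requests, basePerM, writePerM, readPerM) {
  const uncached = (prefixTokens * basePerM / 1e6) * requests;
  const cached = prefixTokens * writePerM / 1e6                 // first request writes
               + (prefixTokens * readPerM / 1e6) * (requests - 1); // the rest read
  return cached < uncached;
}

// 10,000-token system prompt at $15/M base, $18.75/M write, $1.50/M read:
cachingSavesMoney(10_000, 1, 15, 18.75, 1.5); // false — one request loses
cachingSavesMoney(10_000, 2, 15, 18.75, 1.5); // true — break-even by request 2
```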

The hidden cost of reasoning models

OpenAI's o-series and Anthropic's extended thinking modes generate "reasoning tokens" — an internal chain-of-thought scratchpad that the model uses before producing its final answer. These reasoning tokens are billed at the same rate as output tokens (or higher), but they are never shown to the user.

For o3, reasoning tokens can be 5–20× the length of the visible output on complex problems. A task that appears to cost $0.004 in output tokens may actually cost $0.02–$0.08 once reasoning tokens are included. The LLM Cost Calculator lets you add a reasoning token estimate to see the true all-in cost, which is essential for budget planning with these models.
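Since reasoning tokens are billed at the output rate, the all-in output cost is just the sum of visible and hidden tokens times that rate. A sketch, using an illustrative $10/M output price:

```javascript
// Reasoning tokens are billed like output tokens but never shown.
// outPerM is the output price in USD per million tokens.
function allInOutputCost(visibleTokens, reasoningTokens, outPerM) {
  return (visibleTokens + reasoningTokens) * outPerM / 1e6;
}

// 400 visible tokens look like $0.004 at $10/M, but 20,000 hidden
// reasoning tokens multiply the real output cost by ~50x.
allInOutputCost(400, 0, 10);      // ≈ $0.004
allInOutputCost(400, 20_000, 10); // ≈ $0.204
```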

Frequently Asked Questions

How accurate are the token counts?

The tool uses character-based tokenization approximations tuned per provider. OpenAI models use ~4 characters/token (±3%), Anthropic uses ~3.8 (±5%), and other providers are similar. Exact tokenizer implementations are proprietary for most providers, so estimates may differ slightly from actual API counts, especially for non-English text or code. Every count shows an accuracy badge so you always know the confidence level.
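A character-based approximation of this kind might look like the sketch below. The ratios match the ones quoted above; the tool's exact per-provider tuning may differ:

```javascript
// Approximate token count from character count, tuned per provider.
// Ratios here mirror the figures above (~4.0 chars/token for OpenAI,
// ~3.8 for Anthropic); unknown providers fall back to 4.0.
const CHARS_PER_TOKEN = { openai: 4.0, anthropic: 3.8 };

function approxTokens(text, provider) {
  const ratio = CHARS_PER_TOKEN[provider] ?? 4.0;
  return Math.ceil(text.length / ratio);
}

approxTokens("Hello, world!", "openai"); // 13 chars / 4.0 → 4 tokens
```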

Does my text get sent to any server?

No. All tokenization runs entirely in your browser using JavaScript. Your prompt text never leaves your device. The tool makes no API calls — to LLM providers or otherwise. You can verify this by opening DevTools > Network while using the tool.

What is the difference between cached input and batch API pricing?

Cached input pricing applies when you reuse the same context prefix across requests — the provider stores the computation and charges a fraction of the standard input price for cache hits. Batch API pricing applies when you submit requests asynchronously in bulk (results returned within 24 hours), trading latency for cost. Both discounts typically offer ~50–90% savings but serve different use cases.
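Both discounts can be modeled as multipliers on the effective input price. This sketch uses illustrative multipliers (~90% off cache reads, ~50% off batch), consistent with the ranges above but not tied to any one provider's exact rates:

```javascript
// Illustrative sketch: apply cached-input and/or batch discounts
// as multipliers on a base input price (USD per million tokens).
function effectiveInputPerM(basePerM, { cachedHit = false, batch = false } = {}) {
  let price = basePerM;
  if (cachedHit) price *= 0.1; // ~90% off on cache reads
  if (batch) price *= 0.5;     // ~50% off via the batch API
  return price;
}

effectiveInputPerM(15, { cachedHit: true }); // ≈ 1.5
effectiveInputPerM(15, { batch: true });     // ≈ 7.5
```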

How do I calculate costs for reasoning models like o3 or Claude extended thinking?

Open the Advanced options panel and enter an estimate of reasoning tokens in the Reasoning tokens field. These are internal chain-of-thought tokens billed at the output rate that are not included in the visible response. For o3, a complex task can generate 10,000–50,000 reasoning tokens. Models that don't support extended reasoning show no additional cost.

Are these prices up to date?

Prices were last manually verified in April 2026 against each provider's official pricing page. A staleness warning appears if the data is more than 45 days old. If you notice an incorrect price, use the "Report incorrect pricing" link to flag it.

Can I compare costs for a specific provider only?

Yes — use the provider filter chips above the table to show only the providers you care about. You can select multiple providers simultaneously. The savings callout updates to reflect the filtered view.
