Cost Calculator
How LLM API Pricing Works
Token-Based Pricing
Large Language Model (LLM) APIs charge based on tokens -- small chunks of text that models process. A token is roughly 3/4 of a word in English. Most providers charge separately for input tokens (your prompt) and output tokens (the model's response), with output tokens typically costing 2-5x more than input tokens.
Prices are quoted per 1 million tokens. For example, if a model charges $2.50 per 1M input tokens and you send 10 million input tokens in a month, your input cost alone would be $25.00.
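In code, that arithmetic is a single multiplication. A minimal sketch using the example figures above:

```python
# Per-1M-token pricing: cost = tokens / 1_000_000 * rate_per_1M
input_rate = 2.50          # $ per 1M input tokens (the example figure above)
input_tokens = 10_000_000  # tokens sent this month

input_cost = input_tokens / 1_000_000 * input_rate
print(f"${input_cost:.2f}")  # $25.00
```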
What Affects Your Cost
Your monthly LLM API bill depends on several factors:
- Model choice: Flagship models like GPT-4o, Claude Opus 4, and Gemini 1.5 Pro deliver the highest quality but cost significantly more than budget alternatives.
- Input vs. output ratio: Applications that generate long responses (like code generation) have higher output costs, while search or classification tasks are input-heavy.
- Context window usage: Larger context windows let you include more information but increase input token costs per request.
- Request volume: High-throughput applications like chatbots or document processing can accumulate millions of tokens daily.
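Putting these factors together, a monthly estimate is just two multiplications. A minimal sketch; the volumes and rates below are illustrative placeholders, not any provider's list prices:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Estimated monthly spend from token volumes and $/1M rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical volumes and rates for illustration only.
estimate = monthly_cost(50_000_000, 10_000_000, input_rate=2.50, output_rate=10.00)
print(f"${estimate:.2f}/month")  # $225.00/month
```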
Choosing the Right Model
Not every task needs a flagship model. Here is a general guide:
- Flagship models (GPT-4o, Claude Opus 4, Gemini 1.5 Pro): Best for complex reasoning, nuanced writing, and multi-step tasks. Use when quality is critical.
- Mid-tier models (Claude Sonnet 4, Mistral Large, Command R+): Good balance of capability and cost. Suitable for most production applications.
- Budget models (GPT-4o-mini, Claude Haiku 3.5, Gemini 2.0 Flash): Ideal for high-volume, simpler tasks like classification, extraction, and basic chat. Often 10-50x cheaper than flagships.
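To make the tier gap concrete, here is a quick sketch pricing one workload at a hypothetical flagship rate and a hypothetical budget rate (illustrative numbers, not any provider's list prices):

```python
# Same workload priced at two tiers; both rate pairs are hypothetical.
tokens_in, tokens_out = 20_000_000, 5_000_000

def cost(rate_in: float, rate_out: float) -> float:
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000

flagship = cost(5.00, 15.00)   # a plausible flagship-tier price point
budget   = cost(0.15, 0.60)    # a plausible budget-tier price point
print(f"flagship ${flagship:.2f} vs budget ${budget:.2f} "
      f"({flagship / budget:.0f}x difference)")
# flagship $175.00 vs budget $6.00 (29x difference)
```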
Cost Optimization Tips
- Prompt caching: Many providers offer cached-prompt pricing at a 50-90% discount for repeated prefixes.
- Batch APIs: OpenAI and Anthropic offer batch processing at ~50% discount for non-real-time workloads.
- Model routing: Use cheap models for simple tasks and route only complex queries to expensive models.
- Prompt engineering: Shorter, more efficient prompts reduce input costs without sacrificing quality.
How to Use the LLM Pricing Calculator
This calculator is for developers and product teams who are integrating large language model APIs into their applications and need to forecast costs before committing to a provider. It covers all major providers -- OpenAI, Anthropic, Google, Mistral, Meta, and Cohere -- so you can compare apples to apples across the entire market.
Estimating Your Monthly Cost
Enter your expected monthly token volumes in the two input fields: prompt (input) tokens and completion (output) tokens. If you are unsure, use the preset buttons -- Chatbot, Code Assistant, Document Processing, and RAG Pipeline -- which populate realistic token volumes for each use case. The results table instantly recalculates monthly costs for every model in the database.
Understanding Token Pricing
LLM APIs charge per token, where a token is roughly three-quarters of a word in English. Providers quote prices per 1 million tokens and charge separately for input and output. Output tokens typically cost 2-5x more than input tokens because they require more computation (the model generates them one at a time). When budgeting, pay close attention to your input-to-output ratio -- a summarization app that produces short outputs from long documents is input-heavy, while a code generation tool produces verbose output from short prompts.
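The ratio effect is easy to see numerically. A minimal sketch, assuming one illustrative rate pair ($2.50/$10.00 per 1M) applied to both workloads:

```python
# Two workloads, same total token volume, opposite input/output ratios.
RATE_IN, RATE_OUT = 2.50, 10.00  # $/1M, illustrative rates

def cost(tokens_in: int, tokens_out: int) -> float:
    return (tokens_in * RATE_IN + tokens_out * RATE_OUT) / 1_000_000

summarizer = cost(9_000_000, 1_000_000)  # input-heavy: long docs, short summaries
codegen    = cost(1_000_000, 9_000_000)  # output-heavy: short prompts, long code
print(f"summarizer ${summarizer:.2f} vs codegen ${codegen:.2f}")
# summarizer $32.50 vs codegen $92.50 -- same volume, nearly 3x the cost
```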
Choosing the Right Model Tier
Use the tier filter buttons to narrow results. Flagship models deliver the highest quality reasoning and nuanced output but cost 10-50x more than budget models. For high-volume tasks like classification, entity extraction, or simple Q&A, budget-tier models often perform just as well at a fraction of the cost. A common production pattern is to route easy queries to a cheap model and escalate only complex ones to a flagship.
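A minimal routing sketch of that pattern, assuming a hypothetical `classify_difficulty()` heuristic and placeholder model names:

```python
# Route by estimated difficulty; names, heuristic, and threshold are placeholders.
CHEAP_MODEL, FLAGSHIP_MODEL = "budget-model", "flagship-model"

def classify_difficulty(prompt: str) -> float:
    """Hypothetical heuristic: longer, code-like prompts score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if "```" in prompt or "refactor" in prompt.lower():
        score = max(score, 0.8)
    return score

def pick_model(prompt: str, threshold: float = 0.7) -> str:
    return FLAGSHIP_MODEL if classify_difficulty(prompt) >= threshold else CHEAP_MODEL

print(pick_model("What's the capital of France?"))    # budget-model
print(pick_model("Refactor this module: ```...```"))  # flagship-model
```

In production the heuristic is often itself a cheap classifier model; the savings come from the fact that most real traffic is easy.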
Cost Optimization Strategies
- Shorten your prompts. Every extra word in your system prompt costs input tokens on every single request.
- Use prompt caching (available from Anthropic and OpenAI) to get 50-90% discounts on repeated prompt prefixes.
- Batch non-urgent requests using batch APIs for roughly 50% savings.
- Monitor your actual token usage weekly. Many teams discover their real costs differ significantly from initial estimates.
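Most chat APIs return token counts with each response, which makes that monitoring straightforward. A minimal accumulator sketch; the per-call fields follow OpenAI's Python SDK, and the rates are placeholders:

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()
totals = {"input": 0, "output": 0}

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
# Every response reports its own token usage.
totals["input"] += resp.usage.prompt_tokens
totals["output"] += resp.usage.completion_tokens

RATE_IN, RATE_OUT = 0.15, 0.60  # $/1M, placeholder rates
spend = (totals["input"] * RATE_IN + totals["output"] * RATE_OUT) / 1e6
print(f"running spend: ${spend:.6f}")
```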
Frequently Asked Questions
How do I count tokens before sending a request?
OpenAI provides the tiktoken library for Python (community ports exist for JavaScript) that counts tokens for GPT models. Anthropic's API reports token counts in the `usage` field of each response and also offers a dedicated token-counting endpoint. For rough estimation, divide your text's character count by 4 to approximate the token count in English.
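A minimal tiktoken sketch (`pip install tiktoken`; assumes a recent release that recognizes the model name):

```python
import tiktoken

# Look up the tokenizer that matches a given OpenAI model.
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Large language models charge per token."
tokens = enc.encode(text)
print(len(tokens))    # exact token count for this model family
print(len(text) / 4)  # the rough chars/4 heuristic, for comparison
```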
Why are output tokens more expensive than input tokens?
Generating output tokens requires the model to run its full forward pass sequentially, one token at a time. Input tokens can be processed in parallel as a batch. This difference in computational cost is why every provider charges a premium for output.
What does "context window" mean for pricing?
The context window is the maximum number of tokens (input plus output combined) a model can process in a single request. A larger context window lets you include more information per request but increases your input token costs. You pay for every token you send, so stuffing a 128K context window with unnecessary text can be very expensive.
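Because input plus output must fit together, a simple pre-flight check is worth doing. A sketch with an illustrative limit:

```python
CONTEXT_WINDOW = 128_000  # model's total token limit (illustrative)

def fits(input_tokens: int, max_output_tokens: int) -> bool:
    """Input plus reserved output must fit inside the context window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits(120_000, 4_000))  # True  -- just fits
print(fits(126_000, 4_000))  # False -- trim the prompt or lower the output cap
```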
Are these prices up to date?
Prices are sourced from official provider documentation and updated periodically. LLM pricing changes frequently -- providers often lower prices or release new tiers. Always verify current pricing on the provider's website before making budgeting decisions.
Should I use open-source models to save money?
Open-source models like Llama can be self-hosted or accessed through inference providers (Together AI, Replicate, etc.) at lower per-token costs. However, self-hosting adds infrastructure complexity and GPU costs. Hosted open-source APIs through providers like Together AI offer a middle ground -- cheaper than proprietary APIs with no infrastructure management.
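Whether self-hosting pays off is largely a throughput question. A back-of-the-envelope sketch in which every figure (GPU price, sustained throughput, API rate) is an assumption to replace with your own:

```python
# All figures are assumptions for illustration -- substitute your own.
GPU_COST_PER_HOUR = 2.00  # $/hr for a rented GPU
TOKENS_PER_SECOND = 50    # sustained generation throughput
API_RATE_PER_1M   = 0.60  # $/1M output tokens via a hosted API

# Cost per 1M self-hosted tokens, assuming the GPU stays fully utilized;
# idle time makes the self-hosted number strictly worse.
self_host_per_1m = GPU_COST_PER_HOUR / (TOKENS_PER_SECOND * 3600) * 1_000_000
print(f"self-hosted: ${self_host_per_1m:.2f}/1M vs API: ${API_RATE_PER_1M:.2f}/1M")
# self-hosted: $11.11/1M -- at this throughput the hosted API wins easily
```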