What is a context window?

The context window is the maximum number of tokens an LLM can consider in a single request — including the system prompt, conversation history, retrieved documents, and the user's current question. Anything beyond the limit must be truncated, summarized, or otherwise handled before being sent to the model.
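One way to handle overflow is to drop the oldest conversation turns until the request fits. The sketch below is illustrative only: `fit_history` and `rough_token_count` are hypothetical names, and the word-based counter is a crude stand-in for a real tokenizer, which would give different numbers.

```python
import math


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer (assumption: ~0.75 English words per token)."""
    return math.ceil(len(text.split()) / 0.75)


def fit_history(system_prompt, history, question, max_tokens, count=rough_token_count):
    """Keep the newest history turns that fit alongside the prompt and question.

    Returns (kept_turns_in_original_order, total_tokens_used).
    """
    fixed = count(system_prompt) + count(question)
    if fixed > max_tokens:
        raise ValueError("system prompt and question alone exceed the window")

    kept = []
    used = fixed
    # Walk from newest to oldest so the most recent turns survive.
    for turn in reversed(history):
        cost = count(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost

    kept.reverse()  # restore chronological order
    return kept, used
```

Dropping whole turns is the simplest policy; summarizing the dropped turns instead would preserve more information at the price of an extra model call.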

Also known as: context length, token window, input window

Why it's measured in tokens, not words

LLMs operate on tokens, which are subword units produced by the model's tokenizer. A token can be a whole short word, a fragment of a long word, a single character, or a punctuation mark. As a rough rule for English text, one token corresponds to about 0.75 words (so 1000 tokens ≈ 750 words). Other languages, especially CJK and other non-Latin scripts, tokenize less efficiently, sometimes taking 2-4 tokens per word.
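The rule of thumb above can be turned into a quick back-of-the-envelope check. This is a minimal sketch, assuming the 0.75 words-per-token average holds for your text; the function names are made up for illustration, and an exact count requires the model's actual tokenizer.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough English average; other languages can be far lower


def estimate_tokens(word_count: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a text of `word_count` words will need."""
    return math.ceil(word_count / words_per_token)


def estimate_words(token_count: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in `token_count` tokens."""
    return int(token_count * words_per_token)


def probably_fits(word_count: int, window_tokens: int) -> bool:
    """Rough check that a document fits in a window, ignoring the rest of the prompt."""
    return estimate_tokens(word_count) <= window_tokens


print(estimate_words(1000))            # 750
print(estimate_tokens(90_000))         # 120000
print(probably_fits(90_000, 128_000))  # True
```

For text that tokenizes less efficiently, passing a smaller `words_per_token` (for example 0.25 to 0.5, matching the 2-4 tokens per word mentioned above) gives a more conservative estimate.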

Context window sizes in 2026

Common ranges: 4K-32K tokens for small/cheap models, 128K for mainstream models, 200K-1M for premium tiers. vMira's Pro tier supports 1M-token context. Larger context windows let you paste entire codebases, long PDFs, or many documents at once without RAG-style chunking.

Larger isn't always better

A long context window doesn't mean the model attends equally to every token. Research on the "lost in the middle" effect shows that accuracy on retrieval-style tasks can drop markedly when the relevant information sits in the middle of a long context. Models are usually best at finding facts near the start or end. Putting the most important context first or last, or using RAG to feed in only relevant chunks, often beats stuffing everything into a giant prompt.
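As an illustration of "most important context first or last", the sketch below takes chunks already ranked by relevance and arranges them so the strongest ones sit at the edges of the prompt and the weakest end up in the middle. The function name `order_for_edges` is hypothetical, and how the ranking is produced (for example, by a retriever) is assumed to happen elsewhere.

```python
def order_for_edges(chunks_by_relevance):
    """Arrange chunks so the most relevant land at the start and end.

    `chunks_by_relevance` must be sorted from most to least relevant.
    Even-ranked items fill the front in order; odd-ranked items fill the
    back in reverse, which pushes the least relevant toward the middle.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # A is the most relevant, E the least
print(order_for_edges(ranked))  # ['A', 'C', 'E', 'D', 'B']
```

Whether this reordering helps depends on the model and the task, so it is worth measuring against a plain relevance-ordered prompt before adopting it.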

Cost implications

Input tokens are typically cheaper than output tokens, but input cost still scales linearly with prompt length: at the same per-token rate, the input portion of a 100K-token request costs 50× that of a 2K-token request. Long-context workflows can quickly become expensive. Strategies to manage cost: summarize earlier conversation turns, use RAG for selective retrieval, cache the static portion of the prompt (prompt caching, supported by most major providers as of 2026), and pick the smallest model that handles the task.
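The arithmetic behind the 50× figure can be checked with a small calculator. The prices below are placeholders chosen for illustration, not any provider's actual rates, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one request, given per-million-token prices for input and output."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical prices: 1.00 per million input tokens, 4.00 per million output tokens.
IN_PRICE, OUT_PRICE = 1.00, 4.00

# Input-only comparison: cost scales linearly with prompt length.
long_input = request_cost(100_000, 0, IN_PRICE, OUT_PRICE)
short_input = request_cost(2_000, 0, IN_PRICE, OUT_PRICE)
print(round(long_input / short_input, 2))  # 50.0

# With a 500-token reply on both requests, the overall ratio is smaller.
long_total = request_cost(100_000, 500, IN_PRICE, OUT_PRICE)
short_total = request_cost(2_000, 500, IN_PRICE, OUT_PRICE)
print(round(long_total / short_total, 2))  # 25.5
```

The second comparison shows why the multiplier applies to the input portion only: once output tokens are included, the total ratio depends on the reply length and the output price.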

Last updated 2026-05-18 · First published 2026-05-18

Related terms

Large language model (LLM)

Retrieval-Augmented Generation (RAG)

Thinking mode

Embedding
