
Context Window

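The maximum number of tokens an LLM can process at once. It includes the system prompt, conversation history, retrieved documents, and, in most APIs, the tokens the model generates in response. Limits vary by model and version: frontier model families such as Claude, GPT, and Gemini offer windows ranging from hundreds of thousands of tokens up to about 1M, so check the documented limit for the specific model you use.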

How it works

Everything sent to the model in a single request must fit in the context window. When a conversation grows too long, you either summarise or drop older messages, or move them to long-term memory and retrieve only the relevant parts. API pricing is typically per token, so input cost scales roughly linearly with context length: filling a 1M-token window costs orders of magnitude more than a typical short prompt.
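As a rough illustration of keeping a conversation inside the window, the sketch below drops the oldest messages until the rest fit. It is a minimal sketch under stated assumptions: `estimate_tokens` and `trim_history` are hypothetical helpers, and the 4-characters-per-token ratio is only a crude heuristic for English text. A real system would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(system_prompt, messages, context_window, reserve_for_output):
    """Return the most recent messages that fit in the remaining token budget."""
    # Budget left for history after the system prompt and the reserved response space.
    budget = context_window - reserve_for_output - estimate_tokens(system_prompt)
    kept = []
    # Walk from newest to oldest so recent turns are preserved first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 400, "b" * 400, "c" * 400]  # about 100 tokens each
recent = trim_history("x" * 40, history, context_window=300, reserve_for_output=50)
print(len(recent))  # 2: only the two newest messages fit
```

Instead of discarding the dropped messages, the same loop could hand them to a summarisation step or a long-term memory store.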

Example

A document analysis agent processing a 200-page legal contract (~140K tokens) can fit the entire contract in a 200K-token context window, together with the system prompt and analysis instructions, with room left over for the response.
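The budget arithmetic for this example can be sketched as follows. Only the ~140K contract size comes from the example; the 200K window, the 2K instruction size, and the 8K response reserve are hypothetical figures chosen for illustration.

```python
def remaining_tokens(context_window: int, *used: int) -> int:
    """Tokens left in the window after subtracting every listed usage."""
    return context_window - sum(used)


contract = 140_000        # from the example above
instructions = 2_000      # hypothetical: system prompt plus analysis instructions
response_reserve = 8_000  # hypothetical: space kept free for the model's answer

left = remaining_tokens(200_000, contract, instructions, response_reserve)
print(left)  # 50000 tokens of headroom
```

A negative result would mean the request does not fit and the document has to be chunked, summarised, or retrieved selectively.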
