How it works
All input messages must fit in the context window. When the conversation gets long, you either summarise older messages or use long-term memory retrieval. Cost scales linearly with context, so filling a 1M-token window costs orders of magnitude more than a typical short prompt.
Example
A document analysis agent processing a 200-page legal contract can fit the entire contract (~140K tokens) in the context window plus the system prompt and analysis instructions, with room left over for the response.
