How it works
Short-term memory is just an array of messages passed back to the LLM on each turn. Modern LLMs have large context windows (often hundreds of thousands of tokens or more), so most production agents simply keep the full conversation history. When the history exceeds the window, you summarise the older messages and keep the recent turns verbatim.
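A minimal sketch of that pattern, with the token counter and the summarisation step stubbed out (a real agent would use the model's tokenizer and an LLM call for both; the names here are illustrative, not a real library API):

```python
def count_tokens(message):
    # Crude stand-in: roughly one token per word.
    return len(message["content"].split())

def summarise(messages):
    # Stand-in for an LLM summarisation call.
    return {"role": "system",
            "content": f"[Summary of {len(messages)} earlier messages]"}

class ShortTermMemory:
    def __init__(self, max_tokens=1000, keep_recent=4):
        self.max_tokens = max_tokens    # pretend context window
        self.keep_recent = keep_recent  # turns kept verbatim
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        self._compact()

    def _compact(self):
        # When the history exceeds the window, fold the older
        # messages into one summary and keep the recent turns.
        total = sum(count_tokens(m) for m in self.messages)
        if total > self.max_tokens and len(self.messages) > self.keep_recent:
            older = self.messages[:-self.keep_recent]
            recent = self.messages[-self.keep_recent:]
            self.messages = [summarise(older)] + recent

    def context(self):
        # The array of messages passed back to the LLM each turn.
        return list(self.messages)
```

Every turn, `context()` is what goes into the prompt: either the full thread, or a summary message followed by the latest turns.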
Example
A customer service agent's short-term memory is simply the conversation thread with the user. The agent can reference what was said five turns ago because every one of those messages is still in the context window.
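The example above can be sketched as a loop that resends the whole thread on every call. The `stub_llm` and the order number are hypothetical stand-ins, there only to show that an earlier turn remains visible to the model:

```python
history = []

def ask(llm, user_text):
    history.append({"role": "user", "content": user_text})
    reply = llm(history)  # the entire thread goes in on each call
    history.append({"role": "assistant", "content": reply})
    return reply

def stub_llm(messages):
    # Fake model: "remembers" by scanning the history it receives.
    for m in messages:
        if "order #1234" in m["content"]:
            return "Your order #1234 ships tomorrow."
    return "How can I help?"

ask(stub_llm, "I have a question about order #1234.")
# The second question never mentions the order number, but the
# first turn is still in the history, so the model can answer.
print(ask(stub_llm, "Any update on that order?"))
```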
