How it works
The LLM assigns a score (logit) to every token in its vocabulary, and a softmax turns those scores into a probability distribution over the next token. Temperature divides the logits before the softmax is applied. Temperature 0 is the limiting case of greedy decoding, which deterministically picks the highest-probability token; temperatures below 1 sharpen the distribution toward the top tokens; temperatures above 1 flatten it, making lower-probability tokens more likely.
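A minimal sketch of the mechanism in Python, assuming raw logits as input (the function name and the toy logit values are illustrative, not from the original):

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float, rng=None) -> int:
    """Sample a token index from logits scaled by temperature.

    Temperature 0 is treated as greedy decoding (argmax), matching the
    deterministic behavior described above.
    """
    rng = rng or np.random.default_rng()
    if temperature == 0:
        return int(np.argmax(logits))        # deterministic: highest-probability token
    scaled = logits / temperature            # T < 1 sharpens, T > 1 flattens
    scaled -= scaled.max()                   # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()  # softmax over scaled logits
    return int(rng.choice(len(probs), p=probs))

# Toy example: three candidate tokens
logits = np.array([2.0, 1.0, 0.1])
print(sample_with_temperature(logits, 0.0))  # always index 0
print(sample_with_temperature(logits, 0.9))  # usually index 0, sometimes 1 or 2
```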
Example
For a customer service agent where consistency matters, a low temperature such as 0.2 yields nearly the same response to the same question every time. For a creative writing agent, a higher temperature such as 0.9 produces useful variation across calls to the same prompt.
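In most hosted APIs, temperature is just a request parameter. A sketch using the OpenAI Python SDK (the model name and prompts are illustrative assumptions, not from the original):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Consistency-oriented agent: same question, near-identical answers.
support = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "What is your refund policy?"}],
    temperature=0.2,
)

# Creativity-oriented agent: same prompt, varied output across calls.
story = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write an opening line for a mystery."}],
    temperature=0.9,
)

print(support.choices[0].message.content)
print(story.choices[0].message.content)
```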
