How it works
An LLM ingests a sequence of tokens (the prompt), runs them through layers of self-attention and feed-forward networks, and outputs a probability distribution over the next token. A token is sampled from that distribution, appended to the sequence, and the process repeats until generation stops. Modern LLMs use the Transformer architecture, introduced in 2017.
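To make the loop concrete, here is a minimal, self-contained sketch of autoregressive decoding in Python. The toy vocabulary, the `toy_logits` stand-in for a real forward pass, and the `<eos>` stopping token are all illustrative assumptions, not any particular model's internals.

```python
import math
import random

# Minimal sketch of autoregressive decoding. VOCAB, toy_logits, and the
# <eos> stopping token are illustrative stand-ins for a real tokenizer
# and a real Transformer forward pass.
VOCAB = ["Hello", ",", " world", "!", "<eos>"]

def toy_logits(tokens: list[str]) -> list[float]:
    # A real model would run the whole sequence through self-attention
    # and feed-forward layers; this toy just favours whichever token
    # continues a canned greeting.
    nxt = min(len(tokens), len(VOCAB) - 1)
    return [3.0 if i == nxt else 0.1 for i in range(len(VOCAB))]

def softmax(logits: list[float]) -> list[float]:
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))                   # distribution over the next token
        next_token = random.choices(VOCAB, weights=probs)[0]  # sample one token
        if next_token == "<eos>":
            break
        tokens.append(next_token)                             # append and repeat
    return tokens

print("".join(generate([])))
```

Running it usually prints "Hello, world!": the toy logits walk through the greeting one token at a time, exactly the sample-append-repeat cycle described above.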
Example
When you ask Claude a question, your message is tokenised, the model generates its response token by token by sampling from its learned distribution, and any tool calls are emitted as structured outputs that the runtime can execute.
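As a rough illustration of that last point, the sketch below shows how a runtime might dispatch a structured tool call. The JSON shape, the "tool_use" type tag, and the `get_weather` tool are hypothetical assumptions for this example, not Claude's actual wire format.

```python
import json

# Hedged sketch of a runtime dispatching a structured tool call. The
# JSON shape, the "tool_use" tag, and the get_weather tool are all
# hypothetical, not Claude's actual wire format.
model_output = json.dumps({
    "type": "tool_use",
    "name": "get_weather",
    "input": {"city": "Paris"},
})

# The runtime's registry of callable tools (a hypothetical example).
TOOLS = {
    "get_weather": lambda city: f"18 °C and cloudy in {city}",
}

call = json.loads(model_output)
if call["type"] == "tool_use":
    result = TOOLS[call["name"]](**call["input"])
    print(result)  # in practice, fed back to the model as new tokens
```

The key design point is that the model only emits structured text; the runtime is responsible for validating it, executing the matching tool, and returning the result to the model as more input tokens.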
