
LLM

Large Language Model. A neural network trained to predict the next token, scaled up to billions or trillions of parameters. The brain in most AI agents and chatbots in 2026. Frontier LLMs in April 2026: Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro.

How it works

An LLM ingests a sequence of tokens (the prompt), runs them through layers of self-attention and feed-forward networks, and outputs a probability distribution over the next token. Sampling produces the next token, which is appended to the sequence, and the process repeats. Modern LLMs use the Transformer architecture introduced in 2017.
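The loop above can be sketched in a few lines. This is a toy sketch, not a real model: `toy_logits` is a hypothetical stand-in for the Transformer forward pass (here just hard-coded bigram scores), and the vocabulary is made up. What it does show accurately is the generation loop: score, softmax, sample, append, repeat.

```python
import math
import random

def toy_logits(context):
    # Stand-in for a Transformer forward pass. A real LLM computes these
    # scores with many layers of self-attention and feed-forward networks;
    # here we just hard-code how strongly each token follows the last one.
    vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]
    bigram_scores = {
        "the": [0.0, 2.0, 0.0, 0.0, 2.0, 0.0],
        "cat": [0.0, 0.0, 2.0, 0.0, 0.0, 0.5],
        "sat": [0.0, 0.0, 0.0, 2.0, 0.0, 0.5],
        "on":  [2.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "mat": [0.0, 0.0, 0.0, 0.0, 0.0, 3.0],
    }
    return vocab, bigram_scores[context[-1]]

def sample(logits, temperature=1.0):
    # Softmax over the logits, then draw one index from the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return random.choices(range(len(exps)), weights=[e / total for e in exps], k=1)[0]

def generate(prompt, max_tokens=10):
    # The autoregressive loop: predict, sample, append, repeat.
    context = list(prompt)
    for _ in range(max_tokens):
        vocab, logits = toy_logits(context)
        token = vocab[sample(logits)]
        if token == "<eos>":
            break
        context.append(token)
    return context

print(" ".join(generate(["the"])))
```

Lowering `temperature` sharpens the distribution toward the highest-scoring token (more deterministic output); raising it flattens the distribution (more varied output).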

Example

When you ask Claude a question, your message is tokenised, the model generates its response token by token by sampling from its learned distribution, and any tool calls are emitted as structured outputs the runtime can execute.
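The tool-call step can be sketched as follows. The payload format and the `get_weather` function are hypothetical (real providers each define their own schema); the point is the pattern: the model emits structured data naming a tool, and the runtime looks that tool up in a registry and executes it.

```python
import json

# Hypothetical tool-call payload emitted by the model as structured output.
# Real formats vary by provider; this is an illustrative shape only.
tool_call = json.loads('{"name": "get_weather", "arguments": {"city": "Paris"}}')

def get_weather(city):
    # Stub tool; a real implementation would call a weather API.
    return f"Sunny in {city}"

# The runtime only dispatches to tools it has explicitly registered.
registry = {"get_weather": get_weather}

result = registry[tool_call["name"]](**tool_call["arguments"])
print(result)  # → Sunny in Paris
```

The result is then fed back to the model as a new message, and generation continues with that context.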

Need to actually use an LLM?

We build production AI systems that put these concepts to work. In 30 minutes, we map your use case.