
Token

The unit of input and output for an LLM. In English, a token averages roughly 3-4 characters: a short common word such as "hello" is typically one token, while a less common form such as "hellos" may split into two. Providers bill per input token and per output token. Different model families use different tokenizers, so the same text can produce different token counts across providers.

How it works

Tokenisation breaks text into sub-word units drawn from a fixed vocabulary that was learned alongside the model. Common words and word fragments (the, of, ing) map to single tokens; rare or specialised words may split into several tokens. Cost scales with token count, so concise prompts matter at scale.
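
To make the splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a toy, not how any production tokenizer works: real ones typically learn their vocabulary with a method such as byte-pair encoding and hold tens of thousands of entries. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are hypothetical names made up for this illustration.

```python
# Hypothetical toy vocabulary for illustration only.
# Real vocabularies are learned from data and are far larger.
TOY_VOCAB = {"the", "of", "ing", "token", "is", "s"}


def toy_tokenize(text: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split text by greedy longest match against vocab.

    Any character that starts no vocabulary entry becomes its own
    single-character token, so every input can be tokenized.
    """
    max_len = max(len(entry) for entry in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    for sample in ["the", "tokens", "tokenising", "zyx"]:
        pieces = toy_tokenize(sample)
        print(f"{sample!r}: {len(pieces)} token(s) {pieces}")
```

With this toy vocabulary, a common word such as "the" comes out as one token, while a string the vocabulary has never seen, such as "zyx", falls back to one token per character. That is the same effect that makes rare or specialised words more expensive in real tokenizers.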

Example

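A 1,000-word page is roughly 1,300-1,500 tokens. Cost is the token count multiplied by the per-token rate: at $15 per million input tokens, for example, 1,400 tokens cost about 2 cents, and at $5 per million they cost under a cent. Rates differ between Claude Opus versions, so check the provider's current pricing for the model you use.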
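
The arithmetic behind an estimate like this can be sketched as a small helper. The function name `estimate_input_cost` and the rates passed to it are illustrative assumptions, not published prices; the tokens-per-word ratio comes from the 1,300-1,500 tokens per 1,000 words range above.

```python
def estimate_input_cost(
    words: int,
    usd_per_million_tokens: float,
    tokens_per_word: float = 1.4,
) -> float:
    """Estimate input cost in US dollars for a text of `words` words.

    tokens_per_word defaults to 1.4, the midpoint of the rough
    1.3-1.5 range for English text. The rate is whatever the
    provider currently charges per million input tokens.
    """
    if words < 0 or usd_per_million_tokens < 0 or tokens_per_word <= 0:
        raise ValueError("inputs must be non-negative, with a positive ratio")
    estimated_tokens = words * tokens_per_word
    return estimated_tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Example rates are assumptions; substitute the current published rate.
    for rate in (5.0, 15.0):
        cost = estimate_input_cost(words=1_000, usd_per_million_tokens=rate)
        print(f"${rate:.2f} per million tokens: ${cost:.4f} per 1,000-word page")
```

This is only an estimate. For billing-grade numbers, count tokens with the provider's own tokenizer or token-counting endpoint instead of a words-to-tokens ratio.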
