
RAG

Retrieval-Augmented Generation. A pattern in which the LLM doesn't answer from its weights alone: it first retrieves relevant chunks from a knowledge base, then generates an answer grounded in those chunks. The default architecture for question answering over private data.

How it works

1. The user query is converted to an embedding.
2. That embedding is used for nearest-neighbor search in a vector database.
3. The top-k chunks are retrieved and inserted into the prompt as context.
4. The LLM generates an answer grounded in those chunks.

Often combined with a reranker between retrieval and generation for higher precision.
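
A minimal sketch of the loop in Python. The embed function, the example chunks, and the in-memory index are hypothetical stand-ins for a real embedding model and vector database; only the shape of the pipeline is the point:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a deterministic random unit vector per text.
    A real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

# Indexing: embed each knowledge-base chunk once, up front.
chunks = [
    "Refunds are available within 30 days of purchase.",
    "The employee handbook covers PTO and expense policy.",
    "Product specs: the API rate limit is 100 requests per minute.",
]
index = np.stack([embed(c) for c in chunks])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Nearest-neighbor search: cosine similarity of the query embedding
    against every chunk embedding, keeping the top-k."""
    q = embed(query)
    scores = index @ q  # vectors are unit-length, so dot product = cosine
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

def build_prompt(query: str) -> str:
    """Insert the retrieved chunks into the prompt as grounding context."""
    context = "\n".join(f"- {c}" for c in retrieve(query))
    return (
        f"Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# The resulting prompt is what gets sent to the LLM.
print(build_prompt("What's our refund policy?"))
```

A reranker would slot in between retrieve and build_prompt, rescoring the top-k candidates with a more precise model before they enter the prompt.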

Example

An internal Q&A agent for a SaaS company indexes the help center docs, employee handbook, and product specs as embeddings. When an employee asks "What's our refund policy?", the agent retrieves the relevant policy chunks and grounds its answer in them, citing sources.
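
Continuing the sketch above, one way to support citations is to store each chunk with its source document and label chunks in the prompt so the model can reference them. The source ids here are hypothetical:

```python
# Hypothetical chunk store keyed by source document id.
docs = {
    "help-center/refunds": "Refunds are available within 30 days of purchase.",
    "handbook/pto": "The employee handbook covers PTO and expense policy.",
}

def build_cited_prompt(query: str, retrieved: dict[str, str]) -> str:
    """Label each chunk with its source id so the model can cite it."""
    context = "\n".join(f"[{src}] {text}" for src, text in retrieved.items())
    return (
        "Answer using only the context below, and cite sources "
        "by their bracketed ids.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

print(build_cited_prompt("What's our refund policy?", docs))
```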

Related terms

Embedding · Vector database · Reranker · Chunking
