How it works
The user query is converted to an embedding, which is used for a nearest-neighbor search in a vector database. The top-k chunks are retrieved and inserted into the prompt as context, and the LLM generates an answer grounded in those chunks. The pipeline is often combined with a reranker for higher precision.
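To make the pipeline concrete, here is a minimal sketch in Python. It is illustrative only: embed and generate are toy stand-ins for a real embedding model and LLM call, and the vector database is replaced by an in-memory NumPy matrix searched by cosine similarity. A reranker, if used, would re-score the retrieved candidates before prompt assembly.

```python
import zlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: a deterministic
    # pseudo-random vector seeded by the text. Swap in your
    # provider's embedding call in practice.
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    return rng.standard_normal(64)

def generate(prompt: str) -> str:
    # Toy stand-in for the LLM call; a real system would send the
    # prompt to a model and return its completion.
    return f"[LLM completion for prompt of {len(prompt)} chars]"

def top_k(query: str, chunks: list[str], index: np.ndarray, k: int = 3) -> list[str]:
    # Nearest-neighbor search: cosine similarity of the query embedding
    # against unit-normalized chunk embeddings.
    q = embed(query)
    scores = index @ (q / np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:k]   # indices of the k most similar chunks
    return [chunks[i] for i in best]

def answer(query: str, chunks: list[str], index: np.ndarray) -> str:
    # Insert the retrieved chunks into the prompt as context, then generate.
    context = "\n\n".join(top_k(query, chunks, index))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return generate(prompt)
```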
Example
An internal Q&A agent for a SaaS company indexes the help center docs, employee handbook, and product specs as embeddings. When an employee asks "What's our refund policy?", the agent retrieves the relevant policy chunks and grounds its answer in them, citing sources.
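Continuing the sketch above, the agent's indexing and query flow might look like the following. The document names and contents are invented for the example; prefixing each chunk with its source path is one simple way to let the model cite sources.

```python
# Illustrative only: document names and contents are invented.
docs = {
    "help_center/refunds.md": "Refunds are issued within 30 days of purchase.",
    "handbook/pto.md": "Employees accrue 1.5 PTO days per month.",
    "specs/billing.md": "Billing runs on the first business day of the month.",
}
# Prefix each chunk with its source path so the model can cite it.
chunks = [f"[source: {path}]\n{text}" for path, text in docs.items()]
index = np.stack([embed(c) for c in chunks])
index /= np.linalg.norm(index, axis=1, keepdims=True)  # unit-normalize rows

print(answer("What's our refund policy?", chunks, index))
```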
