Software that uses LLMs to plan, decide, and act over multiple steps, often calling tools to interact with other systems. Different from a chatbot, which only responds to messages inside a conversation. The defining AI architecture of 2025-2026.
Software that takes a goal, uses an LLM to plan how to reach it, calls tools as needed, and continues until the goal is met or it gives up. The unit of work in agentic AI. In 2026, production agents typically run on Claude Opus, GPT-5, or Gemini Pro.
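The plan, act, observe cycle described above can be sketched as a bounded loop. This is a minimal illustration, not a production design: `fake_planner` is a hypothetical stand-in for an LLM call, and `add_tool`, `run_agent`, and `max_steps` are names invented for this example.

```python
from typing import Callable, Optional

# Hypothetical stand-in for an LLM: given the goal and the observations so far,
# it returns the next action as (tool_name, argument) or ("finish", answer).
def fake_planner(goal: str, observations: list[str]) -> tuple[str, str]:
    if not observations:
        return ("add", "2,3")
    return ("finish", observations[-1])

# Hypothetical tool: adds two comma-separated integers.
def add_tool(arg: str) -> str:
    a, b = (int(x) for x in arg.split(","))
    return str(a + b)

def run_agent(
    goal: str,
    planner: Callable[[str, list[str]], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Optional[str]:
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = planner(goal, observations)
        if action == "finish":
            return arg  # goal met
        # Call the chosen tool and record what came back.
        observations.append(tools[action](arg))
    return None  # step budget exhausted: the agent gives up

result = run_agent("add 2 and 3", fake_planner, {"add": add_tool})
```

The step budget is what keeps a confused planner from looping forever; a real agent would usually also cap cost and wall-clock time.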
Anthropic's family of LLMs. The current lineup is Claude Opus for maximum capability, Sonnet for balanced cost and performance, and Haiku for speed. Strong on long-context reasoning (1M context), high-resolution vision, and agentic tool use. DK Studio's default model.
The maximum number of tokens an LLM can see at once. Includes the system prompt, conversation history, retrieved documents, and the response being generated. Current frontier models all offer 1M token context (Claude Opus, GPT-5, Gemini Pro).
Open-source multi-agent framework with first-class support for roles, tasks, and processes. Strong when the workflow naturally splits into specialised agents collaborating. 44k+ GitHub stars, hundreds of millions of monthly workflows.
A numeric vector representation of text (or image, or audio) that captures semantic meaning. Embeddings that sit close together in vector space correspond to content that is close in meaning. Generated by embedding models like text-embedding-3-large, voyage-3, or multilingual-e5.
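"Close in vector space" is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embedding models return hundreds or thousands of dimensions, and the values here are assumptions, not output from any model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for this example.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

related = cosine_similarity(cat, kitten)     # close to 1
unrelated = cosine_similarity(cat, invoice)  # close to 0
```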
The practice of measuring whether an LLM output is good. In production, eval sets are curated examples with expected outputs, run on every prompt change to catch regressions. The single most underrated discipline in agent engineering.
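A regression check of this kind can be as small as a loop over curated cases. In the sketch below, `fake_model` is a hypothetical stand-in for a real LLM call, the eval set is invented, and exact string match is only one possible grading rule (real eval suites often use fuzzier or model-graded checks).

```python
# Hypothetical stand-in for an LLM call; a real version would send the
# prompt and question to a model API.
def fake_model(prompt: str, question: str) -> str:
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    return answers.get(question, "unknown")

# Curated examples with expected outputs (invented for this example).
EVAL_SET = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2?", "expected": "4"},
    {"input": "capital of Peru?", "expected": "Lima"},
]

def run_evals(model, prompt: str, eval_set: list[dict]) -> tuple[float, list[dict]]:
    # Collect every case whose output does not exactly match the expectation.
    failures = [
        case for case in eval_set
        if model(prompt, case["input"]).strip() != case["expected"]
    ]
    pass_rate = 1 - len(failures) / len(eval_set)
    return pass_rate, failures

pass_rate, failures = run_evals(fake_model, "You are a concise assistant.", EVAL_SET)
```

Running this on every prompt change and comparing `pass_rate` against the previous run is what surfaces regressions.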
Continuing to train a base LLM on your own data so it learns your domain or style. Less common in 2026 than in 2023 because frontier models are good enough out of the box for most use cases. Useful for specialised tasks where prompting is not enough.
A specific implementation of tool use where the LLM emits structured JSON describing which function to call and with what arguments. The runtime then executes the function and feeds the result back to the LLM. The dominant pattern in 2026 frontier-model APIs.
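The runtime side of that round trip can be sketched as follows. The JSON shape used here (`name` plus `arguments`) is illustrative only; each provider defines its own schema, and `get_weather` is a hypothetical tool returning canned data.

```python
import json
from typing import Callable

# Hypothetical tool with canned data; a real one would call a weather API.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 18}

TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}

# What the model might emit. The exact shape is an assumption for this example.
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'

def execute_tool_call(raw: str, tools: dict[str, Callable[..., dict]]) -> str:
    call = json.loads(raw)
    func = tools.get(call["name"])
    if func is None:
        # Return the error to the model instead of crashing the runtime.
        return json.dumps({"error": f"unknown tool {call['name']}"})
    result = func(**call["arguments"])
    # This string is what gets fed back to the LLM as the tool result.
    return json.dumps(result)

tool_result = execute_tool_call(model_output, TOOLS)
```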
Google's LLM family. Current frontier is Gemini Pro, with a 1M context window and a multimodal-first architecture that handles text, image, video, and audio under one model. Deeply integrated with Google Cloud and Vertex AI.
Generative Pre-trained Transformer. OpenAI's frontier family of LLMs. The current generation is GPT-5, with Thinking and Pro variants for deeper internal reasoning and maximum capability. Each generation has cut hallucination rates further versus the previous one.
When an LLM produces output that sounds confident but is factually wrong. The fundamental challenge of using LLMs for anything with stakes. Mitigated by RAG, citations, and human-in-the-loop review. Each new frontier model generation has cut hallucination rates further, but hallucination has not been eliminated.
Open-source framework for building LLM applications. Provides chains, agents, retrievers, and integrations. Dominant in 2023-2024, now legacy in 2026. Production teams use LangGraph (its successor from the same team) or alternative frameworks.
LangChain's stateful agent framework. Hit GA in late 2025 and now widely used in production. Builds agents as graphs with explicit state and transitions. The current default for production multi-step agents, with LangSmith integration for observability.
Large Language Model. A neural network trained to predict the next token, scaled up to billions or trillions of parameters. The brain in most AI agents and chatbots in 2026. Frontier LLMs today: Claude Opus, GPT-5, Gemini Pro.
External storage of facts, preferences, or events that persists across sessions. Usually in a vector database or structured store. The agent retrieves relevant memories at the start of each turn.
The conversation history kept inside an LLM's context window during a single session. Resets when the conversation ends. The default kind of memory in most chatbots and agents.
Architecture where multiple specialised AI agents collaborate to handle a workflow. One agent plans, another researches, another writes, another reviews. More complex than single-agent but powerful when the task naturally splits.
Open-source workflow automation tool. Self-hostable, with 400+ integrations. The orchestration layer DK Studio uses for ~70% of automation builds.
The ability to see what an LLM or agent did and why. Includes logging prompts, tool calls, costs, and latencies. Critical for debugging non-deterministic systems. Current standards: Langfuse, LangSmith, Helicone, plus OpenTelemetry GenAI semantic conventions for trace standardisation.
An agent's ability to break a goal into sub-tasks and decide the order. The hardest thing for LLMs to do reliably; the difference between a working agent and one that loops.
The practice of crafting prompts to get better outputs from LLMs. Includes structuring instructions, providing examples (few-shot), and chaining prompts. The cheap optimisation before fine-tuning. Modern frontier models reduce the need for elaborate prompting but it remains relevant.
Retrieval-Augmented Generation. Pattern where the LLM doesn't answer from its weights alone but retrieves relevant chunks from a knowledge base first, then generates an answer grounded in those chunks. The default architecture for Q&A on private data.
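The retrieve-then-generate flow can be sketched in a few lines. To stay self-contained, this example ranks chunks by simple word overlap, which is a deliberate stand-in for the embedding search a real system would use; the knowledge base, function names, and prompt wording are all invented for illustration, and the final LLM call is left out.

```python
import re

# Toy knowledge base (invented for this example).
CHUNKS = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available on weekdays from 9 to 17.",
    "Shipping to the EU takes 3 to 5 business days.",
]

def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Word overlap stands in for vector similarity in this sketch.
    return sorted(chunks, key=lambda c: len(words(query) & words(c)), reverse=True)[:k]

def build_prompt(query: str, retrieved: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in retrieved)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

query = "How many days do refunds take?"
top = retrieve(query, CHUNKS)
prompt = build_prompt(query, top)  # this string would be sent to the LLM
```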
The process of an LLM working through a problem step by step before answering. Modern frontier models reason internally; some expose the reasoning trace explicitly (Claude's thinking blocks, GPT's Thinking variant).
Instructions given to an LLM before any user input. Defines the model's role, constraints, and tone. Usually invisible to the end user. The most important and most underrated optimisation in agent engineering.
A sampling parameter that controls randomness in LLM output. Temperature 0 = near-deterministic (the model picks the most likely token at each step, though hosted APIs can still show small run-to-run variation). Higher temperatures = more creative, less predictable. Defaults vary by provider, typically between 0.7 and 1.0. Some newer Claude generations have moved to adaptive sampling and no longer expose temperature as a knob.
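Mechanically, temperature divides the model's raw scores (logits) before they are turned into probabilities. The sketch below shows that effect on three made-up logits; the function name and values are assumptions for illustration.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    # Temperature 0 is handled in practice as "pick the top token";
    # dividing by zero is undefined, so this sketch rejects it.
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # invented scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.2)  # low temperature: top token dominates
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flatter distribution
```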
The unit of input and output for an LLM. A token is roughly 3-4 characters in English (e.g. a common word like "hello" is typically one token, while a less common form like "hellos" may split into two). Providers charge per input and output token. Different model families use different tokenizers, so the same text can produce a different token count across providers.
When an LLM calls external functions or APIs as part of its response. The mechanism that turns an LLM from a text generator into an agent that can act. Also called function calling.
Another sampling parameter, also called nucleus sampling. Limits the model to picking from the smallest set of tokens whose cumulative probability is at least p. Used alongside temperature to shape output distribution.
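The "smallest set whose cumulative probability is at least p" rule can be sketched as a filter over a token distribution. The distribution and the function name below are invented for illustration; a real sampler would then draw one token at random from the renormalised result.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    # Walk tokens from most to least likely until the running total reaches p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: list[tuple[str, float]] = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalise so the surviving probabilities sum to 1.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Invented next-token distribution.
probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
nucleus = top_p_filter(probs, 0.9)  # drops the unlikely tail token
```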
Database that stores embeddings (numeric vectors) and supports nearest-neighbor search. Used in RAG to find documents similar in meaning to a query. Current production options: Pinecone, Weaviate, pgvector + pgvectorscale, Qdrant, Turbopuffer, Cloudflare Vectorize.
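Nearest-neighbor search can be illustrated with a brute-force, in-memory store. This is a teaching sketch only: `TinyVectorStore` is a hypothetical class, the two-dimensional vectors are made up, and production vector databases rely on approximate indexes rather than scanning every item.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory store: exact search by scanning all vectors."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._items.append((doc_id, vector))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        # Rank every stored vector by similarity to the query, keep the top k.
        ranked = sorted(
            self._items,
            key=lambda item: self._cosine(query, item[1]),
            reverse=True,
        )
        return [doc_id for doc_id, _ in ranked[:k]]

store = TinyVectorStore()
store.add("pricing", [0.9, 0.1])
store.add("refunds", [0.2, 0.8])
store.add("billing", [0.7, 0.3])

nearest = store.search([1.0, 0.0], k=2)
```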