Glossary

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is an AI technique that fetches relevant text chunks from an external knowledge source at query time and includes them in an LLM prompt. RAG combines a retrieval system (search over your data) with a generation model (an LLM that writes the answer) so that responses are grounded in specific data rather than in the model's training data alone.
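The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration, not a production pattern: the knowledge source is a hardcoded list, and word-overlap scoring stands in for a real retriever.

```python
# Minimal sketch of the RAG flow: retrieve relevant chunks from a
# knowledge source, then include them in the prompt sent to the LLM.
# KNOWLEDGE and the overlap scorer are toy stand-ins.

KNOWLEDGE = [
    "RAG was introduced in a 2020 paper by Lewis et al.",
    "A Context OS is a structured, source-grounded knowledge layer.",
    "Fine-tuning modifies model weights; RAG supplies context at query time.",
]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query (stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt: retrieved context first, then the question."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

query = "When was RAG introduced?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE))
# `prompt` would then be sent to the generation model.
```

A real system would replace `retrieve` with vector search over an indexed corpus and send `prompt` to an LLM API, but the shape of the pipeline is the same.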


Origins

Where the term comes from

RAG was introduced in a 2020 research paper by Lewis et al. at Facebook AI Research. The technique gained mainstream adoption in 2023-2024 as a way to give LLMs access to up-to-date, domain-specific, or proprietary information without fine-tuning the model itself.

Capabilities

What Retrieval-Augmented Generation (RAG) does

Retrieves relevant context per query

When a user asks a question, RAG finds relevant chunks of text from your knowledge source and includes them in the LLM prompt.

Grounds answers in source data

The LLM produces answers based on the retrieved chunks, reducing hallucination and making outputs traceable to specific sources.

Supports up-to-date information

Because the knowledge source is queried at runtime, RAG answers can reflect data that came in after the LLM was trained.

Avoids fine-tuning

RAG gives an LLM access to your data without modifying the model weights, which is faster, cheaper, and easier to update.

Distinctions

Retrieval-Augmented Generation (RAG) vs adjacent concepts

Retrieval-Augmented Generation (RAG) is often confused with related but distinct ideas. Here is how it differs.

Concept: Context OS
What it is: The structured, source-grounded knowledge layer; the source of what gets retrieved.
How RAG differs: RAG is a retrieval pattern, the mechanism for pulling context into a prompt; it is one technique for querying a Context OS.

Concept: Fine-tuning
What it is: Modifies the LLM weights to bake in domain knowledge.
How RAG differs: Leaves the LLM unchanged and supplies context at query time. Faster to update, easier to audit.

Concept: Prompt engineering
What it is: Manually crafted, one-shot context per prompt.
How RAG differs: Systematic, retrieval-driven context selection per query.

Who uses it

Who uses Retrieval-Augmented Generation (RAG)

Almost every production AI system that needs to ground answers in specific data uses RAG in some form. Customer support bots, internal knowledge tools, code assistants, and Context OS implementations like DearTech-OS all rely on RAG patterns to deliver grounded answers.

FAQ

Common questions about Retrieval-Augmented Generation (RAG)

How does RAG work?

When a user submits a query, the RAG system uses search (typically vector search, full-text search, or graph traversal) to retrieve relevant chunks from a knowledge source. Those chunks are included in the LLM prompt as context. The LLM then generates an answer grounded in the retrieved context.
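The vector-search step described above can be sketched as follows. This is a minimal illustration: real systems use a learned embedding model and a vector database, while here tiny hand-made vectors stand in for embeddings.

```python
# Sketch of the vector-search step in RAG: embed the query and the
# chunks, then rank chunks by cosine similarity to the query vector.
# The 3-dimensional vectors below are hypothetical stand-ins for
# embeddings produced by a real embedding model.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

chunks = {
    "Refund policy: refunds within 30 days.": [0.9, 0.1, 0.0],
    "Shipping takes 3-5 business days.": [0.1, 0.9, 0.1],
    "Support is available 24/7 via chat.": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "How do refunds work?"

# The highest-scoring chunk becomes context for the LLM prompt.
best = max(chunks, key=lambda c: cosine(query_vec, chunks[c]))
```

The same ranking idea generalizes: full-text search swaps cosine similarity for keyword scoring, and graph traversal adds related nodes on top of the similarity hits.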

Is a Context OS the same as RAG?

No. A Context OS is the structured knowledge layer that gets queried. RAG is a technique for querying it. A Context OS makes RAG more effective by adding types, status, confidence, and graph relationships that improve retrieval quality.

What is the difference between RAG and a knowledge graph?

A knowledge graph is a data structure (typed nodes and relationships). RAG is a retrieval pattern (fetching context for a prompt). They work well together: RAG can retrieve graph nodes, and graph traversal can expand the retrieval beyond pure semantic similarity.
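The "graph traversal can expand the retrieval" idea can be sketched like this. The graph and node names are hypothetical toy data; a real Context OS would hold typed nodes and relationships.

```python
# Sketch of graph-expanded retrieval: start from nodes found by
# similarity search, then follow graph edges to pull in related nodes
# that pure semantic similarity would miss. GRAPH is toy data.

GRAPH = {  # node -> related nodes (edge types omitted for brevity)
    "RAG": ["vector search", "LLM prompt"],
    "vector search": ["embeddings"],
    "LLM prompt": [],
    "embeddings": [],
}

def expand(seeds: list[str], graph: dict, hops: int = 1) -> set:
    """Add graph neighbors of the seed nodes, up to `hops` steps out."""
    result = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph.get(node, [])} - result
        result |= frontier
    return result

# Suppose similarity search surfaced only the "RAG" node; one hop of
# traversal adds its direct neighbors to the retrieved context.
context_nodes = expand(["RAG"], GRAPH, hops=1)
```

Increasing `hops` widens the context at the cost of relevance, which is the usual tuning trade-off in graph-augmented retrieval.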

See Retrieval-Augmented Generation (RAG) in practice

DearTech-OS is a Context OS for founder-operators. Explore the product or talk through whether one is right for your team.