What is RAG (Retrieval-Augmented Generation)?
Turkish: RAG
RAG is an AI architecture in which relevant passages are retrieved from documents or databases and passed to a language model before it generates an answer.
What is RAG?
RAG (Retrieval-Augmented Generation) lets a large language model use relevant passages from an external knowledge source before it writes an answer. The model is not limited to what it learned during training; it can receive context from company documents, help centers, product data, or contract archives.
In a typical flow, documents are split into chunks, converted into embedding vectors, and stored in a vector database. When a user asks a question, the system searches for similar chunks, optionally reranks them, and sends the selected context to the model. The model then generates an answer grounded in that context.
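The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, `InMemoryIndex` is a hypothetical substitute for a vector database, and the reranking and model-call steps are left out. All names here are illustrative.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk_text, vector) pairs

    def add(self, document):
        for piece in chunk(document):
            self.items.append((piece, embed(piece)))

    def search(self, query, k=2):
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, contexts):
    """Assemble the text that would be sent to the language model."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


index = InMemoryIndex()
index.add("A refund is accepted within 30 days of purchase.")
index.add("The warranty covers hardware defects for two years.")
index.add("Support is available by email on weekdays.")

question = "How many days do I have to request a refund?"
contexts = index.search(question, k=1)
prompt = build_prompt(question, contexts)
print(prompt)
```

In a real system the prompt would then be sent to the language model, and an optional reranking step would sit between `search` and `build_prompt`.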
Why It Is Used
RAG is useful when answers depend on private or current information. It can work over internal procedures, product manuals, regulatory notes, support tickets, and technical documentation. Because each answer is built from identifiable retrieved passages, it is also easier to cite sources and to enforce access control on the underlying documents.
What to Watch
Poor document parsing, stale information, weak retrieval, or incorrect permissions can make RAG output unreliable. Chunking strategy, metadata, update pipelines, user authorization, and evaluation sets should therefore be part of the design.
A vector database provides the retrieval layer, while an LLM provides the generation layer.
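Two of the design points above, user authorization and evaluation sets, can be illustrated with a short sketch. It assumes each chunk carries an `allowed_groups` metadata field and that retrieval filters on it before ranking; the word-overlap scorer is a deliberately simple placeholder for real similarity search, and every name and document ID is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # metadata used for access control


def overlap_score(query, text):
    """Placeholder relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, chunks, user_groups, k=3):
    """Filter by permission first, then rank only what the user may see."""
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda c: overlap_score(query, c.text), reverse=True)
    return ranked[:k]


def recall_at_k(eval_set, chunks, user_groups, k=3):
    """Share of evaluation queries whose expected document appears in the top k."""
    hits = 0
    for query, expected_doc_id in eval_set:
        retrieved_ids = {c.doc_id for c in retrieve(query, chunks, user_groups, k)}
        hits += expected_doc_id in retrieved_ids
    return hits / len(eval_set)


chunks = [
    Chunk("hr-001", "salary bands for engineering roles", frozenset({"hr"})),
    Chunk("kb-010", "how to reset your password", frozenset({"hr", "everyone"})),
    Chunk("kb-011", "vpn setup guide for laptops", frozenset({"everyone"})),
]

eval_set = [
    ("reset password", "kb-010"),
    ("vpn setup", "kb-011"),
]

print(recall_at_k(eval_set, chunks, user_groups={"everyone"}, k=1))
print([c.doc_id for c in retrieve("salary bands", chunks, user_groups={"everyone"})])
```

Filtering before ranking means a restricted chunk can never reach the model's context for an unauthorized user, and a small labeled evaluation set makes it possible to notice when a change to chunking or retrieval lowers recall.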
Related Terms
AI Agent: An AI agent is a software component that uses an LLM, tools, and data sources to plan steps and complete a defined goal.
Chunking: Chunking splits long text into meaningful, manageable passages that search and RAG systems can retrieve accurately.
Context Window: A context window is the total token capacity a language model can read and consider while generating one response in a single request.
Hallucination (AI): AI hallucination is when a model produces information that sounds plausible but is false, unsupported, or not grounded in the source.
Knowledge Graph: A knowledge graph models entities such as people, products, documents, and processes with relationships that systems can query.
LLM (Large Language Model): An LLM is a model trained on large text datasets that can understand and generate natural language, forming the basis of tools like ChatGPT.
Prompt Engineering: Prompt engineering designs instructions, context, examples, and constraints so language models produce more useful, consistent, and reviewable output.
Reranking: Reranking re-scores an initial result set with a stronger model so the most relevant documents move to the top reliably.
Semantic Search: Semantic search finds relevant results by comparing the meaning of queries and content, not only matching exact keywords.
Vector Database: A vector database stores embeddings and retrieves records by semantic similarity, making it a core layer in AI search systems.