IT · Seminar 01 · Grounding LLMs in your own data

Retrieval-Augmented Generation (RAG) Systems

RAG retrieves relevant documents from a knowledge base and supplies them to an LLM at query time, grounding answers in current, private data and reducing hallucination.

Tags: RAG, LLM, vector search, embeddings, grounding

A large language model knows only what was in its training data, which is frozen at a cutoff date, and it can confidently invent facts. Retrieval-Augmented Generation (RAG) addresses both problems: at query time it fetches relevant passages from an external knowledge base and places them in the prompt, so the model answers from current, authoritative sources rather than from memory alone.

Working principle

RAG has two phases. Indexing (offline): documents are split into chunks, each converted to a vector embedding and stored in a vector database. Retrieval + generation (online): the user's question is embedded, the most similar chunks are retrieved by nearest-neighbour search, and these are concatenated with the question into a prompt. The LLM then generates an answer grounded in the retrieved context, ideally with citations.
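The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `embed` is a toy hashed bag-of-words function standing in for a real embedding model, the "vector database" is a plain list, and the names `chunk_text`, `build_index`, `retrieve` and the sample documents are all hypothetical.

```python
import hashlib
import math
import re

DIM = 256  # size of the toy embedding vector (arbitrary choice)

def chunk_text(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: hashed bag-of-words, L2-normalised.

    A real system would call an embedding model here (assumption).
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def build_index(documents):
    """Offline phase: chunk every document and store (chunk, vector) pairs."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc):
            index.append((chunk, embed(chunk)))
    return index

def retrieve(index, question, k=2):
    """Online phase: embed the question and return the k most similar chunks."""
    q = embed(question)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# Hypothetical knowledge base for illustration only.
documents = [
    "Update the VPN client every quarter. Contact the help desk for the installer.",
    "Expense reports are due on the fifth working day of each month.",
    "Password resets go through the self-service portal.",
]
index = build_index(documents)
top_chunks = retrieve(index, "When are expense reports due?", k=1)
print(top_chunks[0])
```

The brute-force scan in `retrieve` compares the query against every chunk; a vector database typically replaces it with an approximate nearest-neighbour index so that search stays fast as the collection grows.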

[Diagram: online retrieval-and-generation flow. (1) User query → (2) Embed query → (3) Vector search (top-k) → (4) Augment prompt with context → (5) LLM generates grounded answer]
Figure 1. Retrieved passages are injected into the prompt before generation, anchoring the answer to real sources and enabling citations.
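The prompt-augmentation step might look like the sketch below. The prompt wording, the `[n]` citation convention and the function name `build_prompt` are assumptions chosen for illustration; the resulting string would then be sent to whichever LLM is in use.

```python
def build_prompt(question, passages):
    """Concatenate numbered passages with the question into one prompt string."""
    # Number each passage so the model can cite it as [1], [2], ...
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved passages for illustration only.
passages = [
    "Expense reports are due on the fifth working day of each month.",
    "Late expense reports need manager approval.",
]
prompt = build_prompt("When are expense reports due?", passages)
print(prompt)
```

Numbering the passages gives the model a stable handle for citations, and the instruction to admit when the sources are silent is one common way to discourage answers from memory.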
Table 1. Plain LLM vs. RAG vs. fine-tuning

Approach      Freshness                 Best for
Plain LLM     Frozen at cutoff          General reasoning
Fine-tuning   Baked in at train time    Style, format, skills
RAG           Live (update the index)   Facts, private/changing data

Key insight: RAG quality is usually limited by retrieval quality rather than by the LLM. If chunking or embeddings are poor, the model never sees the right context. Hybrid (keyword + vector) search and re-ranking are common upgrades.
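One way to combine keyword and vector results is reciprocal rank fusion (RRF), sketched below. RRF is named here as an example of a fusion method, not as the approach this seminar prescribes; the document ids and the two input rankings are hypothetical, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused score means earlier in the output.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

keyword_ranking = ["doc_b", "doc_a", "doc_d"]  # e.g. from a keyword search
vector_ranking = ["doc_a", "doc_c", "doc_b"]   # e.g. from nearest-neighbour search
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Because RRF uses only ranks, it sidesteps the problem that keyword scores and vector similarities live on different scales. Documents found by both retrievers (`doc_a`, `doc_b`) rise above those found by only one.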

Applications

  • Enterprise Q&A over internal docs, wikis and tickets
  • Customer-support assistants citing the knowledge base
  • Research and legal/medical assistants needing verifiable sources

References & further reading

  1. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” NeurIPS 2020.
  2. Karpukhin et al., “Dense Passage Retrieval for Open-Domain Question Answering,” EMNLP 2020.
  3. Gao et al., “Retrieval-Augmented Generation for Large Language Models: A Survey,” 2024.