Getting Started with RAG: What Is Retrieval-Augmented Generation and How Does It Solve LLM Hallucination?

RAG solves LLM hallucination by retrieving external knowledge before generating responses
This course covers the fundamentals of RAG (Retrieval-Augmented Generation). LLMs suffer from hallucination due to their probability-based generation mechanism. RAG addresses this through a three-stage workflow — indexing, retrieval, and generation — transforming the model from "answering from memory" to "answering after consulting references," significantly improving accuracy. The course roadmap covers RAG basics, LangChain framework, Knowledge Graph RAG, and hands-on project implementation.
Course Overview
This is an AI full-stack tutorial series from Bilibili's Turing Python channel. This lesson, taught by instructor Baichuan, covers the fundamentals of RAG (Retrieval-Augmented Generation) technology. The course is positioned as the second phase of LLM application development, with an overall plan of 12-15 lessons covering a complete learning path from RAG basics to Knowledge Graph RAG and full project implementation.

The Core Pain Point of LLMs: Hallucination
What Is LLM Hallucination?
Large language models are built on the Transformer architecture and fundamentally generate responses based on probability. The Transformer is a deep learning architecture proposed by a Google team in the 2017 paper Attention Is All You Need; it captures relationships between elements in a sequence through the self-attention mechanism. Most current mainstream large language models (such as the GPT series, LLaMA, and Qwen) use a decoder-only variant of the Transformer. Their text generation is autoregressive: at each step the model computes a probability distribution over every token in its vocabulary, conditioned on the text so far, then selects the next token through a sampling strategy. The model is therefore essentially an extremely complex conditional probability model that optimizes for linguistic fluency and coherence rather than factual correctness.
Its goal is to make output "sound human" rather than to guarantee that the content is true. A classic example: ask an LLM "In which chapter does Lin Daiyu uproot a willow tree?" The premise is fabricated. Lin Daiyu is a character in Dream of the Red Chamber, while the willow-uprooting episode belongs to Lu Zhishen in Water Margin. Rather than correcting the false premise, the model may confidently fabricate a plot description that does not exist. This happens because the model found a generation path that "looks reasonable" at the probability level while being completely detached from facts.
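The probability-based generation step described above can be sketched in a few lines. This is a minimal illustration, not how any particular model is implemented: the four-token vocabulary and the logit scores are made-up values, and `softmax` and `sample_next_token` are hypothetical helper names.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution over tokens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Made-up scores for the token that follows
# "Lin Daiyu uproots a willow tree in chapter ..."
vocab = ["seven", "twelve", "never", "[correction]"]
logits = [2.0, 1.5, 0.2, -1.0]

probs = softmax(logits)
greedy = vocab[probs.index(max(probs))]  # always take the most likely token
sampled = sample_next_token(vocab, logits, temperature=0.8, rng=random.Random(0))

print(dict(zip(vocab, [round(p, 3) for p in probs])))
print("greedy:", greedy, "| sampled:", sampled)
```

Nothing in this loop checks whether a token is true. A fluent but false continuation wins whenever it carries the highest score, which is the gap RAG tries to close by changing what the model is conditioned on.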
Three Root Causes of Hallucination
- Training Data Noise: Training data contains a mix of false information, internet rumors, and even fan fiction. The model cannot distinguish truth from falsehood and learns incorrect information as fact. Take GPT-3 as an example — its training data includes hundreds of billions of tokens of internet text, which inevitably contains large amounts of low-quality or outright incorrect content.
- Probability-First Mechanism: This follows from the next-token prediction objective these models are trained on. The model tends to select the highest-probability output rather than the most factually accurate one. When it faces a question it is "uncertain" about, it often does not stay silent or express uncertainty. Instead, it generates the answer that is statistically most "reasonable."
- Data Staleness: LLM training data has a cutoff date, and the model cannot access information published after that date. For example, a model whose training data ends in 2023 cannot reliably answer questions about events in 2024, but it may "speculate" a plausible-sounding yet incorrect answer based on what it already knows.
How RAG Solves the Hallucination Problem
RAG (Retrieval-Augmented Generation) is a technical solution born specifically to address the pain points described above. This concept was formally introduced by Facebook AI Research (now Meta AI) in a 2020 paper. Its core idea is: before the LLM generates a response, first retrieve relevant information from an external knowledge base, then inject this accurate reference material into the model's context, allowing the model to organize its response based on reliable information, thereby significantly reducing the probability of hallucination.
The complete RAG workflow consists of three key stages:
- Indexing Stage: External documents are split into appropriately sized text chunks, then each chunk is converted into a high-dimensional vector through an Embedding model and stored in a vector database. Embedding is a technique that maps text into a continuous vector space, where semantically similar texts are closer together in vector space. Common Embedding models include OpenAI's text-embedding-ada-002, and open-source options like BGE and M3E. They typically convert text into floating-point arrays of 768 or 1536 dimensions.
- Retrieval Stage: The user's query is converted into a vector with the same Embedding model, and the most semantically similar text chunks are retrieved from the vector database using a metric such as cosine similarity or Euclidean distance. Vector databases and search libraries (such as Milvus, Pinecone, Chroma, and FAISS) use ANN (Approximate Nearest Neighbor) algorithms to search very large vector collections, often with millisecond-level latency, which makes them core infrastructure in RAG systems.
- Generation Stage: The retrieved relevant text chunks serve as context and are combined with the user's question to form a Prompt, which is then fed into the LLM to generate the final response.
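The indexing and retrieval stages listed above can be sketched with the standard library alone. This is a toy under stated assumptions: sentence splitting stands in for a real text splitter, a word-count vector stands in for a real Embedding model, and a plain Python list stands in for a vector database. The function names (`embed`, `build_index`, `retrieve`) are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an Embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(document):
    """Indexing stage: split into chunks (here, sentences) and embed each one."""
    chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, query, top_k=2):
    """Retrieval stage: embed the query and return the most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

document = (
    "RAG was introduced by Facebook AI Research in 2020. "
    "Lin Daiyu is a character in Dream of the Red Chamber. "
    "Vector databases use approximate nearest neighbor search."
)
index = build_index(document)
top = retrieve(index, "Which novel is Lin Daiyu a character in?", top_k=1)
print(top)
```

A real system would swap `embed` for a learned model so that paraphrases with no shared words still land close together, and would replace the linear scan in `retrieve` with an ANN index.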
In simple terms, RAG transforms the LLM from "answering from memory" to "answering after consulting references," a shift that dramatically improves the accuracy and reliability of output content.
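The "consulting references" step comes down to how the prompt is assembled in the generation stage. A minimal sketch follows; the prompt wording and the `build_prompt` helper are illustrative assumptions, and the actual LLM call is omitted because it depends on the provider.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved chunks and the user's question into one prompt string."""
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the reference material below. "
        "If the references do not contain the answer, say so.\n\n"
        f"References:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical chunks, as if returned by the retrieval stage.
retrieved = [
    "Lin Daiyu is a character in Dream of the Red Chamber.",
    "Lu Zhishen uproots a willow tree in Water Margin.",
]
question = "In which chapter does Lin Daiyu uproot a willow tree?"
prompt = build_prompt(question, retrieved)
print(prompt)  # this string would then be sent to the LLM
```

Instructing the model to rely only on the references, and to admit when they are insufficient, gives it a path to reject the false premise instead of inventing a chapter.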
Complete RAG Learning Roadmap
This phase of the course covers the following modules:
- RAG Fundamentals: Understanding RAG's core concepts, practical applications, and complete workflow
- LangChain Framework: Mastering one of the most widely used frameworks for LLM application development. LangChain is an open-source framework created by Harrison Chase in 2022, designed to simplify the development of large language model applications. It provides a standardized abstraction layer that encapsulates common operations like Prompt management, model invocation, memory management, tool usage, and chain calls into composable modules. For RAG development, LangChain offers ready-to-use components including Document Loaders, Text Splitters, Vector Stores, and Retrievers, allowing developers to build a complete RAG pipeline with relatively little code.
- Advanced RAG Techniques: In-depth theoretical explanations paired with code implementation, including multi-path recall, Reranking, query rewriting, and other advanced retrieval strategies
- Knowledge Graph RAG: Presented in this course as one of the most effective RAG implementation approaches. Traditional RAG retrieves text chunks by semantic similarity, but it performs poorly on complex questions that require multi-hop reasoning or entity relationship queries. Knowledge Graph RAG (Graph RAG) introduces a structured knowledge graph into the retrieval process, organizing knowledge as a network of triples composed of Entities and Relations. When a user asks a question, the system can not only retrieve relevant text but also traverse relationship chains in the knowledge graph, obtaining more complete and precise context. Microsoft's GraphRAG project, released in 2024, is a representative work in this direction: it automatically builds community summaries and a hierarchical graph structure to improve answer quality on global questions, meaning questions about a corpus as a whole.
- Application Platform Practice: Hands-on experience with low-code AI platforms like Dify. Dify is an open-source LLM application development platform that provides a visual workflow orchestration interface, enabling non-technical users to quickly build RAG applications.
- Project Implementation: Building a complete RAG application project from scratch
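To make the Knowledge Graph RAG item in the roadmap above more concrete, here is a minimal sketch of triples and multi-hop traversal. It is not how GraphRAG or any specific library works; the triples, relation names, and helper functions are illustrative assumptions.

```python
from collections import defaultdict, deque

# Illustrative triples: (head entity, relation, tail entity)
TRIPLES = [
    ("Lin Daiyu", "appears_in", "Dream of the Red Chamber"),
    ("Dream of the Red Chamber", "written_by", "Cao Xueqin"),
    ("Lu Zhishen", "appears_in", "Water Margin"),
    ("Lu Zhishen", "performed", "uprooting a willow tree"),
]

def build_graph(triples):
    """Index triples by head entity so outgoing edges are easy to follow."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph

def multi_hop(graph, start, max_hops=2):
    """Breadth-first traversal: collect every triple within max_hops of start."""
    facts, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, tail in graph.get(entity, []):
            facts.append((entity, relation, tail))
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, depth + 1))
    return facts

graph = build_graph(TRIPLES)
facts = multi_hop(graph, "Lin Daiyu", max_hops=2)
for head, relation, tail in facts:
    print(f"{head} --{relation}--> {tail}")
```

Starting from the entity in the question, the traversal reaches the novel and then its author in two hops. The collected triples would be serialized into the prompt alongside, or instead of, retrieved text chunks.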
The core learning objective of this lesson is to master the RAG workflow, become familiar with tokenization, Embedding, and vector database usage, and implement a basic RAG system through a hands-on mini-project.
Summary
For those looking to get started with LLM application development, RAG is an essential core technology to master. It does not require retraining the model, so compared with fine-tuning it has lower implementation costs and faster iteration cycles. By retrieving from an external knowledge base, it can improve the accuracy and timeliness of responses, which makes it one of the most practical technical solutions in enterprise-level AI applications today. The lesson cites a figure of over 80% of enterprise-level LLM applications adopting a RAG architecture to ensure output quality, though it does not name a source. If you're learning AI programming or LLM development, a systematic study of RAG is a good place to start, since it is the key bridge connecting LLM capabilities with real business requirements.