Lesson 1

What is RAG

Before we look at RAG, let’s look at the limitations of a large language model (LLM).

Limitations of a large language model

Limitation 1: Limited Knowledge

Imagine you ask Claude: When is the next Brazil match?

It cannot answer this question without a web search, because an LLM only knows what it learned during training.

If something happened after its training cutoff, it has no idea unless it's connected to external information.

For example:

  • Latest news
  • Stock prices
  • Today's weather
  • New product releases

Without retrieval, the model cannot reliably answer these questions.

Limitation 2: No Knowledge of Your Private Data

Let’s say you ask Claude: What is Meta’s internal HR policy? It can’t answer, because the model has never seen Meta’s private documents.

Limitation 3: Hallucinations

Sometimes LLMs sound extremely confident...

...while being completely wrong.

This phenomenon is called a hallucination.

The model is trained to produce plausible-sounding text, so when it doesn't know the answer, it still generates something that looks correct.

With RAG, the model is grounded in real documents, making hallucinations much less likely.

Limitation 4: Expensive Retraining

Suppose your company's policies change every week.

Should you retrain a billion-parameter model every week?

Of course not.

Training or fine-tuning large models is expensive and time-consuming.

Instead, simply update your documents.

Once the updated documents are re-indexed, RAG retrieves the latest version whenever someone asks a question.

What is RAG?

RAG stands for Retrieval-Augmented Generation.

In simple words, RAG is a technique that allows an AI model to retrieve relevant information from an external knowledge source before generating an answer.

Instead of relying only on what it learned during training, the model first searches for relevant documents, then uses those documents as additional context while generating its response.
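To make "additional context" concrete, here is a minimal sketch of how retrieved documents might be combined with the user's question. The function name `build_augmented_prompt`, the prompt wording, and the sample document are all assumptions made for illustration; real systems phrase and format their prompts in many different ways.

```python
def build_augmented_prompt(question: str, documents: list[str]) -> str:
    """Combine retrieved documents and the user's question into one prompt."""
    # Number each document so it can be referred to in the answer.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the context below.\n"
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Made-up document text, standing in for whatever the retrieval step returned.
    retrieved = ["Employees accrue 1.5 vacation days per month."]
    print(build_augmented_prompt("How many vacation days do I get per month?", retrieved))
```

The model itself is unchanged. Only the text it receives is different, which is why this approach needs no retraining.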

Think of it like an open-book exam.

A normal LLM answers questions only from memory.

A RAG-powered LLM is allowed to quickly open a textbook, find the relevant pages, and then answer your question based on that information.

How does Retrieval-Augmented Generation work?

So the process becomes:

  1. User asks a question.
  2. Retrieve relevant information.
  3. Provide that information to the LLM.
  4. The LLM generates an accurate, context-aware answer.

That's why it's called Retrieval-Augmented Generation—we augment the generation process with retrieved knowledge.
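The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: retrieval here is simple word overlap (real systems usually use embeddings and a vector index), and `fake_llm` is a hypothetical placeholder for a call to an actual language model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Step 2: rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    # Keep only documents that share at least one word, best match first.
    relevant = [(score, doc) for score, doc in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: place the retrieved text in front of the question."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Step 4 placeholder: a real system would call a language model here."""
    return f"(model response to a {len(prompt)}-character prompt)"


def answer(question: str, documents: list[str]) -> str:
    """Run the pipeline for one user question (step 1 is the question itself)."""
    context = retrieve(question, documents)   # step 2
    prompt = build_prompt(question, context)  # step 3
    return fake_llm(prompt)                   # step 4


if __name__ == "__main__":
    docs = [
        "The office is closed on public holidays.",
        "Expense reports are due on the fifth of each month.",
        "The cafeteria opens at 8 am.",
    ]
    print(retrieve("When are expense reports due?", docs, top_k=1))
    print(answer("When are expense reports due?", docs))
```

Keeping retrieval and generation as separate functions mirrors the name of the technique: either half can be swapped out (a better search method, a different model) without touching the other.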

Advantages of RAG

Let's summarize why RAG has become one of the most important techniques in Generative AI.

1. Access to Up-to-Date Information

RAG can retrieve the latest information from external sources.

This means your AI can answer questions about today's events, recent product launches, or newly published documents.

2. Works with Private Knowledge

RAG allows your AI to answer questions using:

  • Company documents
  • PDFs
  • Confluence
  • SharePoint
  • Databases
  • Knowledge bases

All of this works without retraining the model.

3. Reduces Hallucinations

Since the model answers based on retrieved documents instead of pure memory, responses tend to be more accurate and easier to verify, provided the retrieved documents are themselves relevant and correct.

4. Easy to Update

Need to update your knowledge?

Simply add or modify the documents.

There's no need to retrain the LLM.
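As a rough sketch of what "modify the documents" means in practice, the example below uses a hypothetical in-memory `KnowledgeBase` class with keyword search. The class, its method names, and the policy text are assumptions made for illustration, not any particular library's API.

```python
class KnowledgeBase:
    """A tiny in-memory document store keyed by document ID (hypothetical)."""

    def __init__(self) -> None:
        self._documents: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        """Add a new document or overwrite an existing one."""
        self._documents[doc_id] = text

    def search(self, keyword: str) -> list[str]:
        """Return every document whose text contains the keyword."""
        needle = keyword.lower()
        return [text for text in self._documents.values() if needle in text.lower()]


kb = KnowledgeBase()
kb.upsert("remote-work", "Remote work is allowed two days per week.")
before = kb.search("remote work")

# The policy changes: only the stored document is replaced, the model is untouched.
kb.upsert("remote-work", "Remote work is allowed three days per week.")
after = kb.search("remote work")

print(before)
print(after)
```

The next question about remote work retrieves the new text. In a system that searches by embeddings rather than keywords, the changed document would also need to be re-embedded and re-indexed, but that is still far cheaper than retraining.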

5. Cost Effective

Training or fine-tuning large language models can cost thousands or even millions of dollars.

RAG avoids this cost by keeping the model fixed and updating only the external knowledge base.

6. Builds User Trust

One of the biggest advantages of RAG is transparency.

You can even show users the documents or passages that were used to generate the answer.

This builds trust because users can verify the information themselves.
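One simple way to surface sources is to keep track of where each retrieved passage came from and list those names under the answer. The sketch below assumes a hypothetical `Passage` record and made-up file names; the answer text is passed in as a plain string, standing in for whatever the model generated.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # where the text came from, e.g. a file name or page title
    text: str    # the retrieved passage itself


def format_answer_with_sources(answer: str, passages: list[Passage]) -> str:
    """Append a numbered list of source names to a generated answer."""
    # Deduplicate source names while preserving the order they were retrieved in.
    sources = list(dict.fromkeys(p.source for p in passages))
    lines = [answer, "", "Sources:"]
    lines += [f"  [{i}] {name}" for i, name in enumerate(sources, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    used = [
        Passage("travel-policy.pdf", "Flights must be booked 14 days in advance."),
        Passage("expenses.md", "Receipts are required for all claims."),
        Passage("travel-policy.pdf", "Economy class is the default."),
    ]
    print(format_answer_with_sources("Book flights at least 14 days ahead.", used))
```

Listing each source once, even when several passages come from the same document, keeps the citation list short enough for users to actually check.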

Conclusion

To summarize:

A traditional LLM answers from memory.

A RAG system first retrieves relevant information and then generates the answer using that information.