From the course: Hands-On AI: Building LLM-Powered Apps
Retrieval augmented generation
- In the previous chapter, we built a simplified ChatGPT application using LangChain and Chainlit. In this chapter, we will try to bring knowledge into our chat with PDF application via PDF documents. We mentioned in the previous video that a large language model tends to hallucinate, and that we can fix that by putting information in the input context, but the context is not infinite and cannot fit all of the information out there. The solution to this problem is to augment the large language model with knowledge relevant to the question. This architecture pattern is called Retrieval Augmented Generation, or RAG. What Retrieval Augmented Generation does is separate our application into two portions. On one hand, we have the large language model, and on the other, we have a search engine. The large language model is responsible for reasoning and generating the answers. On the other side, we rely on the search engine to surface the most relevant documents to send into the context for the large language model. So when a user asks a question to our chat with PDF application, our application will first pass the question to the search engine. The search engine then retrieves the most relevant documents and sends them back to the application. Our application includes those relevant documents inside the prompt to the large language model, and the large language model responds to the user's question with the relevant information, supported by sources from our search engine. This completes a retrieval augmented generation process. In summary, the RAG architecture uses the large language model to conduct reasoning and generation, and gets the factual context from the search engine. The RAG architecture is a very good concept, but it also has some limitations: there is no guarantee that the generated sentences will be supported by the citations, nor is there a guarantee that all retrieved citations can and will be used in the generation process. In summary, we enhance the capabilities of our application by using a Retrieval Augmented Generation architecture, or RAG architecture. It first retrieves the relevant documents and provides those documents to the large language model. When we ask the model to answer the question using the context, and the context only, this grounds the model's answers. Now, since we brought up a search engine, we will go into a brief introduction on what a search engine is.
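To make the retrieve-then-generate flow concrete, here is a minimal sketch of the RAG loop described above. The `SearchEngine` class, `call_llm` function, and the keyword-overlap scoring are hypothetical placeholders, not the course's implementation (the course builds the real components with LangChain, Chainlit, and embeddings in later videos); only the overall structure of passing the question to a search engine, putting the retrieved documents into the prompt, and asking the model to answer from that context follows the transcript.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str
    text: str


class SearchEngine:
    """Hypothetical search engine that surfaces the most relevant documents."""

    def __init__(self, documents: list[Document]):
        self.documents = documents

    def retrieve(self, question: str, k: int = 3) -> list[Document]:
        # Naive keyword-overlap ranking as a stand-in; a real app would use
        # embedding search (covered later in this chapter).
        def score(doc: Document) -> int:
            return len(set(question.lower().split()) & set(doc.text.lower().split()))

        return sorted(self.documents, key=score, reverse=True)[:k]


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in your model client of choice."""
    raise NotImplementedError


def answer_with_rag(question: str, engine: SearchEngine) -> str:
    # 1. Pass the user's question to the search engine.
    docs = engine.retrieve(question)
    # 2. Include the retrieved documents inside the prompt as context,
    #    keeping the sources so the answer can cite them.
    context = "\n\n".join(f"[{d.source}]\n{d.text}" for d in docs)
    prompt = (
        "Answer the question using the context below, and the context only.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    # 3. The large language model reasons over the context and generates
    #    a grounded answer.
    return call_llm(prompt)
```

Note that, as the transcript points out, this pattern does not by itself guarantee the generated answer is supported by the retrieved citations; the grounding depends on the model actually following the "context only" instruction in the prompt.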
Contents
- Retrieval augmented generation (3m 30s)
- Search engine basics (2m 32s)
- Embedding search (3m)
- Embedding model limitations (3m 15s)
- Challenge: Enabling load PDF to Chainlit app (48s)
- Solution: Enabling load PDF to Chainlit app (5m 4s)
- Challenge: Indexing documents into a vector database (1m 50s)
- Solution: Indexing documents into a vector database (1m 43s)
- Challenge: Putting it all together (1m 10s)
- Solution: Putting it all together (3m 17s)
- Trying out your chat with the PDF app (2m 15s)