From the course: Hands-On AI: RAG using LlamaIndex
Putting it all together
- [Instructor] It's now time to put everything we've learned together and build a naive RAG pipeline. It's called naive because it's the simplest, most basic RAG pipeline we can build. So let's put all the pieces together and see it in action as an end-to-end pipeline.

We begin, as we usually do, by setting up our imports. We'll grab our OpenAI API key, our Qdrant URL, and our Qdrant API key, and I'm going to name the collection "words-of-the-senpai-naive": "Words of the Senpai," senpai meaning teacher, and "naive" because this is the naive pipeline. We'll set up our LLM, and for this lesson we're using Cohere Command R+. We'll also set up our embedding model; I'll be using text-embedding-3-large. Then we'll set up our vector store. One thing to point out: when we run the LLM and embedding model setup helpers, we're assigning the LLM and the embedding model to the Settings global context. That means we don't always need to pass an LLM or embedding model explicitly where one is a required argument, because it's attached to the global context.

Next, let's load our documents from the document store. You can see it's just a regular LlamaIndex Document object. Now we're going to ingest this into our vector database. We'll define some transformations here, using a sentence splitter with LlamaIndex's default chunk size; you'll see what that default is in a second. I'll talk at length about different ways to chunk and split up your text in a future lesson, so don't worry too much about it right now. Let's go ahead and ingest. Remember, this is all coming from the helper file I've defined; these helpers are just abstractions built on top of the ingestion pipeline and the query pipeline. This should take about a minute.

All right, we're back. That took just about 50 seconds, and you can see the default chunk size is 1024. Let's verify that this was in fact ingested into Qdrant: go to Qdrant, hit the hamburger menu, go to our clusters, and open the cluster dashboard. There it is, "words-of-the-senpai-naive." You can see the embedding dimension is 3072, which is what we expect because we're using text-embedding-3-large. If you want, you can also visualize the data: scroll down, hit run, and a visualization pops up after a moment. It's pretty interesting to see; we get some distinct little islands of text.

Now we're going to build an index over the vector database so we can run queries against it. Walking through the code: we set a storage context, even though it isn't strictly necessary here because we've already ingested everything. You could safely remove the storage context, still build the index, and be fine. In fact, let me do exactly that right now, just to illustrate.
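To make these steps concrete outside the notebook, here is a minimal, plain-LlamaIndex sketch of roughly what the lesson's helper functions are doing: configure the global Settings, connect to Qdrant, ingest with a sentence splitter, and build an index over the vector store. The environment variable names and the placeholder document are assumptions for illustration, not code from the course.

```python
import os

import qdrant_client
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.cohere import Cohere
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Attach the LLM and embedding model to the global Settings context,
# which is roughly what the course's setup helpers do for us.
Settings.llm = Cohere(model="command-r-plus", api_key=os.environ["COHERE_API_KEY"])
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-large", api_key=os.environ["OPENAI_API_KEY"]
)

# Point a Qdrant-backed vector store at the lesson's collection.
client = qdrant_client.QdrantClient(
    url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"]
)
vector_store = QdrantVectorStore(client=client, collection_name="words-of-the-senpai-naive")

# Stand-in for the documents loaded from the course's document store.
documents = [Document(text="Example teaching from a senpai.")]

# Split into chunks (SentenceSplitter defaults to chunk_size=1024), embed,
# and write the resulting nodes into Qdrant.
ingestion = IngestionPipeline(
    transformations=[SentenceSplitter(), Settings.embed_model],
    vector_store=vector_store,
)
ingestion.run(documents=documents)

# Build an index directly over the populated vector store; no storage
# context is needed because the data is already in Qdrant.
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
```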
Back in the notebook, you'll also notice that I've commented a line out. That's to illustrate that when we build a query engine from our index with the .as_query_engine method, it takes an llm argument. Look at the source code: as_query_engine is defined on the base vector index class that everything subclasses from, and its llm argument is optional; it resolves to the LLM from Settings when one is available there. So we don't need to pass it explicitly here, because it's pulled from the Settings. I want to make that absolutely clear in case it wasn't in the previous lessons.

So let's create the index and the query engine. Now we'll build our query pipeline. What do we need? We need an input component, and that input component gets passed to the query engine. Notice that I'm not adding an LLM to this chain, because the LLM is already implicitly inside the query engine. Let's build the query pipeline and run some queries.

Here's the first query we're going to run. It might take a little time, because Cohere's free API has rate limits; with OpenAI it would definitely be quicker. Let's let it run and see what response we get.

All right, that took about 20 seconds. Let's look at the object we got back. It's a Response object, and we can simply print it to see the response from the LLM. You can also look at the source nodes: call dir() on the response and you'll see all the attributes and methods available. You can see a couple of source nodes were pulled in for this query. Take one of those source nodes, look at its __dict__, and there it is. I'll let you run the other queries on your own.

So there we have it: we've built an end-to-end naive RAG pipeline. I know you're thinking this all looks familiar, like you've done it a hundred times throughout this course, and to a certain extent we have. We've seen this pattern over and over while exploring the core components of LlamaIndex and then the higher-level abstractions, so you should now feel comfortable with the general pattern for building a RAG pipeline. From here we can start tweaking things: optimizing chunk size, trying different retrieval techniques, exploring different ways of chunking, attaching metadata, and so on. We call this pipeline naive because it's the most basic thing you can do; it only gets more interesting and more complex from here. Naive RAG works well, but it does have its limitations, and those limitations are what I'll discuss in the next video. So I'll see you there.
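As a recap of the query side of this lesson, here is a similar hedged sketch: build the query engine from the index (letting it pick up the LLM from Settings) and chain an input component into it with a query pipeline. The query string is illustrative only, and the reconnection boilerplate assumes the same collection, environment variables, and Settings as the earlier sketch.

```python
import os

import qdrant_client
from llama_index.core import VectorStoreIndex
from llama_index.core.query_pipeline import InputComponent, QueryPipeline
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Assumes Settings.llm and Settings.embed_model are already configured
# as in the earlier setup sketch.

# Reconnect to the collection populated during ingestion and index over it.
client = qdrant_client.QdrantClient(
    url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"]
)
vector_store = QdrantVectorStore(client=client, collection_name="words-of-the-senpai-naive")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# as_query_engine resolves its llm argument from Settings, so nothing is
# passed explicitly here.
query_engine = index.as_query_engine()

# Chain an input component into the query engine. No LLM is added to the
# chain because the query engine already carries one internally.
pipeline = QueryPipeline(chain=[InputComponent(), query_engine])

# Illustrative query; the lesson's actual queries live in the course notebook.
response = pipeline.run(input="What do the senpais teach about discipline?")

print(response)                           # the LLM's final answer
print(response.source_nodes)              # retrieved chunks behind the answer
print(response.source_nodes[0].__dict__)  # full detail for one source node
```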