From the course: Hands-On AI: RAG using LlamaIndex
Ensemble retrieval
- [Instructor] As you've seen throughout this course, when you're building a RAG pipeline, you can use multiple retrieval strategies. You can use them simultaneously, individually, or in combination with one another. There are so many different ways you can use a retriever. But what if you could try multiple strategies at once and then prune the results? For that, we have the ensemble retriever. The ensemble retriever runs different retrieval strategies together, for example, different chunk sizes, vector, keyword, or hybrid searches, whatever you define. It combines the results from the different strategies to improve the quality of retrieval, and you can add a reranker to the mix if you'd like. This is useful for comparing and evaluating the effectiveness of different retrieval strategies against each other. Let's go ahead and see this in action. All of this setup is stuff we've seen before. What we're going to do now is create a few different vector indices with different chunk sizes. We'll split the nodes into chunks of 128 through 1,024, as shown here in the chunk sizes list, and then create an ensemble retriever from that. We'll start by defining the index nodes, creating a separate index node for each retrieval strategy. We'll then create a summary index, set up a recursive retriever, define a reranker, put all of this together into a query engine, and then run the queries. The first thing we need to do is define the index nodes. This code right here creates a separate index node for the vector retriever that corresponds to each chunk size, so we'll have a different retriever for each of the chunk sizes we defined above. We're going to aggregate all of this into a summary index, which is just a list-based data structure.
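The indexing setup described here can be sketched roughly as follows. This is a minimal sketch, not the course notebook's exact code: it assumes `llama_index` 0.10-style imports, a `documents` list loaded earlier, and names like `retriever_dict` and `index_nodes` that I've chosen for illustration.

```python
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode

chunk_sizes = [128, 256, 512, 1024]

retriever_dict = {}  # maps an index_id to the retriever for that chunk size
index_nodes = []     # one IndexNode per retrieval strategy

for chunk_size in chunk_sizes:
    # Split the same documents at this chunk size and tag each node,
    # so results can be traced back to the strategy that produced them.
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=20)
    nodes = splitter.get_nodes_from_documents(documents)
    for node in nodes:
        node.metadata["chunk_size"] = chunk_size

    index_id = f"chunk_{chunk_size}"
    vector_index = VectorStoreIndex(nodes)
    retriever_dict[index_id] = vector_index.as_retriever(similarity_top_k=2)

    # The IndexNode is a pointer: its index_id links this summary-index
    # entry to the matching vector retriever.
    index_nodes.append(
        IndexNode(
            text=f"Vector retrieval with chunk size {chunk_size}",
            index_id=index_id,
        )
    )

# Aggregate the pointers into a list-based summary index.
summary_index = SummaryIndex(index_nodes)
```

Tagging each node's metadata with its chunk size is what later lets us measure which strategy's chunks rank highest.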
Here's how it works: during index construction, we chunk the text up, each chunk is converted to a node, and the nodes are put into a list. At query time, an initial answer is constructed from the first text chunk, and then we refine it by feeding in subsequent chunks as context. Refinement could mean keeping the original answer, making small edits, or rewriting it completely. We'll put all of this together into a recursive retriever. The recursive retriever fetches all nodes from the summary index and then recursively calls the vector retriever for each chunk size. So we define all of that here: I have a summary_index and the RecursiveRetriever. We'll go ahead and pull in some nodes associated with this query, and you can see the number of nodes that were pulled and what those nodes actually were. Next, we can rerank the final results using a reranker; we'll use CohereRerank here. Then we can put everything together into a retriever query engine. We have our retriever and the reranker, and again, remember, the retriever is defined right here. Now that we've got our query engine packaged together, we can execute a query against it and get back the response as well as the source nodes. Here you can see the response and all of the nodes that contributed to it. We can also analyze the relative importance of each chunk. Ensemble-based retrieval has a cool feature here: because the reranker orders the final retrieved set, you can assess the importance of each chunk based on its position in that ordering. If a certain chunk size is consistently ranked at the top, it's likely more relevant to the query. We can define a function that helps with that. I'm going to define an mrr_all function, which evaluates the relative importance of each chunk by analyzing its rank in the list.
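The recursive retriever, reranker, and query engine steps above can be sketched like this. Again a hedged sketch, not the notebook's exact code: it assumes the `summary_index` and `retriever_dict` built earlier, the `llama-index-postprocessor-cohere-rerank` package, a `COHERE_API_KEY` in the environment, and a sample query I made up for illustration.

```python
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.postprocessor.cohere_rerank import CohereRerank

# The summary index's retriever returns every IndexNode; RecursiveRetriever
# then follows each node's index_id to the chunk-size-specific vector
# retriever registered under that id.
recursive_retriever = RecursiveRetriever(
    "root",
    retriever_dict={"root": summary_index.as_retriever(), **retriever_dict},
    verbose=True,
)

# Rerank the combined results; keep the top 10 after reranking.
reranker = CohereRerank(top_n=10)

query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever, node_postprocessors=[reranker]
)

response = query_engine.query("What are the main findings of the report?")
print(response)
for node in response.source_nodes:
    # The chunk_size metadata we tagged earlier shows which strategy won.
    print(node.metadata.get("chunk_size"), node.score)
```

Because every chunk carries its `chunk_size` metadata, the reranked `source_nodes` list doubles as an evaluation signal for the competing strategies.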
A high MRR (mean reciprocal rank) means that results with a given metadata value tend to appear earlier in the ranking, which indicates higher relevance or importance. We have some input parameters here: the metadata values, the metadata key, and the source nodes, where the source nodes are just a ranked list of nodes. For each metadata value, this code iterates through the ranked list of source nodes, finds the position of the first occurrence of that metadata value, computes the reciprocal rank, and stores it in a dictionary. For the output, it converts the dictionary of MRR values into a pandas DataFrame, which we then display. Let's go ahead and run that. You can see here that the chunk size of 128 got an MRR of 1, indicating that it had the highest-ranked results. With some coding and engineering effort, you can go all the way back up here and hack around with this yourself. You can define this bit of code however you want and use whatever type of retriever you'd like. You've seen these patterns play out several times throughout this course, so you should be able to take this code and test out different retrieval strategies. I'll see you in the next and final video of the Modular RAG section, where we talk about the ensemble query engine.
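The mrr_all logic described above can be reimplemented in plain Python like this. This is my own sketch of the idea, not the notebook's exact function (the notebook also converts the resulting dictionary to a pandas DataFrame for display); the `Node` namedtuple stands in for LlamaIndex source nodes, since only the metadata matters here.

```python
from collections import namedtuple

def mrr_all(metadata_values, metadata_key, source_nodes):
    """Mean reciprocal rank of each metadata value in a ranked node list.

    For each value (e.g. a chunk size), find the rank of the first source
    node whose metadata matches and record 1 / rank; values that never
    appear get 0.0.
    """
    mrr_dict = {}
    for value in metadata_values:
        mrr = 0.0
        for rank, node in enumerate(source_nodes, start=1):
            if node.metadata.get(metadata_key) == value:
                mrr = 1.0 / rank
                break  # only the first occurrence counts
        mrr_dict[value] = mrr
    return mrr_dict

# Tiny stand-in for LlamaIndex source nodes: only the metadata dict matters.
Node = namedtuple("Node", ["metadata"])
ranked = [
    Node({"chunk_size": 128}),  # rank 1 -> MRR 1.0
    Node({"chunk_size": 512}),  # rank 2 -> MRR 0.5
    Node({"chunk_size": 128}),  # later repeats are ignored
    Node({"chunk_size": 256}),  # rank 4 -> MRR 0.25
]
scores = mrr_all([128, 256, 512, 1024], "chunk_size", ranked)
print(scores)  # {128: 1.0, 256: 0.25, 512: 0.5, 1024: 0.0}
```

As in the lesson's run, the 128-token chunks land at rank 1 and score an MRR of 1.0, while a chunk size that never appears in the reranked set scores 0.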