From the course: Hands-On AI: RAG using LlamaIndex

Re-ranking - LlamaIndex Tutorial

Re-ranking

- [Instructor] Before jumping into re-ranking, I want to quickly recap the order of operations within LlamaIndex. First is data ingestion, where we're pulling data from various sources. It could be from APIs, PDFs, SQL databases, text files, and so on. Next is the indexing of the data, where we structure the ingested data so that we can easily pass it to a language model. This is followed by retrieval, where we retrieve information based on the question or the prompt, which is the first step in the retrieval augmented generation process. Then comes re-ranking, where we reorder the retrieved documents for relevance. That's followed by post-processing, where we apply transformations or filters to refine the nodes before the final response. And then finally, response generation. This is important to keep in mind when you're building out your query engines: it's first re-ranking and then post-processing. What is re-ranking? Well, re-ranking is just reordering the retrieved nodes based on some criteria. The criteria could be relevance, time-based, or some other factor, and what we're trying to do is bring the most relevant nodes to the top. This is different from post-processing, because post-processing happens after re-ranking, and there we're transforming and filtering the nodes for further refinement. The main purpose of re-ranking is to reorder the nodes, whereas, again, the purpose of post-processing is to transform or filter them. The most popular re-ranking technique is Cohere Rerank, but there are other re-ranking methods, and I'll talk about them a little bit at the end of this lesson. And yes, you guessed it, they all share a common API, which, as you've heard me say before, is the beautiful thing about working with these orchestration frameworks. The abstractions all have a common API. So let's go ahead and talk about Cohere Rerank. 
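To make the re-ranking versus post-processing distinction concrete, here's a toy Python sketch (not LlamaIndex code; the `NodeWithScore` stand-in, the term-overlap scorer, and the threshold filter are all invented for illustration). Re-ranking reorders the retrieved nodes by a fresh relevance score; post-processing then filters the reordered list.

```python
# Toy sketch of re-ranking followed by post-processing.
# NodeWithScore here is a stand-in for LlamaIndex's class of the same name.
from dataclasses import dataclass

@dataclass
class NodeWithScore:
    text: str
    score: float

def rerank(nodes, query_terms, top_n=2):
    """Re-rank: rescore each node by naive term overlap with the query,
    then return the top_n nodes, best first."""
    rescored = [
        NodeWithScore(n.text, sum(t in n.text.lower() for t in query_terms))
        for n in nodes
    ]
    rescored.sort(key=lambda n: n.score, reverse=True)
    return rescored[:top_n]

def postprocess(nodes, min_score=1.0):
    """Post-process: filter out nodes below a score threshold."""
    return [n for n in nodes if n.score >= min_score]

# Nodes as they might come back from the retriever, with retrieval scores.
retrieved = [
    NodeWithScore("llamaindex supports query engines", 0.71),
    NodeWithScore("reranking reorders retrieved nodes", 0.69),
    NodeWithScore("unrelated cooking tips", 0.65),
]
top = rerank(retrieved, ["reranking", "nodes"])   # re-rank first...
final = postprocess(top)                          # ...then post-process
print([n.text for n in final])                    # → ['reranking reorders retrieved nodes']
```

The point of the sketch is the ordering: the re-ranker runs on the raw retrieval results, and the post-processor operates on the already reordered list.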
When it comes to re-ranking documents before they're sent to the LLM, there are actually specialized models that do this. The Cohere Rerank model is one such model, and it's by far the most popular re-ranking method in LlamaIndex and the LlamaIndex ecosystem, and the most popular method that practitioners use. There are some arguments you need to keep in mind if you want to use Cohere Rerank. One is top_n, which is just the top number of nodes to return. This defaults to two. There's also model, which is the name of the Cohere model. This defaults to rerank-english-v2.0, but you can use the newer one, which is rerank-english-v3.0. And, of course, you need to pass your API key. Under the hood, what's happening is we're extracting the text from each node. Then we're using Cohere's API to re-rank based on relevance to the query. The API then returns the re-ranked results with relevance scores, and from those results we get a list of NodeWithScore objects. This is what is sent to the LLM for response synthesis. And, of course, we use it in the same way as we use most of the other node postprocessors: we import it and instantiate it with whatever arguments we need. Once we have instantiated the re-ranker, we pass it to our query engine via the node_postprocessors argument. We'll go ahead and build a chain and build the pipeline. Of course, I'll show you exactly how it works using the query engine directly. You see here the response is the same as we're used to seeing, and we have the source nodes here, which are NodeWithScore objects. And, of course, we can go ahead and just use it as part of our query pipeline. There are two other alternatives to Cohere Rerank. One of them is ColBERT Rerank. The other is the Flag Embedding Reranker. Both have the same usage pattern as I've shown you above. If you're interested in the nitty-gritty details, I've linked to the documentation for each of those. 
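The under-the-hood flow described above can be sketched as a small, offline-runnable class. This is not LlamaIndex's actual CohereRerank implementation: the real postprocessor calls Cohere's rerank endpoint with your API key, whereas here that call is replaced by a stubbed scorer (`fake_cohere_rerank_api`), and `ToyCohereRerank` is an invented name. The three steps match the description: extract text from each node, send it to the rerank "API" with the query, and rebuild a top_n list of NodeWithScore objects from the returned relevance scores.

```python
# Sketch of what a CohereRerank-style node postprocessor does under the hood.
# The Cohere API call is stubbed with a toy word-overlap scorer so this runs offline.
from dataclasses import dataclass

@dataclass
class NodeWithScore:          # stand-in for LlamaIndex's NodeWithScore
    text: str
    score: float

def fake_cohere_rerank_api(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Stub for Cohere's rerank endpoint: returns (index, relevance_score)
    pairs sorted best-first. The real API scores with a trained rerank model."""
    scored = [
        (i, float(sum(w in doc.lower() for w in query.lower().split())))
        for i, doc in enumerate(documents)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

class ToyCohereRerank:
    def __init__(self, top_n: int = 2):   # top_n defaults to 2, as in LlamaIndex
        self.top_n = top_n

    def postprocess_nodes(self, nodes: list[NodeWithScore], query: str) -> list[NodeWithScore]:
        texts = [n.text for n in nodes]                 # 1. extract text from each node
        results = fake_cohere_rerank_api(query, texts)  # 2. re-rank via the (stubbed) API
        return [                                        # 3. rebuild NodeWithScore list
            NodeWithScore(texts[i], score)
            for i, score in results[: self.top_n]
        ]

nodes = [
    NodeWithScore("paris is the capital of france", 0.5),
    NodeWithScore("reranking improves retrieval relevance", 0.4),
    NodeWithScore("llamaindex orchestrates rag pipelines", 0.3),
]
reranker = ToyCohereRerank(top_n=2)
ranked = reranker.postprocess_nodes(nodes, "how does reranking improve relevance")
print([n.text for n in ranked])
```

In real LlamaIndex code you'd instantiate the actual CohereRerank postprocessor with top_n, model, and api_key, then hand it to your query engine through the node_postprocessors argument, as described above.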
In the next few videos, we are going to talk about some other advanced RAG techniques, so I'll see you there.