From the course: Hands-On AI: RAG using LlamaIndex

Node post-processing - LlamaIndex Tutorial


Node post-processing

- [Instructor] We're going to continue learning about advanced RAG techniques, and in this chapter of the course we're going to talk about post-retrieval and related techniques. I'll kick our discussion off with node postprocessing. Note that in this lesson and the lessons going forward, I'm not going to run all of the code for you; I'm going to let you piece the patterns together yourself. That said, let's jump right in. All of the boilerplate here has already been set up for you, so if you want to play around with these postprocessors, I'll leave that up to you.

So what is a node postprocessor? A postprocessor applies some type of additional processing or filtering to the list of nodes returned by a query and then returns a final result. These are modules that take in a set of nodes, apply a transformation or filter, and return them. That's basically all they do. But there are many different postprocessors in LlamaIndex, and I've linked to the source code where you can see all the types available to you: keyword, similarity, previous/next node, auto previous/next node, and the list goes on. If you look into the source code, under LlamaIndex core and then postprocessor, you'll see every postprocessor available to you. The ones I want to talk about specifically are the similarity, keyword, metadata replacement, long context reordering, and sentence embedding postprocessors. We've actually already seen the metadata replacement postprocessor in action in a previous lesson, but I'll cover it again here.

Let's talk about the basic usage pattern for node postprocessing. The wonderful thing about these orchestration frameworks is that they expose a common API across a wide variety of abstractions. That's great because you can keep the pattern in your head; you don't have to context switch between the different types of postprocessors or abstractions, because it's all a similar high-level API. The basic usage pattern is as follows: you use the node postprocessor as part of a query engine, and the transformation is applied to the nodes returned from the retriever before the response synthesis step. Keep in mind that you don't use a node postprocessor directly in the as_retriever method of the index; I'll talk about that at the end of this lesson. You import your VectorStoreIndex, import your postprocessor of choice, instantiate the postprocessor with whatever arguments you want, and pass it as an argument to the query engine: in index.as_query_engine, you can pass a node postprocessor to the node_postprocessors argument. Recall that we have an abstraction built on top of that called create_query_engine in the helper functions; there we're using the query mode, and you can just pass the postprocessor as a keyword argument when you construct your query engine, and you'll have it set up. You've seen this pattern at least a dozen times by now, so I hope you've picked up on it and can apply it yourself.
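To make that basic pattern concrete, here's a minimal sketch, assuming the llama_index >= 0.10 import paths; the data directory, cutoff value, and query string are placeholders, and the course's create_query_engine helper wraps the same idea.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor

# Placeholder data path; point this at your own documents.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# The postprocessor is passed to the query engine, not to as_retriever.
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

response = query_engine.query("Your question goes here")
```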
So I'm going to describe the postprocessors and how to instantiate them, but you can apply them to your query engine and use them on your own. After all, I believe in you, and I also believe in my ability to have taught you well, so I think you're well capable of doing that by this point.

Let's kick off our discussion with the similarity postprocessor. This one does exactly what it says it does. (Instructor chuckling) It filters a list of nodes based on similarity score, and you provide a similarity cutoff to control the threshold: nodes whose similarity score to the query is above the cutoff are included. That's all there is to it. The only argument you really need to know about is similarity_cutoff, the minimum similarity score required for a node to be included in the output. Under the hood, we first check whether the similarity cutoff is set; the default value is None. We then iterate over each node in the input list and check whether its similarity score is above the threshold. If the score is None or below the cutoff, the node is not included in the output. If the score is above the cutoff, or if no cutoff is set, the node is included. The method returns a filtered list of NodeWithScore objects. To instantiate it, you import SimilarityPostprocessor from llama_index.core.postprocessor and instantiate it like so.

The next node postprocessor I want to talk to you about is the keyword node postprocessor. This filters nodes based on the presence or absence of specific keywords. You provide a list of required keywords and/or a list of keywords to exclude, and nodes that contain all of the required keywords and none of the excluded keywords are included in the output list. There are a few arguments you need to know; the two most important are required_keywords and exclude_keywords. There's also lang, which is just the language of the text in the nodes. It defaults to English and isn't required, but it's good to know it's there. Under the hood, we iterate over each NodeWithScore object in the input list, and for each node we retrieve its content and process it using a language model from the spaCy library. If you look at the source code for the keyword node postprocessor, you'll see it makes use of spaCy's PhraseMatcher, and that's what does the inclusion or exclusion. If required keywords are provided and a node's content doesn't match any of them, we skip that node. If exclude keywords are provided and a node's content matches them, we skip that node. A node that passes both the required and exclude keyword checks is included in the output list, and the resulting NodeWithScore objects are what get passed along toward the language model. Let's see this in action. I know I said I wasn't going to show you it running, but I'll show you this one so you can build on it. So let's go ahead and instantiate a keyword node postprocessor.
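Roughly, the setup might look like the sketch below; the keyword lists are placeholders you'd swap for terms that matter in your own documents, and I'm using the plain as_query_engine call (with the `index` from the earlier sketch) in place of the course's create_query_engine helper.

```python
from llama_index.core.postprocessor import KeywordNodePostprocessor

# Placeholder keyword lists; substitute terms that are meaningful for your data.
# KeywordNodePostprocessor relies on spaCy's PhraseMatcher, so spaCy must be installed.
keyword_postprocessor = KeywordNodePostprocessor(
    required_keywords=["retrieval"],
    exclude_keywords=["deprecated"],
)

# Attach it to a query engine; source nodes come back on the response object.
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[keyword_postprocessor],
)

response = query_engine.query("Your question goes here")

# Inspect the NodeWithScore objects that survived the keyword filter.
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:200])
```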
You can see we have the required keywords and the exclude keywords, and we instantiate the keyword postprocessor. Then I'm going to instantiate a query engine. We've seen this before: we're using our query engine abstraction, I'm passing the postprocessor here, and I'm also making sure that I'm returning source nodes. We'll go ahead and construct a query chain and a query pipeline, but first I want to show you how this works if we just use the query engine, because I want you to see what these NodeWithScore objects look like. Let's go ahead and see that. All right, we get back these NodeWithScore objects, and we can take a look at just one of them. You can see we have a text node, it's got the similarity score, and you can see the content of the source node if you'd like. There you have it. Of course, you can also just use this in the pipeline like we've seen numerous times by this point, and we'll go ahead and get a response. There we go.

Next I'm going to talk about the metadata replacement postprocessor. We actually saw this in action previously, but I'll speak about it in a bit more detail here. Remember that this allows you to replace the content of each node in the input list with the value of a specific metadata key instead of the original content. As I discussed previously, this is most useful when combined with the SentenceWindowNodeParser. Under the hood, we're iterating over each of the NodeWithScore objects, and for each node we retrieve the value of the specified target metadata key. If that metadata key exists, its value is used to replace the node's content; if it does not exist, the node's content is left unchanged. Like all of the node postprocessors, this returns NodeWithScore objects. Instantiating it is straightforward, and it's the same abstraction as all the other node postprocessors: we import it, instantiate it, and pass in the required arguments.
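As a quick refresher of that pattern from the earlier sentence-window lesson, a minimal sketch might look like this; it assumes the nodes were built with the SentenceWindowNodeParser and its default "window" metadata key, and the top-k value is a placeholder.

```python
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# Replace each node's content with the value stored under the "window" metadata key.
# "window" is the default key written by SentenceWindowNodeParser; adjust if yours differs.
metadata_replacement = MetadataReplacementPostProcessor(target_metadata_key="window")

query_engine = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[metadata_replacement],
)
```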
Next is the LongContextReorder node postprocessor. There's a study called "Lost in the Middle: How Language Models Use Long Contexts" that talks about this idea of getting lost in the middle, kind of like trying to find a needle in a haystack. It turns out that language models struggle to pick up relevant details when they're located in the middle of a really long context. What the authors found is that the best performance is achieved when the important information is at the beginning or end of the input, even if the model is specifically designed to have a really long context. That's what the long context reordering postprocessor addresses. We reorder the nodes based on their relevance scores, which is helpful when you need to grab a large top-k, and we place the nodes with the highest scores at the beginning and end of the list, improving the model's ability to access the important information. You don't actually need to pass any arguments for this postprocessor, but let's talk about what's going on under the hood. We sort the input nodes based on their relevance scores in descending order, so the highest come first; if a node's score is None, we treat it as zero. Then we iterate over the sorted list and check whether each index is even or odd. If the index is even, we insert the node at the beginning of the reordered_nodes list using the insert method with an index of zero, so nodes with an even index end up at the front of the list. If the index is odd, the node is appended to the reordered_nodes list. Finally, we get back a reordered_nodes list with the nodes in the new order. If you're interested, I've linked to the source code; you should always read the source code if you're not sure what's happening under the hood, and here you can see exactly what's going on. As you can see, it's not that long or hard a function to reason through, but it's a powerful technique. You instantiate it the same way as the others, and that's the beautiful thing about these orchestration frameworks: the APIs are pretty much the same across the abstractions, so we import the postprocessor and then instantiate it.

Next I'm going to talk about the SentenceEmbeddingOptimizer. This trims the text content of each node based on its relevance to the query: we use an embedding-based similarity score to select the most relevant sentences and shorten the input text. In other words, the optimizer removes sentences that aren't related to the query, using embeddings. You can set a percentile cutoff to keep the top percentage of relevant sentences, or alternatively a threshold cutoff can be specified to select which sentences to keep based on a raw similarity score. I've linked to the source code that you can read through if you want to see what's going on under the hood, but I'll briefly talk about the arguments you need to pass and how it works. You pass it an embedding model, which, as with everything in LlamaIndex, defaults to OpenAI. You can pass a percentile cutoff for selecting the top sentences; the number of sentences kept is the integer value of the total number of sentences times the percentile cutoff. So if the percentile cutoff is set to 0.5, the top 50% of sentences with the highest similarity scores are selected. Use the percentile cutoff when you want to keep a fixed percentage of the most relevant sentences. There's also the threshold cutoff, which is based on the similarity score itself: if the threshold cutoff is set to 0.7, only sentences with a similarity score greater than 0.7 are selected. Use this when you want to keep sentences that meet a minimum similarity score. The two can be combined; you can use the percentile cutoff and the threshold cutoff together. There's a tokenizer function argument that splits the text into sentences; this uses the NLTK English tokenizer, the punkt sentence tokenizer we've seen before. And there are context_before and context_after arguments, which are just the number of sentences to include before and after the relevant sentences.
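Putting those two together, a minimal sketch might look like this, again assuming the llama_index >= 0.10 import paths; the percentile cutoff, context window sizes, and top-k are illustrative placeholders, and the embedding model simply falls back to the default if you don't pass one.

```python
from llama_index.core.postprocessor import LongContextReorder, SentenceEmbeddingOptimizer

# LongContextReorder needs no arguments.
reorder = LongContextReorder()

# Keep the top half of sentences by similarity, plus one sentence of context on each side.
optimizer = SentenceEmbeddingOptimizer(
    percentile_cutoff=0.5,
    context_before=1,
    context_after=1,
)

# Postprocessors are applied in order: trim each node, then reorder the list.
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[optimizer, reorder],
)
```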
Under the hood, we retrieve the text content of each node, split it into sentences using the tokenizer, generate embeddings for the query and the sentences, calculate the similarity score between the query and each sentence, select the top sentences based on the percentile cutoff or the threshold cutoff, then retrieve the context sentences before and after each selected sentence and join them together. That gives us an optimized context, which is passed to the language model to produce a response. To instantiate it, it's the same as everything we've seen so far: we import the SentenceEmbeddingOptimizer and instantiate it with the arguments we want to give it.

Note that while we've seen a lot of node postprocessors, you can't pass one directly to the as_retriever method of an index. Node postprocessors are configured and used within the context of a query engine rather than directly with the retriever. The retriever is responsible for fetching the most relevant nodes based on the query, and the node postprocessors are applied by the query engine to refine, filter, or augment those nodes before the final response is synthesized. You've seen in previous videos that we sometimes had to instantiate a retriever object and then pass that retriever object to the query engine; it's the same thing here. Say we instantiate a retriever, for example a vector index retriever, and we instantiate our postprocessor. Then when we create our query engine, we have to create it manually: we create the retriever query engine, or whatever query engine it is that you're using, pass it the retriever, pass it the postprocessor, and then we can run our query. There's a short sketch of that pattern below. So there you have it, another tool in your RAG tool belt. Node postprocessing is an important technique to use, and it will definitely improve the results of your RAG pipeline, because at the end of the day what you're doing is making sure you're giving the best context to the language model. I'll see you in the next video, where we'll talk a little bit about re-ranking, another postprocessing technique.
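For reference, that manual assembly might look like the sketch below, once more assuming the llama_index >= 0.10 import paths and the `index` built earlier; the retriever's top-k and the similarity cutoff are placeholders.

```python
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever

# The retriever fetches candidate nodes; the postprocessor filters them
# inside the query engine, not inside the retriever itself.
retriever = VectorIndexRetriever(index=index, similarity_top_k=10)

postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    node_postprocessors=[postprocessor],
)

response = query_engine.query("Your question goes here")
```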
