From the course: Hands-On AI: RAG using LlamaIndex
Query transformation
- [Instructor] We are going to close out our discussion of pre-retrieval techniques by talking about query transformation. Query transformation is a process in which a user's initial query is converted into a different format or broken down into subqueries. This transformation happens before the query is run against the database or an index, which is why it's called a pre-retrieval technique. The purpose of transforming a query is to enhance the effectiveness of what we retrieve from the database and make sure that the response is as accurate and relevant as possible. It could also simply be the case that your users don't know what it is they actually want to ask and need some help from a language model. So let's go ahead and dig into it.

We'll start here by looking at some source code. If you're curious about what is going on under the hood, for anything in LlamaIndex, or for any library you're working with in general, you should always be reading the source code. I can't stress that enough. Read the source code. That's where all truth is. So here, I'm essentially creating a prompt, and in this prompt, I'm saying, "Look, users aren't always the best at articulating what they're looking for," and so on. And I'm telling the LLM to take the user's query and generate some number of questions. What I've done here is include few-shot examples in the prompt. You'd be surprised how far few-shot examples can go. This prompt right here is what the LLM is going to use as part of its context, so that when the user query comes in, we augment that query and get back some other queries. That augmented query is what gets sent for retrieval. This is important. As you can see from the examples I've laid out, what I'm hoping to do with query transformation is help interpret the user's intent more accurately. If we transform a query into a form that aligns better with the underlying data structure or the capabilities of the search engine, the system is going to better understand what it's being asked. Also, some queries are really complex and might require information from different parts of a database or index. Query transformation allows these types of complex queries to be decomposed into simpler ones that can be individually processed and then integrated to form a comprehensive answer.

Let's go ahead and see this in action. Here I'm passing in a QUERY_STRING. I have a function that sends a call to the language model and says, "Hey, take this string and generate some additional queries," using the prompt template that I have above. You can see the question I passed in is, "How can I create my own luck?", and you can see the generated queries that we get back.

A specific type of query transformation is the SubQuestionQueryEngine. As I mentioned before, some queries are complex and need to be broken down, and that's what the SubQuestionQueryEngine does. It breaks these complex queries down into simpler sub-questions. So we get a complex query, decompose it into a bunch of sub-questions, and each of those sub-questions will hopefully extract useful information from the database. We send the sub-questions to the database, get back the relevant responses, and then synthesize those into a final response from the LLM. And we initialize all of that using the SubQuestionQueryEngine.
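Before moving on to the SubQuestionQueryEngine walkthrough, here is a minimal sketch of the query-generation step demonstrated above, assuming OpenAI as the LLM. The prompt text, model name, the helper name generate_queries, and the default number of queries are illustrative placeholders, not the exact values from the course notebook.

```python
from llama_index.core import PromptTemplate
from llama_index.llms.openai import OpenAI

# Illustrative few-shot prompt; the course notebook uses a longer version
# with more examples.
QUERY_GEN_PROMPT = PromptTemplate(
    "Users aren't always the best at articulating what they're looking for.\n"
    "Given the user query below, generate {num_queries} related questions that\n"
    "would help retrieve useful context. Write one question per line.\n\n"
    "Example:\n"
    "Query: How do I get better at writing?\n"
    "Questions:\n"
    "What daily habits improve writing skills?\n"
    "How do professional writers practice their craft?\n\n"
    "Query: {query_str}\n"
    "Questions:\n"
)

def generate_queries(query_str: str, num_queries: int = 4) -> list[str]:
    """Ask the LLM to expand a user query into several related queries."""
    llm = OpenAI(model="gpt-4o-mini")  # assumed model; use whatever you've configured
    response = llm.complete(
        QUERY_GEN_PROMPT.format(num_queries=num_queries, query_str=query_str)
    )
    # Keep the original query plus the generated ones for retrieval.
    generated = [line.strip() for line in response.text.split("\n") if line.strip()]
    return [query_str] + generated

print(generate_queries("How can I create my own luck?"))
```

The augmented list (original query plus generated variants) is what you would then send to the retriever, rather than the raw user query alone.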
The SubQuestionQueryEngine is a type of query engine in LlamaIndex. What we do is build out this query_engine_tool object; I'm not going to go into too much detail on that. The query_engine_tool will take the user's initial query and, from that, create a bunch of sub-questions. So we initialize the query_engine_tools, then we initialize the sub_question_query_engine, like so. Then, we can go ahead and change the prompt. In this case, I'm using our standard HYPE_ANSWER_GEN_PROMPT. We can see here the actual prompt that gets sent to the language model. It's saying, "You are an agent, you have multiple tools," and so on, and then it gives it the user query, which gets packaged together with the hype prompt. All of this gets sent to the language model. And so here, we send sub_question_query_engine.query with "How could I build my own luck, what are the types of luck I should pursue?" You can see it's really a bunch of different questions that I'm asking, and the sub_question_query_engine breaks this down into many smaller pieces. The result here is empty because I took a very small subset of the documents that we're working with, just two questions from each of the authors. If you expand that, you'll probably get an actual response. The point is that you can see what's happening: I give the sub_question_query_engine this long query, it breaks it down into smaller ones, and those smaller ones each get sent, if you will, to the vector database to retrieve the relevant nodes.

Another query transformation technique is hypothetical document embeddings, or HyDE. This creates a hypothetical answer or document based on the query, which is then used to retrieve information from the database. It's an advanced approach that actually works really well, and it leads to better embedding quality and more relevant search results. So again, with this technique we're generating a hypothetical document based on the user's query, and using it to improve the embedding and retrieval process. This is really useful for enhancing the quality of responses in cases where the direct query might be too vague or broad. So how does it work? To reiterate, we feed a query to the language model with the instruction to write a document that answers the question. This generates a hypothetical document that captures the essence of an answer to the query. We then generate an embedding vector for this fake document. The hypothetical document itself isn't added to the vector store index, so there's no full hypothetical document that we can access later. But we can use its embedding to search against the corpus embeddings, and the most similar real documents will be retrieved. The idea is that a hypothetical answer to a question is going to be more semantically similar to the real answer than the question itself is. In practice, this means your search uses an LLM, like GPT, to generate a hypothetical answer, then embeds that answer and uses the embedding for search.

To use HyDE, we use a query transform, the HyDEQueryTransform, and we wrap that in the TransformQueryEngine. So we instantiate the HyDEQueryTransform, we instantiate a hyde_query_engine, and then you can see the prompt that we get. Now we take that prompt, and you can see here, we pass in our QUERY_STRING.
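For reference, here is a rough sketch of how the two engines from this section are typically wired up in LlamaIndex. It assumes an index built from a local "data" directory standing in for the vector store index built earlier in the course; the tool name, description, top-k value, and include_original setting are illustrative choices, not necessarily what the notebook uses.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import SubQuestionQueryEngine, TransformQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Placeholder corpus; in the course this is the index of author interview answers.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
base_query_engine = index.as_query_engine(similarity_top_k=5)

# --- Sub-question decomposition ---
query_engine_tools = [
    QueryEngineTool(
        query_engine=base_query_engine,
        metadata=ToolMetadata(
            name="advice_corpus",  # placeholder tool name
            description="Interview answers from the authors",  # placeholder description
        ),
    )
]
sub_question_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
)
sub_response = sub_question_query_engine.query(
    "How could I build my own luck, and what types of luck should I pursue?"
)

# --- Hypothetical document embeddings (HyDE) ---
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(base_query_engine, query_transform=hyde)
hyde_response = hyde_query_engine.query("How can I create my own luck?")
```

Setting include_original=True keeps the original query's embedding alongside the hypothetical answer's embedding, which can help when the generated document drifts away from what the user actually asked.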
Running that, we get back a hypothetical answer, and this hypothetical answer is what helps us find relevant context in our vector database. We can now go ahead and create a hyde_query_pipeline the same way we normally would, and then run that pipeline. If you want to learn more about hypothetical document embeddings, check out the paper that introduced the technique, "Precise Zero-Shot Dense Retrieval without Relevance Labels." It has an illustration of the HyDE model and goes into a lot of detail about how it works. I'm not going to cover all of the mathematics behind it, but if you're interested, there's a ton of math in there. So that ends our discussion of query transformation. It's a huge aspect of modern RAG systems: we can use it to get more accurate, relevant, and faster responses, and it plays a real role in the effectiveness of a good RAG system. I think it's one of those high-leverage techniques that can really help improve your RAG pipelines. I will see you in the next module, where we'll continue talking about advanced RAG, focusing more on post-retrieval and other techniques.