From the course: Hands-On AI: RAG using LlamaIndex

Query pipeline

- [Instructor] In this video, we're going to talk about the Query Pipeline. The Query Pipeline in LlamaIndex is a declarative API that allows you to chain together different modules so that you can orchestrate workflows, whether they're simple workflows or advanced and complicated ones. It's all centered around an abstraction called the Query Pipeline. The Query Pipeline lets you load modules such as LLMs, prompts, retrievers, re-rankers, or even other pipelines, and connect them together in a sequential chain. Or, if you want, you can connect them as a directed acyclic graph, or DAG for short, and run it end to end. This is nice because it allows you to express common workflows with fewer lines of code and less boilerplate, it increases the readability of your code, and in the future you'll be able to serialize pipeline components so there's more portability and easier deployment to different systems.

The Query Pipeline is a super powerful abstraction in LlamaIndex, and I encourage you to learn more about it by visiting the documentation. If you go to the LlamaIndex Component Guides, you can look in Query Pipelines, and you'll get thorough documentation about what the Query Pipeline is and how it works. There's also a great introduction to Query Pipelines under Examples, Query Pipelines. You'll see examples for building an agent using a Query Pipeline, and you can also see how to use Query Pipelines with different types of workflows. In this course, we're going to focus on using Query Pipelines for simple, straightforward RAG applications. So let's go ahead and see how that works in action.

We'll go ahead and instantiate the LLM and embedding model, instantiate our Qdrant vector store, and build an index over it. Now, what we're going to do is create a very simple RAG pipeline. Well, it's simple, but it is slightly complex. In this workflow, we're going to have an input that's passed through two prompts, with a retrieval module sitting in the middle, and finally all of that gets synthesized by a large language model.

So what I'm doing here is essentially building a chain. I'll start by describing the chain and then what the different components are. In this chain, I'm going to have a prompt template. That prompt template gets passed to the retriever. What I retrieve gets passed to another prompt template, which is then sent to the LLM for generating a response. The prompt templates are like this: the first prompt template is just saying retrieve context about the following topic, and the topic is what the user will pass in. We format that as a prompt template. The second prompt template we're creating is just saying synthesize the context that's been provided to you using modern slang, yet still quote the sources, and we build a prompt template from that. Then we're just going to use a retriever over our index and fetch the five most similar chunks of text to the user query.

So we'll go ahead and run that cell, then run the pipeline, and we'll get a response. You can see we set verbose equal to true, so we see the intermediate steps happening in action as well. And right here, we have the response. You can see it's talking in modern slang: "There's this dude who's like, 'Yo, we got to climb this mountain, but we ain't got no fancy gear,'" and so on and so forth. And it's actually quoting from the "Book of Poems." So Edgar A. Guest, I can't confirm whether Edgar A. Guest has poems in the "Book of Poems," but if you wanted to, you could check that out as well.
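As a rough sketch, here is one way a chain like the one just described could be wired up. The prompt strings, the model name, the data directory, and the variable names (topic, context) are assumptions for illustration, and the notebook itself builds the index over a Qdrant vector store and may wire the chain slightly differently (for example, passing the first prompt through the LLM before retrieval).

```python
from llama_index.core import PromptTemplate, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.llms.openai import OpenAI

# Stand-in index; the course notebook builds this over a Qdrant vector store.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# The model name here is just a placeholder.
llm = OpenAI(model="gpt-3.5-turbo")

# First prompt: turn the user's topic into a retrieval query.
prompt_tmpl1 = PromptTemplate(
    "Retrieve context about the following topic: {topic}"
)

# Second prompt: synthesize the retrieved context in modern slang.
prompt_tmpl2 = PromptTemplate(
    "Synthesize the context provided to you using modern slang, "
    "yet still quote the sources.\n\nContext:\n{context}"
)

# Fetch the five most similar chunks for the query.
retriever = index.as_retriever(similarity_top_k=5)

# Chain the components sequentially; verbose=True prints each intermediate step.
pipeline = QueryPipeline(
    chain=[prompt_tmpl1, retriever, prompt_tmpl2, llm],
    verbose=True,
)

response = pipeline.run(topic="climbing a mountain without fancy gear")
print(response)
```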
Now, if you want to, you can look at some of the intermediate steps that are happening under the hood by running with intermediates, so we use the run_with_intermediates method of the pipeline. I've already run that here, and we can see what we have as the output. The output, you can see, is what we saw above. We can view it as a dictionary if you'd like by accessing its __dict__, and you can see what's happening under the hood there. The intermediates, you can see, are dictionaries that hold the intermediate responses and the nodes that were fetched. So if you wanted to slice into one just to see what it looks like, you could do that as well.

All right, so let's go ahead and build out another RAG pipeline. Here, what we're going to do is build the standard, straightforward kind of RAG pipeline, without the query rewriting step that we saw above, but this time we're going to define the links more explicitly. We start by defining the input component, because we need a way to link the input query to the retriever and to the tree summarizer, so we define that with an input component. Then we're going to have a retriever over our index, I'm going to use a different LLM this time, GPT-4o, and I'll use a tree summarizer. If you're not familiar with what a tree summarizer is, don't worry about it too much; we'll come across it a bit later in the course as well.

So we'll run that, and what we're going to do is explicitly define the Query Pipeline, which is what's happening here. Then we add modules to the Query Pipeline, and we do that by passing in a dictionary whose key-value pairs are the name and the actual component itself. So here I'm just adding the input, retriever, and tree summarizer modules, and now I'm adding links between them. I'm linking the input to the retriever, I'm linking the input to the tree summarizer, and then I'm linking the retriever to the tree summarizer. And we have these destination keys that tell the Query Pipeline which variable in the prompt template to inject the context into. We can go ahead and run that, and we've built our Query Pipeline. We'll use the same query as we did before, and we'll get the response as well. You can also view it a bit more granularly if you'd like; when you do, you'll see that you get nodes with scores, which have the metadata as well as the actual text that was retrieved.

Throughout the course, you'll see me use both of the patterns I've shown here: the one where we explicitly add modules and define links in the Query Pipeline, and the pattern we saw up above where we simply pass everything into the chain argument.
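Here's a sketch of that explicit module-and-link wiring. The query string, data directory, and model name are assumptions; the notebook itself builds the index over Qdrant rather than from a local directory.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_pipeline import InputComponent, QueryPipeline
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.llms.openai import OpenAI

# Stand-in index; the notebook builds this over the Qdrant vector store instead.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# A different LLM for synthesis this time; the exact model name is an assumption.
llm = OpenAI(model="gpt-4o")

input_component = InputComponent()
retriever = index.as_retriever(similarity_top_k=5)
summarizer = TreeSummarize(llm=llm)

# Explicitly register each module under a name...
p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "input": input_component,
        "retriever": retriever,
        "summarizer": summarizer,
    }
)

# ...then wire up the DAG. The input feeds both the retriever and the
# summarizer, and the retrieved nodes feed the summarizer's `nodes` argument.
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")

response = p.run(input="What does the poem say about climbing the mountain?")
print(response)

# You can also inspect what each module produced along the way:
output, intermediates = p.run_with_intermediates(
    input="What does the poem say about climbing the mountain?"
)
print(intermediates["retriever"])
```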
Now I want to bring your attention to a helper file that we're going to use for the remainder of the course. If you go to the top level of the repository, look for the helpers folder. In the helpers folder, you'll see a Python file called utils, and in utils, you'll see that I've defined a lot of the code that you've seen over and over in previous notebooks, essentially creating wrappers around the LlamaIndex functions and classes and making them a bit more flexible.

So I've created a function here for setting up an LLM. If you want to set up a Mistral account and use Mistral, you can do that; I just added that there for you just in case. Same thing for the embedding model: you can pass in the provider, and I've made it easy to set up three of them for you here. There are also functions that help set up our vector store, so that we're reducing a lot of the boilerplate code in our notebooks.

There are some other helper functions here as well. One creates the index. As I mentioned before, we can create an index directly over a list of nodes, or we can create it from a vector store or from docs, so I wrote a function that helps us with that, just to make things a bit easier. I wrote a wrapper around an ingestion pipeline, which is just going to help us keep things neat and organized. There's also a wrapper around creating a Query Pipeline, and notice that this wrapper function is just using the chain argument; when the time comes for us to use the more explicit definition for a Query Pipeline, we'll do that more manually. I've got functions here that help with creating query engines with different modes, and also some other helper functions that we'll see later throughout the course, just to make life easier.

Essentially, I'm just trying to keep the notebooks free from as much boilerplate as possible. So you'll see these wrappers and abstractions that I've written throughout the course. Just note that they're simple wrappers around the preexisting LlamaIndex abstractions, and you can find them in the top level of the repository under helpers. In the next video, we're going to talk a little bit about prompt engineering for RAG, really just how to manipulate and use prompt templates in Query Pipelines. I'll see you there.
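The exact contents of helpers/utils.py aren't shown in this video, but as a rough, hypothetical illustration of the idea, a chain-based Query Pipeline wrapper like the one described might look something like this. The function name and signature are assumptions, not the repository's actual code.

```python
from llama_index.core.query_pipeline import QueryPipeline


def create_query_pipeline(chain, verbose=True):
    """Hypothetical sketch of a chain-based QueryPipeline helper.

    The real helper in helpers/utils.py may take different arguments;
    this just shows the idea of hiding the boilerplate behind one call.
    """
    return QueryPipeline(chain=chain, verbose=verbose)


# Example usage, assuming the prompt templates, retriever, and LLM
# from the earlier chain example are already defined:
# pipeline = create_query_pipeline(chain=[prompt_tmpl1, retriever, prompt_tmpl2, llm])
# response = pipeline.run(topic="mountain climbing")
```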
