From the course: Introduction to AI-Native Vector Databases
Solution: Vector Search with Weaviate
Now that you've taken the time to work on this challenge, allow me to demonstrate my solution. In this notebook, we've got our four questions at the top. First, we load our data set. The URL is a bit different this time because this is a Jeopardy data set with 1,000 entries: 1,000 questions and answers, significantly more than we've been working with so far. But the process to load it is quite similar to what we've seen before. We load the data set and look at one of the questions in it. We now have the air date for this particular question, the round it was used in, and the value of that question, and, like before, we've got the category, the question, and the answer.

Next, we start up Weaviate as we've been doing. Then we check whether the Question class exists. If it does, we delete it, because we want to create our own Question class. In this solution, we're going to do things a little differently, because I want to give you a different flavor of how you can create schemas. Here, we write a class definition. We start by specifying the class, Question, which is similar to what we were doing before. Next, we specify the vectorizer. The vectorizer is the module that determines which model is used to turn our question text into vectors. Here, we use text2vec-openai, which is why we had to include the OpenAI API key at the beginning. If you haven't set that up yet, see the README file for this GitHub repository. The next thing we do is set up the vector index configuration, where we specify which distance metric we want to use.
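A minimal sketch of the two setup steps described above: loading the records and dropping an existing class before recreating it. It assumes the data set is a JSON array of objects and that `client` is a v3 `weaviate-client` `Client`, whose `schema.exists()` and `schema.delete_class()` methods are used here. The function names are hypothetical, and no dataset URL is filled in.

```python
import json
import urllib.request
from typing import IO, Any


def load_records(fp: IO[Any]) -> list[dict]:
    """Parse a JSON array of question records from a file-like object."""
    records = json.load(fp)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of question records")
    return records


def fetch_records(url: str) -> list[dict]:
    """Download and parse the data set (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return load_records(resp)


def reset_class(client: Any, class_name: str) -> bool:
    """Delete class_name if it already exists so it can be recreated.

    Assumes a v3 weaviate-client Client. Returns True if a class was deleted.
    """
    if client.schema.exists(class_name):
        client.schema.delete_class(class_name)
        return True
    return False
```

In the notebook, `reset_class(client, "Question")` would run once, right before the new class definition is created.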
When comparing text, we usually want to use cosine distance. Once this is done, we specify the properties we want to import into our database. This data set has air date, round, value, category, question, and answer. We're not going to use all of these: the challenge specifically tells us to keep the question, answer, and round properties, because we'll use them later. The other fields aren't important here, so we won't bring them into our vector database. The first property we bring in is named question, and its data type is text. The next is answer, whose data type is also text. The last field we want is round, and its data type is also text. The schema specifies what data we want to store in our database and what that data looks like; here, all three are text fields, so that's all we specify. We then use this class definition to create the class in our schema. We run that cell, and the class is created.

Now we want to iterate over our data one record at a time with a for loop and insert each data point according to this schema. All we need to do is create individual objects. Each object body is just a dictionary, and we pass in each of the properties we listed out, the ones we told Weaviate we'd give it. The first one is question. The next one is answer, so we pass in the answer as well.
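Here is a sketch of the class definition and object body described above, written as plain dictionaries in the format the v3 Python client's `client.schema.create_class()` accepts. The raw record keys (`Question`, `Answer`, `Round`) are an assumption about the data set's field names; check them against the record printed in the notebook.

```python
from typing import Any

# Class definition: class name, vectorizer module, cosine distance for the
# vector index, and the three text properties kept from the data set.
QUESTION_CLASS: dict[str, Any] = {
    "class": "Question",
    "vectorizer": "text2vec-openai",
    "vectorIndexConfig": {"distance": "cosine"},
    "properties": [
        {"name": "question", "dataType": ["text"]},
        {"name": "answer", "dataType": ["text"]},
        {"name": "round", "dataType": ["text"]},
    ],
}

# Assumed mapping from schema property to key in each raw record.
SOURCE_KEYS = {"question": "Question", "answer": "Answer", "round": "Round"}


def to_object_body(record: dict[str, Any]) -> dict[str, Any]:
    """Keep only the properties declared in the schema, dropping the rest."""
    return {prop: record[key] for prop, key in SOURCE_KEYS.items()}

# In the notebook, the definition would be registered with:
#   client.schema.create_class(QUESTION_CLASS)
```

Dropping air date, value, and category at this step means they are never sent to the vectorizer or stored.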
The last property we said we'd provide is the round information, which we take from the round field of each record. Now that we have the object body, we just need to insert it into Weaviate. We call batch.add_data_object, specifying the object body as the data object and telling it which class to insert into: the class name is Question, as we specified in the class definition. That looks good, so we format it nicely and run it. This takes a little while, because we're adding a thousand objects to our vector database: as part of the process, each of the thousand objects is vectorized before it's inserted, which is why this cell takes longer to run.

Next, we want to make sure all of our data points actually made it into the vector database, so we check that the object count is 1,000. We query the client with an aggregation over the Question class, using with_meta_count, and then execute the query. We also print the result with an indent of two so it's easy to read. When we ask it to count the objects in the database, it tells us there are a thousand, which matches our data set of 1,000 records. We've successfully inserted part of each record, the question, answer, and round information, into the vector database.

For question three, we want to search for objects close to the concept of spicy food recipes and show the four closest matches. Again, this is a vector search, so we need to create our query.
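A sketch of the batch insert and the count check, assuming the v3 `weaviate-client`: entering `client.batch` as a context manager buffers objects and sends any remaining ones on exit, and `client.query.aggregate(...).with_meta_count().do()` returns a nested dictionary whose shape `meta_count` assumes below. The helper names are hypothetical.

```python
from typing import Any, Iterable


def insert_objects(client: Any, bodies: Iterable[dict], class_name: str) -> int:
    """Queue each object body in a batch; returns how many were added.

    Assumes a v3 weaviate-client Client, where leaving the `with` block
    flushes the batch (and the vectorizer runs server-side on import).
    """
    added = 0
    with client.batch as batch:
        for body in bodies:
            batch.add_data_object(data_object=body, class_name=class_name)
            added += 1
    return added


def meta_count(response: dict, class_name: str) -> int:
    """Read the object count from an aggregate response, as returned by
    client.query.aggregate(class_name).with_meta_count().do()."""
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]
```

Comparing `meta_count(...)` against `len(records)` turns the manual check from the walkthrough into a single assertion.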
We call client.query.get on the Question class, extracting the question and answer properties, and add with_near_text, passing in a concepts object. The concept we're interested in is spicy food recipes. Because we only want the four most relevant results, we set a limit of four with with_limit, and then tell it to perform the query. We also want the distance, so we know how far each result is from our query: we add with_additional and pass it distance. Now we run the query.

We get these results: popular Pennsylvania pepper pot, Mexican dishes, dishes garnished, Indian dishes, and naan. These are all spicy dishes or foods eaten with spicy dishes. You can look at the distance to see how semantically relevant each object is to the query you passed in.

Next up, we want to do something a little trickier: search for spicy food recipe questions that were used in Double Jeopardy rounds. This is known as filtering, and we can do it because we saved the round information in the database. We want results that are semantically related to spicy food recipes and that also meet the criterion of having been used in a Double Jeopardy round. To set this up, we start from our previous query and modify it.
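To make the distance values concrete, here is a sketch of cosine distance (1 minus cosine similarity, the metric chosen in the class definition) and a helper that reads results from a `Get` response. The response shape is assumed to be the one the v3 client returns for the query in the docstring; Weaviate already returns near-text results closest first, so the explicit sort only makes that ordering visible.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal
    vectors, 2 for opposite directions. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def ranked_hits(response: dict, class_name: str) -> list[tuple[float, str, str]]:
    """Return (distance, question, answer) tuples, closest first, from the
    response to a query such as:
        client.query.get(class_name, ["question", "answer"])
              .with_near_text({"concepts": ["spicy food recipes"]})
              .with_limit(4)
              .with_additional(["distance"])
              .do()
    """
    hits = response["data"]["Get"][class_name]
    return sorted(
        (h["_additional"]["distance"], h["question"], h["answer"]) for h in hits
    )
```

A smaller distance means the object's vector points in nearly the same direction as the query's vector, which is what "semantically closer" means here.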
First, we limit it to three results. That number is arbitrary; you can change it. Then we add a where filter with with_where. The where filter lets us filter on stored properties, so we can select only Double Jeopardy rounds. We specify it as follows. First, we give it a path that identifies which property we want to filter on. We're filtering on the round property, because that's where the information we need is stored. Next, we specify how to compare, so Weaviate knows what to keep and what to throw out. We want rounds equal to Double Jeopardy, so we use the Equal operator. Finally, we pass in a valueText that specifies exactly which rounds to keep: here, Double Jeopardy. This filter ensures that everything returned is not only semantically relevant to our query but also meets the strict criterion of being a Double Jeopardy question. We run this, and some of the concepts from before are returned here because they are Double Jeopardy questions.

How do we know they're Double Jeopardy questions? We can go back up and add round to the list of properties the query returns. If we rerun the query, we get exactly the same results, but now we can see that the round is Double Jeopardy. With that, we've successfully answered the fourth question.

This chapter introduced the concept of vector search and showed how you can use Weaviate, an open-source vector database, to start performing vector search over your own data.
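Below is a sketch of the where filter in the dictionary format passed to the v3 client's `.with_where()`, plus a purely local illustration of what a filtered vector search means: apply the equality filter, then rank the survivors by cosine distance. `filtered_search` is a hypothetical helper, not Weaviate's implementation, and it only understands the `Equal` operator on `valueText`. The exact round string (for example whether it includes punctuation) is an assumption; use whatever value your data stores.

```python
import math
from typing import Any, Sequence


def round_filter(round_value: str) -> dict[str, Any]:
    """Where filter for .with_where(): keep objects whose `round` property
    equals round_value exactly."""
    return {"path": ["round"], "operator": "Equal", "valueText": round_value}


def _cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def filtered_search(objects: Sequence[dict[str, Any]],
                    query_vector: Sequence[float],
                    where: dict[str, Any],
                    limit: int) -> list[dict[str, Any]]:
    """Local illustration only: keep objects matching an Equal/valueText
    filter, then return the `limit` closest by cosine distance."""
    if where["operator"] != "Equal":
        raise ValueError("this sketch only supports the Equal operator")
    prop = where["path"][0]
    candidates = [o for o in objects if o[prop] == where["valueText"]]
    candidates.sort(key=lambda o: _cosine_distance(o["vector"], query_vector))
    return candidates[:limit]
```

The key property the sketch shows is that a close match from the wrong round can never appear in the results, no matter how small its distance is.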
In the next chapter, we'll see how machine learning models actually generate vectors and how we can practically specify these models for use in Weaviate. We'll even build an image and text search engine.