From the course: Introduction to AI-Native Vector Databases

Vector DB1: E-commerce RecSys

In the last chapter, we spoke about the scalability of a vector database and assessing its performance. In this chapter, we'll introduce three different and common uses of vector databases in industry, starting off with building a recommender system. So, let's think about what a recommender system is. If you look at my screen, you can think of this as an e-commerce platform where a user comes in and they're interested in buying a particular product, let's say this shirt, for example. If a user shows interest in this by clicking it, or by putting this particular item in their cart, you'd like to reorganize all of the products that you have on your storefront so that similar items are shown further up, because now the user has shown you their intention of what they're interested in, and you can then recommend similar items to them. And so, in order to do this, you can use a vector database to build the underlying infrastructure for that. Let's have a look at how we can do that.
The idea here is if we want to recommend an item, we need to know who's asking. So, if I'm asking you to recommend a blue shirt to me, versus somebody else asking for a blue shirt, then depending on my likes and my dislikes, the blue shirt that I'm interested in might be different. So, you have to establish who's asking to build a good recommender system. The way we do that is by defining multiple classes. You need a user class, you need products, and you need brands. And so, once you've got those three isolated classes, what you can do is establish vectorized representations of each one of those. So, for example, for the product class, we can establish the vectorized representation for the product by encoding its image and creating a vector for the image so we can display similar products. Then, we can use this feature of Weaviate, known as cross-references, which are shown here in the diagram as arrows.
And so, if you create a user, that user can like a particular product, and that can be encoded in the database schema as a cross-reference from the user class to the product class. Similarly, the product can also have a brand, so that exists as a cross-reference between the product class and the brand class. And the reason we need these cross-references is that they make it easy to establish the vectors for the individual classes. So, using these cross-references or graph-references, we can actually capture user interest. Let's say I'm a user here, and I have interacted with two items on your e-commerce storefront. I've bought a shirt, and I've bought a pair of boots. What we're going to do is represent a user's interest by drawing a graph-reference, or a cross-reference, from the user class to the products that that user has interacted with. And so, the interaction between the user and product classes can be used to establish recommendations personalized to that user. These cross-references can be established as a result of a user liking an item, putting that item into their online cart, or even buying that item.
And so, the way we use these cross-references is using a module called Ref2Vec, which allows us to turn a reference into a vector. So, for example, let's say you have a user over here and they like this shirt and this pair of boots. You can use these references, these arrows, to establish a vector for that user, which is a function of the products that they like. So, if this user likes the shirt and the pair of boots, we can take those products and their vector representations, and then we can create an up-to-date vector representation of the user as the average of the shirt vector and the boot vector. And so, effectively, what we're doing here is accounting for the fact that a user's preferences can change, and we're updating their vector in real-time by averaging the vectors of all of the products that they like.
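As a rough sketch of the averaging idea just described, here is what "the user vector is the average of the liked product vectors" looks like in plain Python. This is not Ref2Vec's actual implementation; the three-dimensional vectors, the product names, and the `centroid` helper are made up for illustration (real image vectors have hundreds of dimensions).

```python
from typing import Dict, List

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector to average")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical product vectors (toy values, 3 dimensions).
product_vectors: Dict[str, Vector] = {
    "shirt": [1.0, 0.0, 2.0],
    "boots": [3.0, 2.0, 0.0],
    "jacket": [2.0, 4.0, 1.0],
}

# The user has cross-references to the shirt and the boots.
liked = ["shirt", "boots"]
user_vector = centroid([product_vectors[name] for name in liked])
print(user_vector)  # [2.0, 1.0, 1.0]

# The user likes one more item, so the user vector is recomputed.
liked.append("jacket")
updated_user_vector = centroid([product_vectors[name] for name in liked])
print(updated_user_vector)  # [2.0, 2.0, 1.0]
```

Every new like shifts the user vector toward the newly liked product, which is how the representation keeps up with changing preferences.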
And you can scale this up to millions of items, and you can do real-time recommendations as a result of this. So, the minute somebody logs into your e-commerce platform and clicks a couple of items, you can take the average of the vectors of those items and rerank everything on your storefront to be similar to the items that the user likes. Effectively, the products that the user chooses to interact with on your platform define the identity, or the vector, of that user, and that is used to perform recommendations to that user. And this is unique to every user that logs on to your platform. So, it's personalization at the user level.
So, for this, what we're going to be using is the Ref2Vec e-commerce demo that's built by Weaviate, on top of Weaviate. So, we've created this, and effectively, what we have to do here is two things. We need to establish the references within our schema, which I'll show first of all over here. So, if we look at one of the code files, we're locally deploying Weaviate as we've done before, but now we're creating two classes. The first class we're going to create is the product class. And the product class uses the multi2vec-clip module. And here, it only has an image field. So, effectively, you can tell from this that the vector for the product is going to be generated by looking at the image of that product. If we go down and look at the properties that the product has, it has the image that's going to be used to vectorize the product. It has a description, the text, the name of the product, and where that product's images are stored. It has the SKU number. We've got the category for that product. We've got multiple pieces of information. But the main piece here is that a product's vector is established as a result of its image. The next thing, if we scroll down in the schema definition, is the definition of a user class. So, anytime somebody logs on to your platform, you create a new user for that person.
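To make the product class a bit more concrete, here is a minimal sketch of what such a class definition could look like, written as a plain Python dictionary in the shape Weaviate's schema uses. The property names (`imagePath`, `sku`, and so on) are assumptions for illustration and may not match the demo's actual schema; the part taken from the walkthrough is that the class uses `multi2vec-clip` and that only the image field feeds the vector.

```python
# Sketch of a Product class definition (property names are hypothetical).
product_class = {
    "class": "Product",
    "vectorizer": "multi2vec-clip",
    "moduleConfig": {
        # Only the image is used to build the product's vector.
        "multi2vec-clip": {"imageFields": ["image"]},
    },
    "properties": [
        {"name": "image", "dataType": ["blob"]},        # base64-encoded image
        {"name": "name", "dataType": ["text"]},
        {"name": "description", "dataType": ["text"]},
        {"name": "imagePath", "dataType": ["text"]},    # where the image is stored
        {"name": "sku", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]},
    ],
}

# Which properties actually contribute to the vector?
vectorized_fields = product_class["moduleConfig"]["multi2vec-clip"]["imageFields"]
other_fields = [
    p["name"] for p in product_class["properties"] if p["name"] not in vectorized_fields
]
print(vectorized_fields)  # ['image']
print(other_fields)       # ['name', 'description', 'imagePath', 'sku', 'category']
```

The other properties are still stored and can be returned with search results; they just don't influence where the product sits in vector space.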
And interestingly, to see how we establish the cross-reference here, notice how one of the properties for the user class is a liked item, and the type of this is a product. So, the type here is not a blob or a text string or anything like that. In fact, it is an instance of another class, the product class, over here. And so, effectively, when you do this, what you're telling Weaviate, the vector database, is that every time a user interacts with a product, that property is going to point to an object of another class and link it to that user, and this is how we generate those arrows that we just spoke about. The next thing that we're going to do is tell Weaviate how to determine a vector for the user, which is through the ref2vec-centroid module. The ref2vec-centroid module takes the references for that user, in this case, all of the products that that user likes, and it uses those to generate a vector for that user. And this is how we know what individual users like. The next thing we're going to do is go ahead and upload all of this data to Weaviate. So, we're going to upload the image of the product, the label name, the category, the description, all of that, including the encoded image.
And then, once that's all ready to go, we're going to go ahead and show you what that e-commerce platform looks like with this recommender system enabled. So, here in my platform, I've organized all of my products and I can see that blue shirt that I'm interested in. There are two pieces of functionality that I want to demonstrate here. First of all, I can search over products. This is the first feature that a vector database enables. So, I can, for example, say that I'm only interested in half-sleeve shirts. I can enter that, and I can perform that search. And now I get these half-sleeve shirts organized at the top.
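Going back to the schema for a moment, the user class could be sketched like this, again as a plain Python dictionary. The property name `likedItem` is an assumption for illustration, and the `referenceProperties` and `method` settings reflect my understanding of how `ref2vec-centroid` is configured, so check them against the Weaviate documentation. The `is_cross_reference` helper is a hypothetical heuristic, not a Weaviate API.

```python
# Sketch of a User class whose vector comes from its references.
user_class = {
    "class": "User",
    "vectorizer": "ref2vec-centroid",
    "moduleConfig": {
        "ref2vec-centroid": {
            # Which cross-reference properties feed the user's vector.
            "referenceProperties": ["likedItem"],
            # Average the vectors of the referenced objects.
            "method": "mean",
        },
    },
    "properties": [
        # The data type is another class, not a blob or a text string.
        {"name": "likedItem", "dataType": ["Product"]},
    ],
}


def is_cross_reference(prop: dict) -> bool:
    """Heuristic: primitive types are lowercase, class names are capitalized."""
    return all(t[:1].isupper() for t in prop["dataType"])


reference_properties = [
    p["name"] for p in user_class["properties"] if is_cross_reference(p)
]
print(reference_properties)  # ['likedItem']
```

The point to notice is that the user has no text or image of its own to vectorize. Its vector is derived entirely from the products it points to.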
Let's say I pick a product that I'm interested in, this blue shirt that I initially showed you on the storefront. I click that. As soon as I click that, it re-ranks all of the product offerings so that it shows me what's more similar to this shirt. So, if I'm interested in this blue shirt, potentially, I could be interested in this other blue shirt. And now, imagine we had thousands of product offerings. As soon as a user shows you their intent of what product they're interested in, you can use a vector database to establish what type of shirt they're interested in by taking the vector of that shirt, finding which other products have vectors close to it, and then reordering what they're seeing in real-time. That increases the chances that they're going to find what they're interested in, and that increases the conversion rate to buying that product. And this scales from just a few images in this toy example all the way up to hundreds of thousands of unique product offerings. And that's the power of a vector database being used to power the recommender system on your e-commerce platform. This is one of the most important applications of vector databases in industry currently. Here, we saw how you can make real-time recommendations as customers are navigating an e-commerce platform. In the next video, we'll see a different flavor of search that vector databases enable, called hybrid search.
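As a recap of the re-ranking step from this video, here is a small sketch that orders a toy catalog by cosine similarity to the clicked product's vector. It is a brute-force stand-in for the nearest-neighbor search a vector database performs; the catalog, its three-dimensional vectors, and the `rerank` helper are all made up for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query: Vector, catalog: Dict[str, Vector]) -> List[str]:
    """Return product names ordered from most to least similar to the query."""
    return sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )


# Hypothetical product vectors (toy values).
catalog = {
    "leather boots": [0.0, 0.1, 0.9],
    "red dress": [0.1, 0.9, 0.2],
    "navy shirt": [0.8, 0.2, 0.1],
    "blue shirt": [0.9, 0.1, 0.0],
}

# The user clicks the blue shirt, so its vector becomes the query.
ranking = rerank(catalog["blue shirt"], catalog)
print(ranking)  # ['blue shirt', 'navy shirt', 'red dress', 'leather boots']
```

For a logged-in user with several liked items, the query would be the averaged user vector instead of a single product vector, and the rest of the ranking logic stays the same.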
