From the course: Hands-On AI: Building LLM-Powered Apps
Embedding search
- [Instructor] In the last video, we went through the retrieval part of the RAG architecture. We also briefly discussed indexing and searching. Now let's dive deeper into search, specifically embedding search. Embedding search is currently the most popular way to search in an LLM application. So what is an embedding? An embedding is a low-dimensional representation of data. It is calculated using an encoder model: we input a chunk of text, and the model outputs an embedding, which is an array of floating-point numbers. Embeddings contain semantic information, including the meaning and structure of the sentence. One of the key features is that sentences that are similar will be numerically close to each other, and this allows us to search by similarity. The way we measure how close two sentences are is with something called cosine similarity, which is the angular distance between two embeddings. The embedding model learned to place similar sentences closer together and different sentences farther apart. This is an extremely powerful concept. In the old days, we would need to search for exact matches. As an example, searching for the word USA or United States of America used to require multiple different searches because they are spelled differently. But now, with embedding search, USA, United States, and America are all close to each other semantically, so they will show up together. Now that we capture the meanings and structures of the underlying sentences, we can search by retrieving the top K documents most similar to the question asked, measured by cosine similarity. This algorithm is called K-nearest neighbors, or KNN. And when there is a massive amount of data, we can use an approximate algorithm called approximate nearest neighbor (ANN) search, which improves latency by sacrificing a little bit of relevance. In summary, embeddings are low-dimensional, lossy representations that enable us to retrieve relevant documents using KNN or ANN, and the way we measure similarity is cosine similarity. Again, as usual, there are certain limitations to embedding models, and we'll discuss them next.
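To make this concrete, here is a minimal sketch of embedding search in Python. It assumes the open-source sentence-transformers library; the model name all-MiniLM-L6-v2, the sample documents, and the cosine_similarity helper are illustrative choices, not the code used in this course.

```python
# Minimal embedding-search sketch: encode text into embeddings, then
# retrieve the top-K documents by cosine similarity (exact KNN).
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed library choice

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model, not the course's

documents = [
    "The USA is a country in North America.",
    "The United States of America has fifty states.",
    "Bananas are rich in potassium.",
]
query = "Tell me about America."

# Encode each text chunk into an embedding: an array of floating-point numbers.
doc_embeddings = model.encode(documents)    # shape: (n_docs, dim)
query_embedding = model.encode([query])[0]  # shape: (dim,)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embeddings (closer to 1.0 = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Exact K-nearest neighbors: score every document, keep the top K.
k = 2
scores = [cosine_similarity(query_embedding, d) for d in doc_embeddings]
top_k = np.argsort(scores)[::-1][:k]
for i in top_k:
    print(f"{scores[i]:.3f}  {documents[i]}")
```

Note that the loop above scores every document, which is fine for small collections; at scale, vector databases typically replace it with an ANN index, trading a small amount of relevance for much lower latency, as described above.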
Contents
- Retrieval augmented generation (3m 30s)
- Search engine basics (2m 32s)
- Embedding search (3m)
- Embedding model limitations (3m 15s)
- Challenge: Enabling load PDF to Chainlit app (48s)
- Solution: Enabling load PDF to Chainlit app (5m 4s)
- Challenge: Indexing documents into a vector database (1m 50s)
- Solution: Indexing documents into a vector database (1m 43s)
- Challenge: Putting it all together (1m 10s)
- Solution: Putting it all together (3m 17s)
- Trying out your chat with the PDF app (2m 15s)