From the course: Hands-On AI: Building LLM-Powered Apps
Embedding model limitations
- [Narrator] In the last video, we discussed the embedding search used for the retrieval part of the RAG architecture. In this video, we'll discuss some of the limitations of embedding models and the workarounds we need to make them functional. One of the biggest limitations of embedding models is that they have a limited input length. The input length typically ranges from 384 tokens for open-source models up to around 8,000 tokens for OpenAI models.

So what do we do? One of the easiest options is to simply truncate documents once they exceed the input limit. This makes sense for certain applications, but for many others we don't want to lose information. The common approach is instead to chunk the document into pieces. While we do that, we keep track of each chunk's source in its metadata so we know where it came from.

Now, here's another challenge: when we say "chunk the document," how do we actually split a long document? We can chunk documents without any overlap, or split them by line breaks, by periods, by commas, and so on. One common approach is to chunk with some overlap, so each chunk includes part of the previous chunk and part of the next one. This is similar to TV shows, which typically include a recap of the previous episode at the beginning and a preview of the next episode at the end. Another approach is to create several search indices, each with a different chunk size, and search across all of them.

More importantly, when we chunk documents, we need to budget the context length of the large language model. As an example, if we retrieve the top five relevant chunks from the search engine and the model has a 16,000-token maximum context length, we need to make sure the combined chunk size stays below 16,000 tokens, while also leaving room for the user's question and the model's answer.

So in summary: embeddings are generated by embedding models, which have a limited input length, so we need to chunk our documents to work around that limitation. In addition, because large language models have limited context length as well, we need to budget our chunk sizes with that in mind. Now, armed with this information, let's continue building our chat with PDF application. In the next session, we will add the feature of loading and processing PDF files to our application.
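The chunk-with-overlap idea from the video can be sketched in a few lines of Python. This is a minimal illustration rather than the course's actual implementation: it splits on character counts instead of tokens, and the chunk size, overlap, and metadata fields shown here are arbitrary choices for the example. A real application would typically count tokens with the model's tokenizer or use a library text splitter.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100):
    """Split text into overlapping chunks, keeping source metadata.

    Each chunk starts `chunk_size - overlap` characters after the previous
    one, so neighboring chunks share `overlap` characters -- like a TV
    episode's recap and preview.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + chunk_size]
        chunks.append({
            "text": piece,
            # Metadata records where the chunk came from, as discussed above.
            "metadata": {"chunk_id": i, "start_char": start},
        })
    return chunks


# Example usage with a stand-in for a long PDF's extracted text.
document = "some long document text " * 200
for chunk in chunk_text(document, chunk_size=500, overlap=50)[:3]:
    print(chunk["metadata"], len(chunk["text"]))
```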
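Likewise, the context-budget arithmetic at the end of the video can be made concrete. The sketch below mirrors the numbers from the example (top five chunks, a 16,000-token context window); the amounts reserved for the question, the answer, and the prompt overhead are illustrative assumptions, not values from the course.

```python
# Rough token budget for a RAG prompt, assuming a 16,000-token context window.
CONTEXT_WINDOW = 16_000   # model's maximum context length (prompt + completion)
TOP_K = 5                 # number of retrieved chunks placed in the prompt
QUESTION_BUDGET = 500     # reserved for the user's question (assumed)
ANSWER_BUDGET = 1_500     # reserved for the model's answer (assumed)
PROMPT_OVERHEAD = 500     # system prompt, instructions, separators (assumed)

available_for_chunks = (
    CONTEXT_WINDOW - QUESTION_BUDGET - ANSWER_BUDGET - PROMPT_OVERHEAD
)
max_tokens_per_chunk = available_for_chunks // TOP_K

print(f"Tokens available for retrieved chunks: {available_for_chunks}")
print(f"Maximum tokens per chunk at top-{TOP_K}: {max_tokens_per_chunk}")
# -> 13500 tokens total, 2700 tokens per chunk with these assumptions
```

With these (assumed) reservations, each retrieved chunk must stay under about 2,700 tokens, which in turn constrains the chunk size chosen at indexing time.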
Contents
- Retrieval augmented generation (3m 30s)
- Search engine basics (2m 32s)
- Embedding search (3m)
- Embedding model limitations (3m 15s)
- Challenge: Enabling load PDF to Chainlit app (48s)
- Solution: Enabling load PDF to Chainlit app (5m 4s)
- Challenge: Indexing documents into a vector database (1m 50s)
- Solution: Indexing documents into a vector database (1m 43s)
- Challenge: Putting it all together (1m 10s)
- Solution: Putting it all together (3m 17s)
- Trying out your chat with the PDF app (2m 15s)