From the course: Hands-On AI: Building LLM-Powered Apps
Embedding model limitations
- [Narrator] In the last video, we discussed the embedding search used for the retrieval part of the RAG architecture. In this video, we'll discuss some of the limitations of embedding models and the workarounds we need to make them functional. One of the biggest limitations of embedding models is that they have a limited input length. The input length typically ranges from 384 tokens for open-source models up to around 8,000 tokens for OpenAI models.

So what do we do? One of the easiest options is to simply truncate documents once they exceed the input limit. This makes sense for certain applications, but for many others we don't want to lose information. The common approach is instead to chunk the document into pieces. While we do that, we keep track of each chunk's source in its metadata so we know where it came from.

Now, here's another challenge: when we say "chunk the document," how do we actually split a long document? We can chunk documents without any overlap, or split them by line breaks, by periods, by commas, and so on. One common approach is to chunk with some overlap, so each chunk includes part of the previous chunk and part of the next one. This is similar to TV shows, which typically include a recap of the previous episode at the beginning and a preview of the next episode at the end. Another approach is to create several search indices, each with a different chunk size, and search across all of them.

More importantly, when we chunk documents, we need to budget the context length of the large language model. As an example, if we retrieve the top five relevant chunks from the search engine and the model has a 16,000-token maximum context length, we need to make sure the combined chunk size stays below 16,000 tokens, while also leaving room for the user's question and the model's answer.

So in summary: embeddings are generated by embedding models, which have a limited input length, so we need to chunk our documents to work around that limitation. In addition, because large language models have limited context length as well, we need to budget our chunk sizes with that in mind. Now, armed with this information, let's continue building our chat with PDF application. In the next session, we will add the feature of loading and processing PDF files to our application.
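The chunk-with-overlap idea from the video can be sketched in a few lines of Python. This is a minimal illustration rather than the course's actual implementation: it splits on character counts instead of tokens, and the chunk size, overlap, and metadata fields shown here are arbitrary choices for the example. A real application would typically count tokens with the model's tokenizer or use a library text splitter.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100):
    """Split text into overlapping chunks, keeping source metadata.

    Each chunk starts `chunk_size - overlap` characters after the previous
    one, so neighboring chunks share `overlap` characters -- like a TV
    episode's recap and preview.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + chunk_size]
        chunks.append({
            "text": piece,
            # Metadata records where the chunk came from, as discussed above.
            "metadata": {"chunk_id": i, "start_char": start},
        })
    return chunks


# Example usage with a stand-in for a long PDF's extracted text.
document = "some long document text " * 200
for chunk in chunk_text(document, chunk_size=500, overlap=50)[:3]:
    print(chunk["metadata"], len(chunk["text"]))
```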
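Likewise, the context-budget arithmetic at the end of the video can be made concrete. The sketch below mirrors the numbers from the example (top five chunks, a 16,000-token context window); the amounts reserved for the question, the answer, and the prompt overhead are illustrative assumptions, not values from the course.

```python
# Rough token budget for a RAG prompt, assuming a 16,000-token context window.
CONTEXT_WINDOW = 16_000   # model's maximum context length (prompt + completion)
TOP_K = 5                 # number of retrieved chunks placed in the prompt
QUESTION_BUDGET = 500     # reserved for the user's question (assumed)
ANSWER_BUDGET = 1_500     # reserved for the model's answer (assumed)
PROMPT_OVERHEAD = 500     # system prompt, instructions, separators (assumed)

available_for_chunks = (
    CONTEXT_WINDOW - QUESTION_BUDGET - ANSWER_BUDGET - PROMPT_OVERHEAD
)
max_tokens_per_chunk = available_for_chunks // TOP_K

print(f"Tokens available for retrieved chunks: {available_for_chunks}")
print(f"Maximum tokens per chunk at top-{TOP_K}: {max_tokens_per_chunk}")
# -> 13500 tokens total, 2700 tokens per chunk with these assumptions
```

With these (assumed) reservations, each retrieved chunk must stay under about 2,700 tokens, which in turn constrains the chunk size chosen at indexing time.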
Contents
- Retrieval augmented generation (3m 30s)
- Search engine basics (2m 32s)
- Embedding search (3m)
- Embedding model limitations (3m 15s)
- Challenge: Enabling load PDF to Chainlit app (48s)
- Solution: Enabling load PDF to Chainlit app (5m 4s)
- Challenge: Indexing documents into a vector database (1m 50s)
- Solution: Indexing documents into a vector database (1m 43s)
- Challenge: Putting it all together (1m 10s)
- Solution: Putting it all together (3m 17s)
- Trying out your chat with the PDF app (2m 15s)